AI & Automization, XR & Immersive Journalism

We tried different ways to create a virtual presenter – here's what we've learned

What do you need to consider when creating an AI avatar? Which workflow is the best? Here's what our DW Lab "Avatario" team learned while creating a very special host for vertical video content.

Let's start this post with an important disclaimer: We don't intend to replace human hosts with artificial ones. And we're fully aware that AI-driven applications carry a significant risk of manipulation, discrimination and malfunction. There's also the question of general journalistic credibility. That's why DW guidelines state that "artificial intelligence is only used where it contributes to the fulfillment of our mission: to deliver high-quality, independent, impartial, diverse and trustworthy journalism."

With that in mind, we wanted to know: Is there a way to develop a trustworthy, virtual host for our tech magazine DW Shift? Because it would come in really handy, for two reasons:

  1. Moderated reels (i.e. reels presented by a host) fare much better on social media than unmoderated ones.
  2. Smaller DW newsrooms often lack the resources to provide a host for a translated/localized version of a DW Shift Story – and thus have to post less successful standard clips.

So can an AI-based avatar fix that problem?

A non-human look

After an initial discussion, it quickly became clear that our AI avatar presenter would have to look non-human and non-photorealistic – more like a robot or digital entity. That's because a realistic avatar might accidentally resemble a real person, potentially violating their personal rights. This concern has been discussed extensively in the video game industry and, more recently, in relation to generative AI. We didn't want to take any risks. Apart from that, we also wanted our avatar to look "techy" enough to be recognized as part of the tech world. After all, its focus was on tech reporting, right? So this was an editorial decision.

2D or 3D? May the best workflow win!

Since our final prototype would end up in a "flat" medium – i.e. a vertical video – we weren't sure whether to create a 2D or a 3D avatar. Consequently, we decided to try both, aiming to develop a simple workflow for editors with lots of daily tasks and limited time and resources.

Workflow 1: Creating a 2D avatar with Midjourney, Blender, and D-ID  

In this test, we started by creating two images of a robot. The first one was put together with Midjourney (with a prompt à la "portrait of a friendly humanoid robot"), then remixed with another robot picture, then altered many times to change specific parts of the robot's face. The second image was designed from scratch in Blender – without the use of any AI tools. The designs were then fed into D-ID, an image-to-video tool with integrated speech synthesis that creates an animated clip based on an image and a script. Neither the look nor the animation of our 2D character convinced us: the avatar's lip movements seemed out of sync at times, resulting in an unwanted uncanny valley effect, and using non-human faces as source material messed up the animation. Ultimately, the D-ID test also failed for legal reasons: the platform's general terms and conditions didn't meet DW standards for copyright, usage, and data protection. We therefore decided not to pursue this workflow any further.

A little creepy, technically flawed, and not compliant with DW regulations: The D-ID/Midjourney Avatar
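For context, the D-ID part of this workflow can also be scripted via the platform's REST API. The sketch below shows roughly how an image and a script would be submitted as a "talk"; the endpoint, payload fields and auth scheme are taken from D-ID's public documentation as we understood it, and the image URL and key are placeholders, so treat this as an illustration rather than a tested recipe.

```python
import requests

DID_API_KEY = "YOUR_DID_API_KEY"                    # placeholder
ROBOT_IMAGE_URL = "https://example.com/robot.png"   # placeholder: publicly hosted source image

# Submit a "talk": an image plus a text script that D-ID animates and voices.
# Endpoint, payload and auth scheme follow D-ID's public docs and may have changed since.
create = requests.post(
    "https://api.d-id.com/talks",
    headers={"Authorization": f"Basic {DID_API_KEY}"},
    json={
        "source_url": ROBOT_IMAGE_URL,
        "script": {"type": "text", "input": "Hi, I'm the DW Shift avatar."},
    },
    timeout=30,
)
create.raise_for_status()
talk_id = create.json()["id"]

# Rendering happens asynchronously; poll the talk until a result video URL appears.
status = requests.get(
    f"https://api.d-id.com/talks/{talk_id}",
    headers={"Authorization": f"Basic {DID_API_KEY}"},
    timeout=30,
).json()
print(status.get("result_url", "still rendering"))
```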

Workflow 2: Creating a 3D avatar with Blender, After Effects, ElevenLabs, and Adobe Character Animator 

This time, we initiated the process in Blender, where we designed, animated and rendered the main avatar look. We then exported the 3D data (.FBX) to After Effects for camera and facial positioning. To give the avatar a synthetic voice, we also created a sound file with ElevenLabs. With the help of Adobe Character Animator, we analyzed the sound file and generated visemes (i.e. the specific facial expressions animators use to create the illusion of speech), which were subsequently exported to After Effects, where they replaced the avatar's mask face. We also added eye animation there. Finally, we rendered the avatar clip with an alpha channel for further editing in Adobe Premiere.
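The ElevenLabs step, by the way, is easy to automate. Here's a minimal sketch of how the synthetic voice file could be fetched via the ElevenLabs text-to-speech REST API; the API key, voice ID and output file name are placeholders, and the endpoint and model name reflect the publicly documented API, so double-check them against the current docs.

```python
import requests

ELEVENLABS_API_KEY = "YOUR_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder: any voice from your ElevenLabs library
SCRIPT = "Welcome to DW Shift. Today we look at three AI stories you shouldn't miss."

# Text-to-speech request; returns MP3 audio bytes that Character Animator can then analyze.
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": ELEVENLABS_API_KEY},
    json={"text": SCRIPT, "model_id": "eleven_multilingual_v2"},
    timeout=60,
)
response.raise_for_status()

with open("avatar_voiceover.mp3", "wb") as f:
    f.write(response.content)
```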

So what about the results? The good news: They were flawless. The bad news: The process was very time-consuming.

Polished, popular, and rather difficult to create: The DW Blender/Adobe/ElevenLabs bot.

Workflow 3: Creating a 3D avatar with Blender and a virtual control room with Unreal

In this case, we imported the .FBX described above into Unreal.

We then set up a 3D level as a stage, placed virtual cameras and virtual lighting, and turned the avatar into a playable character – including a character blueprint, an animation blueprint, an idle animation, and a viseme face. Done.
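Most of this setup (level, cameras, lights, blueprints) happened in the editor UI, but individual steps such as the FBX import can also be scripted. Here's a minimal sketch using Unreal's Python editor scripting; the file path and content folder are placeholders, and the Python Editor Script Plugin has to be enabled for this to run.

```python
import unreal

# Import the Blender-exported FBX (see Workflow 2) into the project's content folder.
# File path and destination path are placeholders for illustration.
task = unreal.AssetImportTask()
task.filename = "C:/avatario/avatar.fbx"
task.destination_path = "/Game/Avatario"
task.automated = True   # suppress the interactive import dialog
task.save = True

unreal.AssetToolsHelpers.get_asset_tools().import_asset_tasks([task])

# The character blueprint, animation blueprint, idle animation and viseme face
# were then wired up manually in the editor, as described above.
```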

In this workflow, we eventually ran into several problems: First of all, video export was only possible via an external command-line encoder (FFmpeg), which made rendering very slow. Secondly, the video export from Unreal didn't support visemes, leaving the avatar without facial expressions – and thus making it useless.
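To illustrate what "export via an external command-line encoder" means in practice: Unreal first writes out a rendered image sequence, and FFmpeg then assembles it into a video file. A typical encode step could look like the sketch below; frame rate, file names and resolution are assumptions, not our exact settings.

```python
import subprocess

# Assemble a rendered PNG sequence from Unreal into a vertical MP4 for social media.
# Input pattern, frame rate and output name are placeholders.
subprocess.run(
    [
        "ffmpeg",
        "-framerate", "25",
        "-i", "renders/avatar_frame_%04d.png",
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-vf", "scale=1080:1920",   # vertical (9:16) output for reels
        "avatar_clip.mp4",
    ],
    check=True,
)
```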

Professional, metaversy, and impossible to export in video form: Our bot avatar in the Unreal Engine.

Workflow 4: Creating a 3D avatar and a video generator with Blender and Xcode

Once again, we used the .FBX avatar put together with Blender (see Workflow 2), but then went in a new direction. We created a prototype app for macOS and iOS:

Users type up or import a script, which the app then turns into synthetic audio. The software then uses the generated audio to create face animations, which are applied to the 3D model's face texture. The renderer combines these face animations with predefined body animations. In the end, users get an AV clip (with a clean background) that can be exported to any external video editing app.
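The actual prototype is a native macOS/iOS app, so the sketch below is just a schematic summary of its pipeline in Python: the helper stubs (synthesize_speech, audio_to_visemes, render_clip) are hypothetical stand-ins for the app's internal components and only show the order of operations, not the real implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the app's internal components (not the real implementation).

@dataclass
class Viseme:
    time: float   # seconds into the clip
    shape: str    # mouth shape applied to the 3D model's face texture, e.g. "M", "AA"

def synthesize_speech(script_text: str) -> bytes:
    """Stub: turn the typed or imported script into synthetic audio."""
    return b"\x00"  # placeholder audio bytes

def audio_to_visemes(audio: bytes) -> List[Viseme]:
    """Stub: analyze the audio and derive a viseme timeline for the face texture."""
    return [Viseme(0.0, "M"), Viseme(0.2, "AA")]

def render_clip(visemes: List[Viseme], body_animation: str, audio: bytes, output: str) -> None:
    """Stub: combine the face animation with a predefined body animation and render the clip."""
    print(f"Rendering {len(visemes)} visemes with '{body_animation}' loop to {output}")

# The pipeline, in the order the app performs it:

def generate_avatar_clip(script_text: str, output_path: str) -> None:
    audio = synthesize_speech(script_text)   # 1. script -> synthetic audio
    visemes = audio_to_visemes(audio)        # 2. audio -> face animation
    render_clip(visemes, "idle_presenting", audio, output_path)  # 3. render clip with clean background

generate_avatar_clip("Welcome to DW Shift!", "avatar_clip.mov")
```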

The macOS/iOS apps already work pretty well, so this can be considered a solid proof of concept.

Allows users to create an avatar-hosted clip in no time: The Avatario iOS app.

Can our AI avatars act as web video hosts?

We're currently testing our 3D prototypes with selected users. The avatars fare relatively well, but people also tell us that there's room for improvement:

Some users find the avatar's movements too robotic and repetitive and suggest we should work on its general appearance. Some say we should make it more recognizable as a DW host ("Maybe it could wear a tie?", "This one looks like a character from a children's program"). Others express dissatisfaction with its voice ("too cold", "too impersonal", "not smooth enough").

Not bad, but also not good enough for publishing: Our bot avatar speaking to you in – you've guessed it – synthetic Tamil. (DW Lab)

So our answer to the question is: No, our AI avatars can't act as web video hosts – not yet. However, 3D tech and AI voice synthesis are getting better every day, and with a revamp of the avatar's look and animation, who knows what will happen? Maybe our avatars will really come to life in the future.

Special thanks to: Andy Giefer, Philip Kretschmer, Lars Jandel, Jens Röhr, Juan Gomez Lopez, Marie Kilg and everybody else who supported the Avatario project.

Authors
Daniela Späth
DW Innovation