Being able to generate synthetic media content “at the push of a button” is as fascinating as it is dangerous. On the one hand, there is the significant boost in productivity and creativity. On the other hand, there is the dystopian vision of a technology that fosters disinformation and violates a plethora of rights.
The term “synthetic media” refers to texts, sounds, photos, videos, and other digital assets that have been generated or significantly manipulated by computers. Their rising popularity goes back to major advances in the field of artificial intelligence (AI), particularly with regard to machine learning (ML) and the creation of generative adversarial networks (GANs). Ultimately, modern media synthesis is about autonomous, automated, efficient recognition and reproduction of patterns in writing, sound, and images.
The importance of synthetic media (and other synthesized things) can hardly be overstated. Futurists like Amy Webb say that the “synthetic decade” is upon us. It is already impossible to imagine journalism and digital media without synthetic content, especially when all its varieties are included.
An early example of natural language generation (NLG) comes from AP: In 2014, the American news agency launched a project that established the automated production of short business articles via AI algorithms, with minimal or no human intervention. Since then, this “robot journalism” has made its mark at various media companies across the globe. First and foremost, well-structured facts-and-figures reports have been fully computerized. In addition to economic data, the focus is on weather, elections, and sporting events. AI tools can now produce a large number of surprisingly solid posts in a very short time. This (ideally) enables more comprehensive standard reporting and frees up a lot of time for human editors, who can now focus on analytical texts and commentary. The power of NLG tools like GPT-3 and authoring tools like Arria NLG Studio is likely to give synthetic text generation a further boost in the medium term.
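To make the idea of computerized facts-and-figures reports a bit more tangible, here is a minimal sketch of template-based text generation in Python. It is not the system used by AP or any other news organization; the `EarningsReport` fields, the function names, and the wording rules are all hypothetical and only illustrate how structured data could be turned into a short business brief.

```python
from dataclasses import dataclass


@dataclass
class EarningsReport:
    """Structured input for one brief. All field names are hypothetical."""
    company: str
    quarter: str
    revenue_musd: float        # revenue in millions of US dollars
    prior_revenue_musd: float  # same quarter, previous year


def describe_change(current: float, previous: float) -> str:
    """Turn two figures into a short verbal phrase."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    pct = (current - previous) / previous * 100
    # Assumed editorial rule: changes below half a percent count as flat.
    if abs(pct) < 0.5:
        return "was roughly flat"
    verb = "rose" if pct > 0 else "fell"
    return f"{verb} {abs(pct):.1f} percent"


def write_brief(report: EarningsReport) -> str:
    """Fill a fixed sentence template with the figures of one report."""
    change = describe_change(report.revenue_musd, report.prior_revenue_musd)
    return (
        f"{report.company} reported revenue of "
        f"${report.revenue_musd:,.1f} million for {report.quarter}. "
        f"Revenue {change} compared with the same quarter a year earlier."
    )


if __name__ == "__main__":
    # Invented example data, not taken from any real company.
    print(write_brief(EarningsReport("Example Corp", "Q2 2021", 1250.0, 1000.0)))
```

Real-world tools are considerably more elaborate (and large language models like GPT-3 work very differently), but the sketch shows why well-structured data lends itself to automation: once the figures are reliable, the sentence around them can follow a fixed pattern.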
In the audio sector, synthesis plays a major role when it comes to voice control. Alexa, Google Assistant, Siri, and similar assistants facilitate access to journalistic offerings. There is also a trend toward incorporating read-aloud functions into apps and websites. In 2020, the Washington Post announced that it would henceforth offer all articles with an audio option, made possible by native Android and iOS text-to-speech (TTS) functions. That same year, the BBC partnered with Microsoft to build its own synthetic voice to read aloud text on its main news site. Sophisticated TTS software (e.g. Amazon Polly) is also being used by many media organizations to generate voice-over tracks for multilingual video content without much effort. And there is yet another use case: Powerful AI music synthesizers like Splash and Jukedeck (already acquired by TikTok's parent company ByteDance) can speed up and facilitate the process of creating signature tunes and even entire movie scores.
Prominent examples of synthetics in the photo sector are the face filters on Instagram (and other platforms), which users can also build themselves. A particularly powerful tool for technophile photo editors is Luminar AI. Reporters who need to protect their privacy during investigative research on the web can turn to services like “This Person Does Not Exist” to find a rather convincing, but completely artificial avatar.
The pinnacle of synthesis is AI-based generation of AV content. What was originally reserved for Hollywood productions (example: AI characters in Rogue One – A Star Wars Story) has now arrived in the normal media business. The Chinese news agency Xinhua can be considered a pioneer here, presenting a deceptively realistic AI news anchor as early as 2018. A good example of semi-synthesis would be the BBC’s 2019 AI weather report experiment. In this case, a team of technologists recorded a number of short, generic videos with a real weatherman – and later on had them pieced together by an algorithm that followed specific instructions (“fake AI”) and thus produced a large number of customized reports.
Entirely artificial and very successful are the so-called VTubers (= virtual YouTubers), who use a digital alter ego for their web video entertainment. A particularly well-known character is Code Miko (Twitch), or rather the engineer behind her, who uses a motion capture body suit to make her synthetic avatar look extra realistic.
Abuse of synthetic content and deepfakes
Sadly, but not surprisingly, AI-driven media synthesis is not only used for journalistic work and harmless shenanigans. It is also a convenient tool for shady characters and criminals. For example, software like GPT-3 can be used to generate massive amounts of written disinformation (which is then automatically disseminated on social media by bots).
A particularly problematic trend in the AV sector began in 2017: That year saw the first appearance of semi-synthetic pornography, in which the faces of the original actors and actresses had been swapped for those of celebrities, all with the help of AI. The consequences of this were discussed almost immediately. In this context, the term “deepfake” was coined, a portmanteau of “deep learning” and “fake” that puts people on alert to this day. Later on, “face swapping” was also used on a massive scale to expose and shame random internet users – mostly women. In addition, fully synthetic pornography emerged.
In journalism, deepfakes play a role primarily when it comes to disinformation and propaganda – or verification and media literacy. Outstanding examples include: Ali Bongo as Ali Bongo (actually real), Deepfake Data King Mark Zuckerberg (an art project), and Claire Wardle as Adele (an educational video). Hazel Baker and Reuters also experimented with deepfakes – and learned a lot about ways to debunk them.
The deepfake’s little sister is the shallowfake, which is less engineered and easier to create, and which many commentators consider an even greater danger. Exhibit A: the allegedly drunk Nancy Pelosi.
Deepfakes, both the rather funny and the highly problematic kind, are about to flood the mainstream internet: Between 2019 and 2020 alone, their number is said to have increased from just under 15,000 to about 100 million (see Sentinel Report PDF).
In addition to ethical and moral discussions, the rise of synthetic content has also kicked off a complex legal debate that can only be briefly touched upon here. Guiding questions are: What data is the content based on? Am I allowed to use it? Who is the creator/copyright holder of an AI work? There are also a number of grave concerns when it comes to personal rights, or rather their systematic violation with the help of cutting-edge AI technology.
How is DW exploring the topic?
Deutsche Welle has been active in the field of synthetic media for quite a while now. For example, the plain X production platform developed by DW Innovation and external partners can generate artificial voice-overs (among other things). The Digger project plunged into the topic of deepfakes, shallowfakes and digital audio forensics, and even before that the Reveal consortium dealt with digital image forensics. In AI4Media, “advances in content synthesis” play an important role, and there is a special focus on “trustworthy AI”.
New projects are already in the making, and members of our human language technology and verification focus groups are constantly reading up on new developments in the field of artificially generated media. The AI-driven, synthetic future will be incredible – and incredibly scary. Our task is to a) understand what the tech does to journalism and media and b) figure out how we can put it at the service of an inclusive, democratic society.
Author: Alexander Plaum
Photo: Synthetics #2 by MRfrukta