AI for Content Verification I: Status Quo and Current Limitations
As part of the EU co-funded AI4Media project, we recently contributed to a long public report entitled “AI technologies and applications in media: State of Play, Foresight, and Research Directions”. For better accessibility and readability, we have now turned our section into a multi-part blog series (with a few edits and updates). Here’s part one, which looks at the status quo and current limitations of AI in content verification.
The phenomenon of online disinformation has evolved since around 2010 and can be defined as “false, inaccurate, or misleading information designed, presented and promoted to intentionally cause public harm or for profit”. There are many definitions of disinformation; we have chosen the one first presented by Claire Wardle and Hossein Derakhshan in “Information Disorder: Toward an interdisciplinary framework for research and policy making” (2017). This definition is also used in the report issued by the European Commission’s High Level Expert Group on Fake News and Online Disinformation.
While false or manipulative information has been spread for centuries, the significance and negative impact of the phenomenon have increased with the emergence of social media, with digital information production and consumption, and with advances in technology, including artificial intelligence (AI). Although fact-checking and verification specialists have been addressing the effects of online disinformation for almost a decade, events such as the 2016 US presidential election and the Covid-19 pandemic have drawn attention to the significant risks for individuals and for democratic society in general.
Many different stakeholders are engaged in the process of content verification: not only social media platforms, fact-checking initiatives, open-source intelligence specialists, news media organizations and academia, but also government agencies, educational institutions, and civil society initiatives.
Content verification generally involves one or more of the following interconnected approaches:
- verifying content (e.g., posts, pictures, videos) and social media accounts
- fact-checking statements (claims) made by public figures
- identifying disinformation narratives/stories in social media
- setting up media literacy and education/training programmes
- establishing self-regulation schemes and regulatory frameworks
- developing counteractive methods, technologies, and support tools
The shortcomings of AI tech in countering disinfo
For several years now, AI technologies have played an important role in content verification. They are key components of the tools and systems used by fact-checkers and disinformation analysts, and they can deliver remarkable results. At the same time, the need for AI-driven tools is increasing, for two main reasons:
- The frequency and scope of disinformation have grown to a level that simply can no longer be managed manually.
- Bad actors use advanced AI and automation for targeted campaigns, content manipulation or synthetic media production – which in turn can only be detected by very advanced systems.
Even though AI functionality is available for content verification, it has a number of shortcomings and limitations and thus cannot ensure long-term success in exposing and counteracting disinfo.
Here’s a comprehensive look at what’s there and what’s missing:
Specific? Yes. Complex? No.
Most AI solutions today are good at specific, narrowly defined tasks that can help identify disinfo elements and claims in social media. Examples are:
- reverse image search
- geo search
- detection of bot accounts
- comparison of digital content to detect changes/manipulation (text, pictures, video)
- detection of deepfake face-swaps in photos or videos
- audio analysis to detect manipulation
- keyword scans in large data repositories
- analysis of certain aspects of content
- analysis of relationships between accounts
What AI cannot do yet is:
- detect and analyze entire, complex disinformation narratives
- handle more complex tasks across social/digital platforms that involve multimodal data types
- cover all aspects and types of synthetic media and manipulation detection
Wanted: more and better data
There is also the challenge of finding suitable datasets for AI solutions.
Although there is research on multimodal approaches, the software currently in practical use mostly handles one data type (text, image, or audio) and/or one content source (e.g. Twitter). Collecting high-quality datasets is a challenge in itself, and many existing datasets relate to only one domain (e.g. politics). In addition, many datasets are difficult to obtain, maintain, and exploit in light of regulations, intellectual property rights (IPR), ethical requirements, or terms and conditions.
Can we rely on the machine?
Current AI-powered tools do a great job of speeding up the verification process, reducing stress, and cutting costs, but they still depend on manual pre-detection (e.g. presenting the AI with a suspicious photo or video) and human oversight (manual analysis of results or interim results).
Where semi- or full automation is technically achievable, humans usually don’t trust the machine, for good reasons such as a lack of accuracy or contextual knowledge. So far, approaches to true human-machine collaboration or socially acceptable automation are very limited.
Most current AI functions and services used in content verification are oriented towards accuracy and performance. Third-party providers offer little or no information about the AI function/model itself or its legal compliance. They usually don’t explain details, mitigate potential bias, or increase reliability/robustness. This lack of responsible/trustworthy AI affects content verification for several reasons:
- in this domain, many decisions are related to the complex concept of “truth”
- in contrast to those who use AI applications primarily for commercial purposes, the work of fact checkers, verification specialists or journalists is also influenced by immaterial aspects (e.g. societal and public value systems)
- fact-checkers, verification specialists, journalists, and open-source intelligence analysts are usually very inquisitive, pay close attention to detail, and question every piece of information before using it further
- all stakeholders (specialist staff, editorial managers, boards) are bound by editorial control rules (e.g. dual control principles), journalistic codes and specific organizational values as well as legal frameworks related to publishing/journalism
In short, the current lack of responsible/trustworthy AI can raise many (entirely justified) critical questions within media organizations and, as a result, reduce the acceptance and use of AI-driven tools there.
AI vs. UI
While many useful stand-alone AI software solutions already exist, it remains difficult to “translate” AI insights and predictions, turn them into handy components, and/or visualize results on dashboards suited to less tech-savvy users in content verification. In other words: AI also has a UX/UI problem. There are still huge gaps between AI algorithms, dockerized containers, APIs, and user-friendly web interfaces.
Where are the tools we need?
So far, various specialist initiatives and projects have conducted research and delivered AI functions for content verification in the form of research outcomes, piloted prototypes, or open-source solutions. However, the market is highly fragmented and consists of many small players. Only a very limited number of commercial solutions (such as Truly Media) specifically target the fact-checking and verification workflow. Some are hosted outside Europe, and some have yet to reach maturity.
In part two of this series (coming soon) we’ll look at research areas to be explored and key challenges to be tackled by the AI for content verification R&D community.