On the left: a researcher at a computer. On the right: another researcher in protective clothing holding a warning sign. Illustration: Y. Dwiputri & Data Hazards Project / Better Images of AI
AI & Automation, Verification

GAI and LLMs in Content Verification: Proceed. With Caution.

In this post, we will focus on the potential of generative artificial intelligence (GAI) and large language models (LLMs) for content verification – with reference to DW's own use cases in that domain (and those of our partners). We will try to answer questions like: What is the technical status quo in the domain of fact-checking? What are the limitations of the AI-driven tools used there? What do GAI and LLMs bring to the table? What are the benefits and challenges of the new technology? Who is experimenting with what kind of tool? And: What about the latest trends, potential impact, and future research? The text is a slightly edited version of a chapter we recently submitted for an AI4Media whitepaper that will be published later this year.

In mid-2024, media professionals face an ever-growing tide of mis- and disinformation: Manipulated content, fabricated content, content taken out of context – and many other varieties. Whatever the type of false info, it is all getting a boost in scale thanks to AI technology and a wide range of bad actors. As a result, the public is becoming increasingly skeptical of what is presented as "a fact" – and that poses a major problem for democratic societies. Legacy media institutions struggle to keep pace with the real-time nature of online disinformation. Social media platforms and messengers accelerate the process of spreading false narratives.

Up until recently, fact-checking and verification organizations mostly relied on a combination of human expertise, digital tools, and semi-automation. More specifically, journalists and investigators would trust their experience and (hopefully) sharp senses while simultaneously utilizing software capable of finding sources, highlighting keywords, or analyzing patterns – to give just a few examples.

The exhaustive Bellingcat toolkit or the smaller, but entirely self-developed collection of WeVerify tools (plus their respective communities and tutorials) are good examples of this approach. Another one is the suite created by Full Fact and Google, which has also received global attention; it was, for instance, used by Nigerian fact-checkers when their fellow citizens went to the polls in 2023. The claim search, claim alert, and real-time transcription functionalities in this example already made extensive use of modern AI technology, but still strongly depended on human investigators setting up and analyzing everything. Finally, there is the AI4Media demonstrator, a core outcome of the R&D project mentioned at the beginning. The prototype comprises nine smart verification tools (with a focus on forensics), is even more progressive, and draws on cutting-edge AI models and systems – albeit not the latest generation of them. As an R&D experiment, the AI4Media demonstrator (which also makes use of the established Truly Media platform) has so far only been operated by test users, not by actual investigators in a newsroom.

"Traditional" tools, proto-AI, and a brick wall

Unfortunately, in an information sphere shaped by the rise of synthetic media and a global polycrisis, both the "traditional" and the proto-AI concepts outlined above are hitting a brick wall. The sheer volume of information makes it difficult for fact-checkers to keep pace. Standard algorithmic tools struggle with nuanced language, semiotics, and context; they are also susceptible to bias. And more advanced tools, especially in the domain of AV forensics, are always at risk of being outpaced and outsmarted – and may then fail to detect manipulations and disinformation.

Here is a quick overview of the challenges (discussed at length in this Reuters Institute article):

  • the tools are trained on limited datasets and thus do not perform well on real-world content that includes blurry images and other irregularities

  • simple manipulations like cropping or changing the resolution of a file can confuse the tools

  • detection results can be misleading – and always require an understanding of how a specific tool works

  • deepfake creators can adapt to novel detection techniques – and make fake content undetectable once again; it seems that the exploitation of generative adversarial networks (GANs), diffusion models, and neural radiance fields (NeRFs) was only the beginning

Blurry lines and reactive methods

On top of that, the lines between real and synthetic content are starting to blur: Modern smartphone photo apps tend to use AI for image editing or quality enhancement – which may result in images that are essentially authentic, but look somewhat fake.

On a meta level, there is the question of whether it still makes sense to stick to reactive methods, i.e. to the strategy of debunking false narratives after they have already gained traction. Some experts also wonder if the verification game can still be won, considering the (literal) armies of bad actors and the next-level AI technologies they enlist.

In any case, fact-checkers and verification experts need to step up their game.

The promise of a new approach

Enter GAI and LLMs. These technologies promise a new approach to verification and fact-checking. Models like OpenAI's (proprietary) GPT-4 or Meta's (openly available) Llama 3 are trained on massive datasets of text and code. Despite a couple of flaws (which will be discussed later), they perform rather well when it comes to extracting knowledge, highlighting relevant info, understanding context, compiling and matching entities, answering complex questions, translating, shortening, and generating text. With regard to specific verification tasks, these powerful tools can do the following (a short code sketch after the list illustrates the first two points):

  • extract claims from articles, social media posts, or transcribed interviews

  • flag claims that seem suspicious based on language patterns or factual inconsistencies

  • check suspicious claims against credible sources

  • analyze the credibility of a source by examining historical accuracy, domain expertise, and potential biases

  • scan and interpret vast amounts of information in a minimum of time (thus also allowing real-time checks)

  • generate clear and concise summaries of fact-checks (thus aiding in the dissemination of truth)
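To make the first two capabilities more concrete, here is a minimal sketch of LLM-based claim extraction and flagging. It assumes access to an OpenAI-compatible chat completions endpoint via the official Python client; the model name, prompt wording, and output schema are illustrative, not a recommendation:

```python
# Minimal claim extraction sketch: send an article to an LLM and ask for
# check-worthy claims as structured JSON. Assumes the `openai` Python package
# and an OPENAI_API_KEY environment variable; model and prompt are illustrative.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Extract every check-worthy factual claim from the text below. "
    "Return a JSON object with a 'claims' list; each item needs the fields "
    "'claim' (the statement, verbatim or lightly normalized) and "
    "'why_suspicious' (short note, or null if nothing stands out).\n\nTEXT:\n{text}"
)

def extract_claims(article_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(text=article_text)}],
        response_format={"type": "json_object"},  # ask for parseable output
        temperature=0,  # keep the extraction as deterministic as possible
    )
    return json.loads(response.choices[0].message.content)["claims"]

if __name__ == "__main__":
    sample = "The city spent 40 million euros on the new bridge, twice the EU average."
    for claim in extract_claims(sample):
        print(claim["claim"], "->", claim["why_suspicious"])
```

The point of such a step is triage: whatever the model returns is a starting point for human fact-checkers, not a verdict.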

Case studies and examples

One of the earliest fact-checking experiments involving a cutting-edge LLM was carried out by PolitiFact in early 2023. The organization explored whether ChatGPT was able to do the job of a human fact-checker. The engine performed poorly. It got some answers right, but also made a lot of mistakes – due to a lack of recent information, inconsistent responses depending on how questions were phrased, and a tendency to tell users what it thought they wanted to hear. The system also had difficulty understanding context and nuance, made factual errors, and invented information (AI hallucination).

In an interview on the Inria Blog published in early 2024, representatives of the French research institute and the broadcaster France Info had more positive news to share regarding AI and fact-checking. They mostly discussed the functionality and solid performance of two tools developed by Inria and partners: StatCheck and ConnectionLens. StatCheck is an automated fact-checking program that verifies information by comparing it with large amounts of data (e.g. official statistics databases); it can also process natural language (e.g. to analyze social media posts). ConnectionLens is a tool that can cross-reference data sources in all sorts of formats and from all sorts of origins; it is useful, for example, when it comes to identifying potential conflicts of interest.
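The following toy illustration is not how StatCheck is actually implemented – it simply shows the general grounding pattern the Inria team describes: extract the figure from a numeric claim and compare it with the value on record in an official statistics table. Indicator names and numbers are invented placeholders:

```python
# Toy illustration of grounding a numeric claim in an official statistics table.
# NOT StatCheck's implementation - just the general pattern: pull the figure out
# of a claim and compare it with the value on record.
import re
import pandas as pd

# Stand-in for an official statistics database (values are made up).
OFFICIAL_STATS = pd.DataFrame(
    {"indicator": ["unemployment_rate_fr_2023"], "value": [7.3], "unit": ["percent"]}
)

CLAIM = "Unemployment in France hit 12 percent in 2023."

def check_numeric_claim(claim: str, indicator: str, tolerance: float = 0.5) -> str:
    claimed = float(re.search(r"(\d+(?:\.\d+)?)\s*percent", claim).group(1))
    on_record = OFFICIAL_STATS.loc[
        OFFICIAL_STATS["indicator"] == indicator, "value"
    ].iloc[0]
    verdict = "consistent with" if abs(claimed - on_record) <= tolerance else "contradicted by"
    return f"Claimed {claimed}% is {verdict} the official figure of {on_record}%."

print(check_numeric_claim(CLAIM, "unemployment_rate_fr_2023"))
```

The hard parts in a real system – mapping free-form claims to the right indicator, handling units and time ranges – are exactly where the natural language capabilities mentioned in the interview come in.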

On April 4th, 2024 (International Fact-Checking Day), Indian media organization Factly introduced their work on the Poynter portal. Factly, which has to address the challenges of misinformation in a country with many languages and a huge internet user base, currently uses two main AI-driven products: The first, Dataful, seems quite similar to what Inria has been building: It is basically a portal that provides access to public government datasets, which makes it easier for fact-checkers to find the information they need. The second product, Tagore, shows similarities to ChatGPT, but with much more focus and customization. Factly describes the tool as a generative AI platform that uses chatbots. These chatbots are built on custom databases that the organization has created over time. Different chatbots focus on different areas, such as SACH (which searches fact-checks) and Parlens (which searches data from the Indian Parliament).

An interesting special-purpose tool harnessing the power of LLMs was first introduced by DW Innovation in late 2023: SPOT, a natural language interface for geospatial searches in OpenStreetMap (OSM), helps investigators find and examine a news scene – and it does so much faster than a "traditional" tool like Overpass Turbo (see the sketch below for the kind of query such an interface saves users from writing by hand). The software is still in closed beta, but already works pretty well (according to test users). Another promising geolocation tool driven by cutting-edge technology is GeoSpy, which uses GAI to try and "guess" where an image was taken. Experts (e.g. at DW Innovation) say that it is quite good, but suffers from the typical GAI inconsistencies.
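SPOT's internals are not detailed here; the sketch below simply shows what a natural language layer spares investigators from writing by hand. It sends a hand-written Overpass QL query ("a petrol station within 100 meters of a railway line in Berlin") to the public Overpass API – in an LLM-assisted tool, generating such a query from a plain-language scene description is the tedious part that gets automated:

```python
# Hedged sketch: run an Overpass QL query against the public Overpass API.
# In an LLM-assisted tool, the query string would be generated from a
# plain-language scene description; here it is written by hand.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# "A petrol station within 100 m of a railway line in Berlin"
QUERY = """
[out:json][timeout:60];
area["name"="Berlin"]["boundary"="administrative"]->.searchArea;
way["railway"="rail"](area.searchArea)->.rails;
node["amenity"="fuel"](area.searchArea)(around.rails:100);
out center;
"""

response = requests.post(OVERPASS_URL, data={"data": QUERY}, timeout=120)
response.raise_for_status()
for element in response.json()["elements"]:
    print(element["id"], element.get("lat"), element.get("lon"))
```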

The lessons learned in the case studies and examples introduced here can be summarized as follows:

  • Generative AI and LLMs can make some parts of fact-checking and verification faster and more efficient

  • Generative AI and LLMs currently lack the ability to accurately determine what is true and what is false – and run the risk of producing hallucinations

  • Human fact-checkers remain essential for critical thinking, understanding the entire workflow, and ensuring accuracy

Integrating GAI and LLMs: benefits and challenges

On the whole, GAI and LLMs offer promising solutions to expedite and enhance verification processes – but their integration comes with its own set of challenges.

The biggest benefit of the technology is its ability to analyze massive amounts of text/data and thus identify factual inconsistencies and/or disinformation at scale. Especially when fitted with a smooth, user-friendly interface (like a chatbot accepting complex prompts in everyday language), these tools can significantly reduce the manual workload for human fact-checkers – as the case studies above already document.

Regarding ethical concerns, there is the problem that LLMs trained on biased datasets can and will perpetuate those biases in their outputs. This necessitates careful selection and curation of training data. It is especially important to make LLMs fit for smaller or non-Western markets, where Euro- and US-centric tools often fail to capture linguistic and cultural nuances, rendering them impractical for fact-checkers.

Another big issue is that the inner workings of LLMs can be opaque, making it difficult to understand how they arrive at their conclusions. This lack of transparency can hinder trust and accountability in the fact-checking process. Possible remedies include thorough research, the development of trustworthy AI concepts, and excellent documentation.

And then, of course, there is still the problem of hallucination, already addressed in the "case studies" section: It is significant because it means that all machine-based fact-checks eventually need a fact-check of their own. And it will not go away soon, because a system built on statistics and probability will, sooner or later, present a false piece of information as plausible.

Last but not least, there is the issue of computational power: Training and running GAI tools/LLMs requires a lot of resources. This is likely to have a negative impact on the environment (as the data centers involved consume large amounts of energy and produce considerable greenhouse gas emissions), and it can be a barrier for smaller fact-checking organizations (which may not be able to afford the expensive infrastructure).

Key trends in verification with GAI – and alternative approaches

Unsurprisingly, the field of generative AI for verification and fact-checking is evolving fast. Key trends include:

  • fact-checking with retrieval-augmented generation (RAG): Researchers and practitioners have started working on concepts that combine more traditional database queries (which produce reliable answers) with LLM architectures (which provide natural language communication and output that is easy to understand) – a minimal sketch of this pattern follows after this list

  • fact-checking with explainable AI: Researchers have started looking into methods and features that allow LLMs to explain their reasoning and present some sort of evidence trail. This 2023 paper provides a survey of what is called "rationalization for explainable NLP"

  • multilingual disinformation detection: As false narratives transcend borders, projects like the Multilingual Fact-checker Chatbot (funded by EMIF) try to offer AI-driven fact-checking assistance to a global target group.

  • advanced deepfake and synthetic media detection: Several research groups are working on tools that are able to detect highly sophisticated fake media content. A prominent example is FakeCatcher, built by Intel in cooperation with SUNY Binghamton. Among other things, the software analyzes subtle pixel-level color changes that indicate a subject's blood flow to determine whether footage is real or synthetic.
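As promised above, here is a minimal sketch of the RAG pattern from the first bullet: retrieve passages from a trusted collection first, then ask an LLM to assess a claim strictly against that evidence. The retriever (plain TF-IDF), the model name, and the prompt are illustrative stand-ins for what a production system would use (vector search, curated fact-check archives, stricter output schemas):

```python
# Hedged RAG sketch: retrieve trusted passages first, then let an LLM answer
# *only* on the basis of what was retrieved. Retriever and prompt are
# deliberately simple; the example passages are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI

TRUSTED_PASSAGES = [
    "Example fact-check A: the reported figure was 2.6 percent, not 8 percent.",
    "Example fact-check B: the quoted statement was taken out of context.",
    # ... a real system would index full fact-checks, statistics, reports etc.
]

def retrieve(claim: str, k: int = 2) -> list[str]:
    """Return the k trusted passages most similar to the claim (TF-IDF / cosine)."""
    vectorizer = TfidfVectorizer().fit(TRUSTED_PASSAGES + [claim])
    doc_vecs = vectorizer.transform(TRUSTED_PASSAGES)
    claim_vec = vectorizer.transform([claim])
    scores = cosine_similarity(claim_vec, doc_vecs)[0]
    return [TRUSTED_PASSAGES[i] for i in scores.argsort()[::-1][:k]]

def check_claim(claim: str) -> str:
    evidence = "\n".join(f"- {p}" for p in retrieve(claim))
    prompt = (
        "Assess the claim strictly against the evidence below. "
        "Answer 'supported', 'contradicted' or 'not enough evidence', "
        f"and quote the passage you relied on.\n\nCLAIM: {claim}\n\nEVIDENCE:\n{evidence}"
    )
    client = OpenAI()  # assumes an OPENAI_API_KEY environment variable
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(check_claim("The reported figure was 8 percent."))
```

The design choice that matters here is the constraint: the model is asked to judge only against retrieved evidence, which makes its answers easier to audit than a free-form chatbot reply.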

It is also worth noting that major players like OpenAI have started working on detection tools for their own GAI (e.g. there is the DALL-E detection classifier, which unfortunately is a very mixed bag). At the same time, IT companies like The Hive, which train their own models and offer all sorts of AI-driven services to their customers, have included "AI-Generated Image Detection" as a standard service in their (commercial) toolkit.

As for alternative approaches that focus on the verifiable origin of content, it is important to mention the Coalition for Content Provenance and Authenticity (C2PA) – which basically combines the work of the Content Authenticity Initiative and Project Origin. In this alliance, big tech and media players like Adobe, Microsoft, and the BBC are trying to create standardized ways of embedding metadata and labeling trusted sources. The general idea is to let users check where an asset comes from, when and by whom it was modified – and whether there are any irregularities. At the same time, companies like Liccium work on the ISCC standard, which can help verify the ownership of photos and videos even if they no longer contain any metadata.
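For completeness, here is a hedged sketch of how a newsroom script might check a file for C2PA provenance data using the open-source c2patool CLI. It assumes c2patool is installed and that, as in recent versions, invoking it with just a file path prints the manifest store as JSON; flags and output details may differ between versions, so treat this as an outline rather than a drop-in solution:

```python
# Hedged sketch: check whether a media file carries C2PA provenance data by
# calling the open-source c2patool CLI (https://github.com/contentauth/c2patool).
# Assumes c2patool is installed and on PATH; output format may vary by version.
import json
import subprocess
import sys

def read_provenance(path: str) -> dict | None:
    """Return the C2PA manifest store as a dict, or None if the file has none."""
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # no manifest found (or the tool could not read the file)
    return json.loads(result.stdout)

if __name__ == "__main__":
    manifests = read_provenance(sys.argv[1])
    if manifests is None:
        print("No C2PA provenance data found - fall back to other verification steps.")
    else:
        print(json.dumps(manifests, indent=2))
```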

Many journalists and educators also push for more AI literacy – and better media and information literacy (MIL) in general. They explain how to spot deepfakes (e.g. in these posts by the MIT Media Lab, DW, or AP), curate entire learning guides (e.g. Tackling Disinformation by DW Akademie), or host workshops for organizations dedicated to critical thinking (e.g. Lie Detectors).

The (potential) impact of GAI in verification

Coming back to the approach of verifying/falsifying via technical content analysis (which is, after all, the focus of this post) – we can state that the integration of GAI seems promising. Its impact may include:

  • a reduced fact-checking workload: automating laborious tasks like claim identification and basic verification can free up human expertise for in-depth analysis.

  • an increased fact-checking capacity: AI solutions could empower smaller organizations with limited resources and help fill blank spots on the global verification map

  • an enhanced public trust in information: By providing faster and more transparent fact-checking, generative AI can contribute to a healthier information ecosystem.

A call for multi-stakeholderism

To harness the full potential of generative AI and LLMs in verification and fact-checking, collaboration between various stakeholders is crucial:

  • joint initiatives of tech companies and research institutions (potentially funded by programs like Horizon Europe or EMIF) need to explore advanced AI architectures and interfaces

  • fact-checking organizations and media outlets (organized via the IFCN and similar groups) need to ensure AI-powered tools align with real-world needs and journalistic standards

  • policymakers and regulators need to establish guidelines and laws for responsible AI development and deployment

Conclusion

In summary, it is safe to say that GAI and LLMs will soon become essential in complex, potentially global verification workflows. They can provide insights and much needed assistance where more "traditional" solutions fail. As a matter of fact, investigators are likely to be rushed off their feet without the new tools.

However, the technology still has inherent flaws, and much more research is needed to ensure a responsible and effective use. There are a lot of ethical, technical, and social challenges to be considered. Users must stay alert, cautious, and skeptical. Developers must strive for transparency, explainability, and good design. In any case, GAI and LLMs can only be a success in the domain of verification if academia, industry partners, and media experts collaborate.

Developed and deployed in the right way, the technology can play a significant role in the fight against disinformation – and for a more informed public.

Author
Alexander Plaum