Verification, AI & Automation

Geolocation verification with AI tools: what's our solution?

Finding out where an image or a video was taken is a tedious job. We look at the importance and the challenges of this work, and explain why and how we’ve built our own tool to make geolocation verification easier.

Why geolocation verification is so important

Very often, people claim that images or videos were taken in a specific spot, and it quickly turns out they weren't. In uncovering a piece of disinformation, knowing the exact location can get you quite far: You can check the actual spot of events, verify the time of a recording, or double-check against other pieces of information on the same event. So quite literally, “knowing where to start” is key to successfully debunking false information online.

What’s the problem?

In many cases, finding the exact location of an image or a video involves a lot of manual work. You need to develop a gut feeling for the general location based on architecture, flora and fauna, or even just the vibe of a place. You need to detect specific elements in the image, like landmarks (e.g. a particular bridge or statue) or typical patterns (e.g. a coastline), that can help you narrow down the area. Experts then often spend lots of time on satellite or mapping services to match the patterns they found to a precise set of geocoordinates. There are some tools available to help translate those patterns, but they are either not precise enough or too difficult to use for a broad audience.

The road travelled

In 2022, we started a project on building new OSINT tools, funded by the BKM. One thing that really inspired us, and that we wanted to improve on, was Overpass Turbo (OT). OT is a very helpful, yet not so easy to use, programmable interface for OpenStreetMap (OSM), the collaborative open mapping project. It lets you search for locations in OSM by referencing several objects you’ve spotted in your image or video and putting them in relation to each other. If you know the Overpass Turbo Query Language, you can easily search for something like “a garbage bin 15 m away from a bench in a park, with a pharmacy within 100 meters, in Strasbourg”. OT will highlight all possible hits matching that query (within the limits of the available data) on a map.
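
To give an idea of what such a search looks like in practice, here is a minimal sketch that expresses most of the example above in Overpass Query Language (the park condition is omitted for brevity) and sends it to a public Overpass API endpoint. The query itself is our illustration for this article, not SPOT internals; the endpoint, the OSM tags and the requests library are standard, publicly documented pieces.

```python
import requests

# "A waste basket within 15 m of a bench, with a pharmacy within 100 m,
# in Strasbourg" in Overpass Query Language:
query = """
[out:json][timeout:60];
// Limit the search to Strasbourg
area["name"="Strasbourg"]->.searchArea;
// All benches in that area
node["amenity"="bench"](area.searchArea)->.benches;
// Waste baskets within 15 m of any of those benches
node["amenity"="waste_basket"](around.benches:15)->.bins;
// Keep only the bins that also have a pharmacy within 100 m
node["amenity"="pharmacy"](around.bins:100)->.pharmacies;
node.bins(around.pharmacies:100);
out body;
"""

# Send the query to a public Overpass API endpoint and print the hits.
response = requests.post(
    "https://overpass-api.de/api/interpreter", data={"data": query}
)
for element in response.json()["elements"]:
    print(element["lat"], element["lon"])
```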

While this sounds easy and straightforward, the difficult part is writing the proper code, as the sketch above illustrates. You must know how to write technical queries and be familiar with the OSM tagging system (the “amenity” key-value pairs in the example) in order to get usable results. As far as we know, this also applies to all other interfaces built to improve on OT (e.g. whatiswhere or the OSM-Search by Bellingcat).

“Why can’t we just write a question into a system and get relevant results?” we asked ourselves. “Can we make geolocation as easy as writing a simple question?” That’s how we started developing SPOT.

Can generative AI help?

When our project kicked off, the “generative AI revolution” hadn’t happened yet. We had to find a way to get from a plain sentence to OT Query Language, and we decided to try an older, open-source large language model called T5. While we were preparing the system for training, ChatGPT and its competitors were introduced to the public – and changed the whole game.

People tried using the power of LLMs to translate directly from natural language to OT Query Language. However, success was limited, as ChatGPT made too many mistakes. This made it clear to us that we would have to fine-tune whichever LLM we implemented for our specific use case.

Screenshot of a post on Twitter (now X) by Aric Toler, formerly of Bellingcat, about the lack of success in using ChatGPT to write OT queries
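
To illustrate what fine-tuning for this translation task means in practice, here is a minimal sketch using the Hugging Face transformers library. This is not our actual training code: the t5-small checkpoint and the single question/query pair below are placeholders we picked for the example.

```python
# Minimal sketch: one gradient step of fine-tuning a seq2seq model
# (here T5) to translate a natural-language question into a structured
# query. Checkpoint and training pair are illustrative placeholders.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

checkpoint = "t5-small"
tokenizer = T5TokenizerFast.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

question = "a pharmacy within 100 meters of a bench in Strasbourg"
target = 'node["amenity"="pharmacy"](around.benches:100);'

inputs = tokenizer(question, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# The model is trained to minimize the loss between its output and the
# target query; repeated over many such pairs, it learns the mapping.
loss = model(**inputs, labels=labels).loss
loss.backward()
```

A seq2seq model like T5 is a natural fit here because translating a sentence into a query is itself a sequence-to-sequence problem.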

With the introduction of image recognition capabilities in LLMs, people switched to uploading images to ChatGPT and simply requesting a location. While the results were quite astonishing at first, it soon became clear that the systems were prone to mistakes. The results were mostly indicative – a good estimate of an area, but not a precise location.

Screenshot of a post on Twitter (now X) by Roland Vergeer about the visual capabilities of early ChatGPT 4 Vision for OSINT

These developments showed us that our work was still highly relevant: the LLM-based geolocation options were extremely interesting, but far from accurate enough. However, the new models helped us improve our system setup and build a stronger foundation for what we had in mind for SPOT.

What is SPOT?

SPOT is our answer to making the geolocation challenge easier: An AI-assisted geolocation service that works with natural language prompts. It requires no coding skills and builds on the precision of OT.

OT showed us that OSM tags, the basic information units on the mapping service, work well, and that we can search for geospatial information patterns in OSM. So we've built an AI pipeline that translates natural language queries via an Intermediate Representation (IMR) into patterns on a map.
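
To make the pipeline idea more concrete, here is a hypothetical, much-simplified sketch. The IMR fields and the compile step below are our illustration for this article, not SPOT's actual format.

```python
# Hypothetical, simplified IMR for the prompt "a waste basket within
# 15 m of a bench in Strasbourg" – not SPOT's actual specification.
from dataclasses import dataclass

@dataclass
class Entity:
    osm_tag: str        # OSM tag the entity maps to, e.g. "amenity=bench"

@dataclass
class Relation:
    source: int         # index of the first entity
    target: int         # index of the second entity
    max_distance_m: int # maximum distance between the two

imr = {
    "area": "Strasbourg",
    "entities": [Entity("amenity=waste_basket"), Entity("amenity=bench")],
    "relations": [Relation(source=0, target=1, max_distance_m=15)],
}

def to_overpass(imr: dict) -> str:
    """Deterministically compile the IMR into an Overpass query."""
    lines = [f'area["name"="{imr["area"]}"]->.a;']
    for i, e in enumerate(imr["entities"]):
        key, value = e.osm_tag.split("=")
        lines.append(f'node["{key}"="{value}"](area.a)->.e{i};')
    for r in imr["relations"]:
        lines.append(
            f"node.e{r.source}(around.e{r.target}:{r.max_distance_m});"
        )
    lines.append("out body;")
    return "\n".join(lines)

print(to_overpass(imr))
```

The idea is that only the first step – turning natural language into the IMR – involves the LLM; compiling the IMR into a map query is deterministic, which is how the pipeline can build on the precision of OT.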

The SPOT Interface displays those patterns and allows users to check the location on Google Street View or similar services right away. Saving or sharing the results is also possible. For the prompt “a parking space within 200 meters from a communication tower and a power tower in Legnago, Italy” SPOT returns one single result – and it's accurate. Every time we prompt it.

Screenshot of the SPOT user interface, showing the results of a natural language search with the matching entities highlighted on the map

We are very excited that we've come this far, but of course SPOT isn't perfect yet. The application is limited by the accuracy of the LLM (currently Llama 3 8B) when it comes to understanding user prompts and translating them into OSM tags. Sometimes we have to rephrase our prompts a couple of times to get relevant results. We are constantly working on making SPOT better – retraining the model, benchmarking, analyzing mistakes, adding OSM tags, and so on. This is not a trivial task, and it takes up most of our time.

The application is also limited to objects that are listed and correctly tagged in OpenStreetMap: We can't find a park that is mistakenly tagged as a public square. The good news is that the OSM community is constantly working on improving the data, which will be reflected in the accuracy of SPOT's results.

Want to try SPOT yourself?

We’re sorry, but you'll have to wait a little. SPOT is currently being tested by a limited group of OSINT experts and will be released as a Beta version on findthatspot.io in November 2024. If you want to become part of our Beta testing community, please drop us a line with your motivation at hey@findthatspot.io.

Authors
Ruben Bouwmeester
Tilman Wagner