What’s the state of the art when it comes to producing multilingual, accessible content? How can we monitor vast amounts of information and cluster it in terms of people, topics, and diversity characteristics? What’s new in the AI-driven newsroom? Those and similar questions were answered at the first SELMA User Day, hosted by DW in Bonn on October 12th.
The hybrid event (which also featured a live stream and remote attendees) brought together professionals from all sorts of European media and technology organizations, including ARTE, the BBC, Lusa, and Priberam. It offered insights into the latest advances in human language technology (HLT) and how they are applied in journalism and media production. And of course, it also showcased DW's R&D projects in the field, some of which were launched no less than two decades ago (watch the session of our pioneering linguist Peggy van der Kreeft: Artificial Intelligence (HLT) & The Newsroom).
SELMA, Monitio, plain X – and other HLT auxiliaries
In terms of current projects and challenges, there was a spotlight on the H2020 project SELMA. Its consortium is building a backend software solution that enables searches in extremely large data sets (up to 10 million news items per day!) and lets media professionals use advanced NLP services such as transcription, translation, voice-over, summarization, and speaker recognition. This comes in very handy in a world of seemingly endless data and news items. More on this in the official SELMA trailer video.
Another powerful tool presented at the User Day was Monitio. The software currently crawls and analyzes up to 200,000 international news articles every day, including DW feeds in more than 30 languages. An important aim is to provide "big picture" information for better decision-making. Details are explained in the Monitio session video.
A recent and notable feature of Monitio is the display of diversity data for public figures (e.g., politicians), based on information retrieved from Wikidata. This brings us directly to another well-received session: Counting Diversity, which explained how such data can help newsrooms make decisions and produce content for diverse audiences. User soundbite: "This gives me an instant overview of our output – and our weak spots. A lot to think about!"
A third HLT heavyweight showcased in Bonn was plain X, a SaaS solution that simplifies video adaptation workflows via speech-to-text, machine translation, subtitling, and more. It originated in a number of R&D projects undertaken by DW, Priberam, and other partners. Learn about the details in the plain X session video.
Other User Day sessions included a talk by the Laboratoire Informatique d'Avignon (LIA) on creating and customizing synthetic voices. There was also a detailed demo of the Podcast Creator. Built on automated search, summarization, and speech synthesis technology, this tool allows for the creation of an "instant" podcast (more in this video). Last but not least, a group of experts from different fields discussed the "Footprint of AI & NLP Technology In The Media?" as well as new frontiers and ethical limitations. Watch the panel video here.
A general takeaway for everyone was that HLT has indeed made tremendous progress. What required loads of resources only a couple of years ago can now be done by a small team on a budget. Journalists and media producers clearly benefit from this, but they also have to tread carefully.
Outlook and resources
A second SELMA/HLT User Day is coming up in 2023. Stay up to date by following SELMA and DW Innovation on Twitter.
To learn more about SELMA, the User Day, and the field of HLT, check out the project's official website and repositories on GitHub, as well as the event's full video playlist on YouTube.