It's funded by the Google Digital News Initiative (DNI) and aims to create an innovative prototype platform that handles a fully automated workflow of transcription, translation, and voiceover in a large number of languages.
The ultimate goal is to upload any news video in any language and make it available in virtually any other language (as text and speech) within a couple of minutes. LETA has taken on the role of technical developer, while we coordinate, gather requirements, and serve as beta testers.
How speech.media works
The speech.media platform receives Deutsche Welle content in all 30 languages through the DW API, then processes selected content consecutively through the speech-to-text API, the machine translation API, and the voiceover API.
A user interface shows a clear overview of the progress of the different steps:
- content download
- subtitling in the source language
- subtitling in the target language
- voiceover production
The user can select source and target languages and available content items, and display the automated output as either subtitles or voiceover. An editorial component allows media professionals to correct and enhance the automated output between each processing step, thus improving subsequent output.
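The workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual speech.media implementation: the stage names, the `Job` class, `run_pipeline`, and the placeholder stage functions are all hypothetical, and the real platform calls third-party APIs where the sketch returns dummy strings. The point it shows is that an editor's correction after one step becomes the input of the next step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stage names mirroring the progress overview in the text.
STAGES = ["download", "source_subtitles", "target_subtitles", "voiceover"]


@dataclass
class Job:
    item_id: str
    source_lang: str
    target_lang: str
    outputs: Dict[str, str] = field(default_factory=dict)
    done: List[str] = field(default_factory=list)


Stage = Callable[[Job], str]
# An editor receives (stage name, automated output) and returns corrected output.
Editor = Callable[[str, str], str]


def run_pipeline(job: Job, stages: Dict[str, Stage],
                 editor: Optional[Editor] = None) -> Job:
    """Run the stages in order; let an editor correct each output first."""
    for name in STAGES:
        output = stages[name](job)
        if editor is not None:
            output = editor(name, output)
        # Later stages read the (possibly corrected) output from job.outputs.
        job.outputs[name] = output
        job.done.append(name)
    return job


# Placeholder stages standing in for the real download and API calls.
DEMO_STAGES: Dict[str, Stage] = {
    "download": lambda job: f"video:{job.item_id}",
    "source_subtitles": lambda job: "hello world",
    "target_subtitles": lambda job: f"[{job.target_lang}] "
                                    + job.outputs["source_subtitles"],
    "voiceover": lambda job: f"audio({job.outputs['target_subtitles']})",
}


def demo_editor(stage: str, text: str) -> str:
    """Hypothetical manual fix: restore punctuation in the transcript."""
    if stage == "source_subtitles":
        return "Hello, world."
    return text


if __name__ == "__main__":
    job = run_pipeline(Job("item-1", "en", "de"), DEMO_STAGES, demo_editor)
    for stage in job.done:
        print(stage, "->", job.outputs[stage])
```

Because each stage reads from `job.outputs`, the corrected transcript, not the raw one, is what the translation placeholder receives.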
Why we rely on third-party technologies
- We can keep requirements and costs for in-house natural language processing (NLP) development to a minimum.
- Google, IBM and other third parties provide high-end tools that are constantly maintained and tweaked.
For example, Google Translate has shown tremendous improvement over the past year: its sophisticated neural machine translation system is expanding to more and more languages. This has an immediate impact on both the quality and the application range of speech.media.
How speech.media will help media organizations and end users
speech.media has a lot of potential. Right now, these seem to be its biggest advantages:
- It will help our colleagues make video content more accessible and searchable, independent of its source language. In a highly multilingual organization such as Deutsche Welle, this is a major benefit: speech.media will make lower-resourced languages more approachable and add to the value of produced content through reuse in other languages. Editors can quickly browse through information or "order" a full translation.
- The tool will allow end users to view video-on-demand in their language of choice, be it their mother tongue or a common language such as English. This significantly increases the amount of content available to end users. Naturally, speech.media will display a warning that it uses automated translation and that inaccuracies are therefore to be expected.
speech.media project status quo
The project is in its fourth month of a six-month development period. Results are already very promising, and the consortium is progressively enhancing the speech.media platform. So far, we have integrated all services, albeit for a very limited number of major languages and building on top of Google and IBM services only.
We've discussed the concept and shown the initial version on a number of occasions, for example at the BBC speech technology hackathon in February 2017 and at the Media and Entertainment Services Alliance (MESA) Content Localisation Technology Showcase meeting in March 2017. A lot of people showed interest.
Almost everybody at DW Innovation has tested a live demo of the tool by now. LETA is processing the feedback as we write this. An in-depth evaluation is coming up later this month. We're also preparing further demonstrations at Deutsche Welle and the upcoming Global Media Forum (Bonn, June 19th-21st).
Challenges and future plans
speech.media is far from perfect yet. The major obstacles are, as expected, errors in the automated output. In particular, the speech-to-text stage lacks proper sentence splitting and capitalization, which results in rather poor text without proper sentence structure. This currently provides a weak basis for the next stage and affects both the machine translation output and subtitle quality. Machine translation looks at the context, and proper sentence structure is essential for providing such context.
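To illustrate the sentence-structure problem, here is a deliberately naive sketch of how sentence boundaries and capitalization could be restored before translation. It is not what speech.media does. It assumes, hypothetically, that the speech-to-text step delivers word-level timestamps as `(token, start, end)` tuples, and it treats any pause of at least `pause_threshold` seconds as a sentence break. Both the input shape and the threshold value are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical input shape: (token, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]


def restore_sentences(words: List[Word],
                      pause_threshold: float = 0.6) -> List[str]:
    """Split a word stream into sentences at long pauses and capitalize them."""
    sentences: List[str] = []
    current: List[str] = []
    for i, (token, _start, end) in enumerate(words):
        current.append(token)
        is_last = i == len(words) - 1
        # Close the sentence at the end of the stream or when the silence
        # before the next word is at least as long as the threshold.
        if is_last or words[i + 1][1] - end >= pause_threshold:
            text = " ".join(current)
            sentences.append(text[0].upper() + text[1:] + ".")
            current = []
    return sentences


if __name__ == "__main__":
    stream = [
        ("the", 0.0, 0.2), ("project", 0.25, 0.7), ("is", 0.75, 0.9),
        ("funded", 0.95, 1.4), ("it", 2.2, 2.3), ("runs", 2.35, 2.7),
    ]
    print(restore_sentences(stream))
```

A pause-based rule like this breaks down with fast speakers and mid-sentence hesitations, which is one reason the problem is hard to solve with simple heuristics.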
Synchronization with the video playout, subtitle size, and subtitle positioning need optimization, too. Normalization of numbers in subtitles should also be improved.
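As an example of what number normalization means in practice, the following sketch turns spelled-out English numbers from 0 to 99 into digits, which keeps subtitle lines shorter. It is an illustration only: the function name `normalize_numbers` is hypothetical, the input is assumed to be lowercase, unpunctuated speech-to-text output, and the rule is far simpler than what a production system would need.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def normalize_numbers(text: str) -> str:
    """Replace spelled-out numbers (0-99) in whitespace-separated text."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word in TENS:
            value = TENS[word]
            # Merge a following unit from 1 to 9, e.g. "twenty five" -> 25.
            if i + 1 < len(tokens):
                unit = UNITS.get(tokens[i + 1].lower(), 0)
                if 1 <= unit <= 9:
                    value += unit
                    i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(normalize_numbers("the forum runs for three days with twenty five speakers"))
```

The sketch also shows why the task needs context: it would wrongly rewrite "no one" as "no 1", so a real normalizer has to decide whether a word is actually used as a number.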
In order to tackle these challenges and come up with a truly exploitable tool that will also offer additional features, we have submitted a follow-up DNI project application called NEWS-BRIDGE.
speech.media has made it clear that platforms for creating video transcriptions, translations, and voiceovers are in demand – and that it’s feasible to develop them.
P.S.: Stay tuned for some speech.media live action – we’re currently working on a screencast.