Create Video Transcriptions, Translations, and Voice-Overs With Just One Tool: speech.media

June 06, 2017, 8-minute read

As part of a multilingual media company, we have made Human Language Technologies (HLT) a focus of our projects for more than a decade. Truth be told, the results haven’t come anywhere near operational requirements. Up until now.

Following a mash-up concept (i.e., combining existing tools rather than designing new software from scratch), we have teamed up with the Latvian news agency LETA for a very promising project called speech.media.

It’s funded by the Google Digital News Initiative (DNI) and aims to create an innovative prototype platform that can handle a full workflow of automated transcription, translation, and voiceover in a large number of languages.

The ultimate goal is to upload any news video in any language and make it available in virtually any other language (text and speech) within a couple of minutes. LETA has taken on the role of technical developer, while we coordinate, gather requirements, and serve as beta testers.

How speech.media works

The speech.media platform receives Deutsche Welle content in all 30 languages through the DW API, then processes selected content consecutively through the speech-to-text API, the machine translation API, and the voiceover API.

A user interface shows a clear overview of the progress of the different steps:

content download
transcription
subtitling in the source language
translation
subtitling in the target language
voiceover production
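
To make the processing chain more concrete, here is a minimal Python sketch of the stages above as an ordered sequence with a checkbox-style progress overview. It is an illustration under our own assumptions, not speech.media's implementation: `Stage`, `Job`, `run_pipeline`, and `placeholder_steps` are hypothetical names, and the placeholder steps return canned strings instead of calling the DW API, IBM Watson, Google Translate, or a voiceover service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """Processing steps, in the order shown in the progress overview."""
    DOWNLOAD = "content download"
    TRANSCRIPTION = "transcription"
    SOURCE_SUBTITLES = "subtitling in the source language"
    TRANSLATION = "translation"
    TARGET_SUBTITLES = "subtitling in the target language"
    VOICEOVER = "voiceover production"


@dataclass
class Job:
    item_id: str
    source_lang: str
    target_lang: str
    outputs: Dict[Stage, str] = field(default_factory=dict)  # result per finished stage

    def progress(self) -> List[str]:
        """One checkbox line per stage: [x] done, [ ] pending."""
        return [f"[{'x' if s in self.outputs else ' '}] {s.value}" for s in Stage]


# A stage function reads earlier results from the job and returns its own output.
StageFn = Callable[[Job], str]


def run_pipeline(job: Job, steps: Dict[Stage, StageFn]) -> Job:
    """Run all stages in order; stages that already have output are skipped."""
    for stage in Stage:
        if stage not in job.outputs:
            job.outputs[stage] = steps[stage](job)
    return job


def placeholder_steps() -> Dict[Stage, StageFn]:
    """Hypothetical offline stand-ins for the real download, STT, MT, and TTS services."""
    return {
        Stage.DOWNLOAD: lambda job: f"video:{job.item_id}",
        Stage.TRANSCRIPTION: lambda job: "hello from bonn",
        Stage.SOURCE_SUBTITLES: lambda job: job.outputs[Stage.TRANSCRIPTION],
        Stage.TRANSLATION: lambda job: "hallo aus bonn",
        Stage.TARGET_SUBTITLES: lambda job: job.outputs[Stage.TRANSLATION],
        Stage.VOICEOVER: lambda job: f"audio({job.outputs[Stage.TRANSLATION]})",
    }


if __name__ == "__main__":
    job = Job("item-1", "en", "de")
    run_pipeline(job, placeholder_steps())
    print("\n".join(job.progress()))
```

Passing the whole job to each stage (rather than only the previous stage's output) lets translation read the transcript directly, even though source-language subtitling sits between the two steps in the list.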

The user can select source and target languages as well as available content items, and display the automated output as either subtitles or voiceover. An editorial component allows media professionals to correct and enhance the automated output between processing steps, which improves the output of every subsequent step.
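
The editorial step can be sketched in the same spirit. In this hypothetical Python example (the stage names, `run_from`, `apply_correction`, and `PLACEHOLDER_STEPS` are our own, not speech.media's), an editor's correction replaces one stage's automated output, and every later stage is regenerated from the corrected text. The placeholder "translation" simply upper-cases its input so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical stage names, in processing order
STAGES: List[str] = [
    "download", "transcription", "source_subtitles",
    "translation", "target_subtitles", "voiceover",
]

# A stage function reads earlier outputs and returns its own result.
StageFn = Callable[[Dict[str, str]], str]


def run_from(outputs: Dict[str, str], steps: Dict[str, StageFn], start: str) -> None:
    """(Re)run every stage from `start` to the end, in order."""
    for stage in STAGES[STAGES.index(start):]:
        outputs[stage] = steps[stage](outputs)


def apply_correction(outputs: Dict[str, str], steps: Dict[str, StageFn],
                     stage: str, corrected: str) -> None:
    """Store an editor's correction for `stage`, then regenerate all later
    stages so they build on the corrected text instead of the raw output."""
    outputs[stage] = corrected
    position = STAGES.index(stage)
    if position + 1 < len(STAGES):
        run_from(outputs, steps, STAGES[position + 1])


# Offline placeholders; "translation" just upper-cases text so the example runs.
PLACEHOLDER_STEPS: Dict[str, StageFn] = {
    "download": lambda o: "video.mp4",
    "transcription": lambda o: "the chancellor visited bonn today",
    "source_subtitles": lambda o: o["transcription"],
    "translation": lambda o: o["transcription"].upper(),
    "target_subtitles": lambda o: o["translation"],
    "voiceover": lambda o: f"<audio: {o['translation']}>",
}

if __name__ == "__main__":
    outputs: Dict[str, str] = {}
    run_from(outputs, PLACEHOLDER_STEPS, "download")
    apply_correction(outputs, PLACEHOLDER_STEPS, "transcription",
                     "The chancellor visited Bonn today.")
    print(outputs["translation"])
```

Re-running everything downstream is the simplest way to make corrections carry over; a production system would probably also need to protect manual edits in later stages from being overwritten.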

Why we rely on third-party technologies

speech.media uses external tools. Currently, it specifically relies on IBM Watson and Google Translate. There are two main reasons for this approach.

We can keep requirements and costs for in-house natural language processing (NLP) development to a minimum.
Google, IBM and other third parties provide high-end tools that are constantly maintained and tweaked.

For example, Google Translate has improved tremendously over the past year: Its sophisticated neural machine translation system is expanding to more and more languages. This has an immediate impact on speech.media, on both its quality and its range of applications.

How speech.media will help media organizations and end users

speech.media has a lot of potential. Right now, these seem to be its biggest advantages:

It will help our colleagues make video content more accessible and searchable, independent of its source language. In a highly multilingual organization such as Deutsche Welle, this is a major benefit: speech.media will open up content in lower-resourced languages and add value to existing content by reusing it in other languages. Editors can quickly browse through information or "order" a full translation.
The tool will allow end users to view video-on-demand in their language of choice, be it their mother tongue or a common language such as English. This significantly increases the amount of content available to them. Naturally, speech.media will display a warning that the translation is automated and may therefore contain inaccuracies.

speech.media project status quo

The project is in the fourth month of a six-month development period. Results are already very promising, and the consortium is progressively enhancing the speech.media platform. So far, we have integrated all services, albeit for a very limited number of major languages and building on Google and IBM services only.

We've discussed the concept and shown the initial version on a number of occasions, for example at the BBC speech technology hackathon in February 2017 and at the Media and Entertainment Services Alliance (MESA) Content Localisation Technology Showcase meeting in March 2017. A lot of people showed interest.

Almost everybody at DW Innovation has tested a live demo of the tool by now, and LETA is processing the feedback as we write this. An in-depth evaluation is coming up later this month. We're also preparing further demonstrations at Deutsche Welle and at the upcoming Global Media Forum (Bonn, June 19th-21st).

Challenges and future plans

speech.media is not perfect yet. As expected, the major obstacles are errors in the automated output. In particular, the speech-to-text stage lacks proper sentence splitting and capitalization, which results in rather poor text without proper sentence structure. This currently provides a weak basis for the next stage and degrades both the machine translation output and the subtitle quality: Machine translation relies on context, and proper sentence structure is essential for providing that context.
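
One possible stopgap for the missing sentence structure, sketched here as a hypothetical heuristic rather than anything speech.media does, is to use word timings: Many speech-to-text services can return a start and end time per word, and a long pause often marks a sentence boundary. The `(word, start, end)` input shape, the `punctuate_by_pauses` name, and the 0.6-second threshold below are assumptions.

```python
from typing import List, Tuple

# (word, start_seconds, end_seconds); hypothetical shape of word-level STT output
TimedWord = Tuple[str, float, float]


def punctuate_by_pauses(words: List[TimedWord], min_pause: float = 0.6) -> List[str]:
    """Split an unpunctuated transcript into sentences at long pauses.

    A new sentence starts whenever the silence between two consecutive words
    is at least `min_pause` seconds. Each sentence gets an upper-case first
    letter and a closing full stop; nothing else is changed.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    previous_end = 0.0
    for word, start, end in words:
        if current and start - previous_end >= min_pause:
            sentences.append(current)
            current = []
        current.append(word)
        previous_end = end
    if current:
        sentences.append(current)

    result: List[str] = []
    for tokens in sentences:
        text = " ".join(tokens)
        result.append(text[:1].upper() + text[1:] + ".")
    return result


if __name__ == "__main__":
    transcript = [
        ("the", 0.0, 0.2), ("meeting", 0.25, 0.6), ("ended", 0.65, 1.0),
        ("leaders", 2.0, 2.4), ("left", 2.45, 2.8), ("bonn", 2.85, 3.2),
    ]
    print(punctuate_by_pauses(transcript))
```

The example prints `['The meeting ended.', 'Leaders left bonn.']`. Note that "bonn" stays lower-case: Pauses say nothing about proper nouns, which is why a heuristic like this can only be a stopgap. Proper truecasing and punctuation restoration usually need a language model.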

Post-transcription/-translation editing.

Synchronization with the video playout, as well as subtitle size and positioning, also needs optimization. The normalization of numbers in subtitles should be improved, too.
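
Subtitle synchronization and size are easier to discuss with a concrete format. The sketch below assumes the widely used SubRip (SRT) format, which numbers each cue, gives it a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and separates cues with blank lines; we don't know which format speech.media uses internally. The segment input shape, the `srt_timestamp` and `to_srt` names, the 42-character line limit, and the global `offset` used to shift cues into sync with playout are all illustrative assumptions.

```python
import textwrap
from typing import List, Tuple

# (start_seconds, end_seconds, text); hypothetical shape of timed transcript segments
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(max(0.0, seconds) * 1000))  # clamp negative times to 0
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment], max_chars: int = 42, offset: float = 0.0) -> str:
    """Turn timed segments into SRT cues.

    Text is wrapped at word boundaries to lines of at most `max_chars`
    characters (words longer than that are split); `offset` shifts every
    cue by the same number of seconds to re-align subtitles with playout.
    """
    cues: List[str] = []
    for number, (start, end, text) in enumerate(segments, start=1):
        lines = textwrap.wrap(text, width=max_chars)
        timing = f"{srt_timestamp(start + offset)} --> {srt_timestamp(end + offset)}"
        cues.append("\n".join([str(number), timing, *lines]))
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello Bonn"),
                  (2.6, 5.0, "The chancellor visited Bonn today")], max_chars=20))
```

A single global offset only fixes a constant delay; drift within a video would need per-cue re-timing.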

To tackle these challenges and develop a tool that is truly ready for everyday use and offers additional features, we have submitted a follow-up DNI project application called NEWS-BRIDGE.

speech.media has made it clear that platforms for creating video transcriptions, translations, and voiceovers are in demand, and that it’s feasible to develop them.

P.S.: Stay tuned for some speech.media live action: We’re currently working on a screencast.

Authors

Alexander Plaum

Peggy van der Kreeft