It probably makes sense to start with the most tangible result of the project, something that doesn’t require special skills and is very suitable for daily use: GoURMET Translate. The web-based application now offers 16 (!) “exotic” languages that can be translated from and into English: Amharic, Bulgarian, Burmese, Gujarati, Hausa, Igbo, Kyrgyz, Macedonian, Pashto, Serbian, Swahili, Tamil, Tigrinya, Turkish, Urdu and Yoruba. Just give it a try; the tool is hosted by the BBC and free of charge.
If you’re more on the technical side of things, make sure to check out GoURMET on GitHub, hosted by the University of Edinburgh. The open-source repository offers detailed information on the collected language models, sample code, and links to a site where everything can be downloaded as Docker images.
There is also plenty of academic reading material, such as a paper jointly published by the University of Edinburgh, the University of Alicante, the University of Amsterdam, the BBC, and DW.
A bit of context: Pashto is an important but often neglected Indo-European language. Along with Dari, it’s one of the two official languages of Afghanistan. Almost half of the country’s population as well as a number of communities in India and Tajikistan speak Pashto, bringing estimates of native speakers to between 45 and 50 million people. GoURMET focused on Pashto in a “surprise challenge”. The idea was to create a language model for Pashto and English, basically from scratch, and come up with a robust tool that can be used for international journalism. The challenge was met with a lot of interest and applause, e.g. by Waslat Hasrat-Nazimi, head of DW’s Afghanistan service.
Impact and outlook
Neural machine translation (NMT) has become an important and much sought-after technology at virtually all international media houses. Having sophisticated translation tools at their fingertips means journalists and media producers can understand sources from almost any region in the world. It also means they can monitor and adapt content and make it more accessible for all kinds of audiences.
In the case of GoURMET, new language models were integrated into plain X, a new software platform for transcription, translation, subtitling and voice-over generation used at DW. The output of GoURMET is also used in SELMA OSS, a research tool that brings together open-source language models and technologies for language processing tasks. The platform was created within the scope of the SELMA project, another H2020 HLT effort DW is involved in.
Look out for more language technology news coming to you on this channel!