
Google brings in the machines for Wikipedia translations

Google and Wikipedia recently announced that they are aiming to translate more than 16 million words of Wikipedia content into ‘smaller languages’.

While Wikipedia already contains a wealth of information in English, German and French, languages such as Arabic, Hindi, Tamil, Swahili, Telugu and Gujarati are under-represented.

Google aims to rectify this, and to further its goal of internationalising knowledge, by translating Wikipedia’s most-read pages into these languages.

Google is using its data analytics tools to identify which English Wikipedia articles are most read by speakers of each target language (for example, the English articles most read by web users in India).

These articles are then translated using post-edited machine translation (PEMT), which involves translating the content from English into the target language using Google Translate, and then having volunteer linguists collaborate to edit and improve the machine translations.

But while Google is to be applauded for making Wikipedia’s information more accessible across languages, it’s worth noting that this strategy is far from the most efficient. PEMT can often be more time-consuming and error-prone than using human translators from the outset.

Statistical machine translation tools such as Google Translate can be effective for highly repetitive technical texts. When it comes to creative text, however, which makes up a great many Wikipedia articles, machine translation lacks a human translator’s linguistic and cultural understanding, and meaning is often misconstrued or even lost.

Language is a very personal thing, and at Lingo24 we often find that translators prefer to translate text from scratch rather than make the many small amendments that PEMT often requires.

In an ideal world, all content published on the web would be translated by native-speaking professional translators into every target language in which it is needed.

Until we reach that multilingual web utopia, however, there will be a clear hierarchy in web content: content that is machine translated, content that is post-edited through crowdsourcing, and content that is translated from scratch by professional translators.

Which category are your translations in?

Christian Arno, Founder and President, Lingo24

Christian Arno is Founder and President of Lingo24. He started the company in 2001 after graduating from Oxford University with a degree in languages. He has won numerous awards, including the HSBC Business Thinking and International Trade Awards (2010) and the TAUS Excellence Award (2012) for innovative technology. He contributes to leading industry publications and has been featured on the BBC, in the Financial Times and in other media around the world.
