
Data simplification – have your cake and eat it too

It is a really exciting time at Lingo24. 2018 proved to be a great year for us generally, but it was also the year in which we were able to invest in and develop the processes associated with large-scale product-based projects and other data-heavy applications with similar content patterns.

We are using our technology and people to create an alternative approach to the way content is analysed and adapted prior to commencing large localisation tasks.
By changing the way we think about these content sets and how we interact with them, we have been able to make major strides in reducing redundancy in our clients' translation processes.

What does this mean? Well, basically, we have had huge success in reducing the scope of translation projects involving large data sets; this means you can reduce cost, improve speed to market, and remove redundancy from the content you send for translation.


Most translation processes start with a set of content; historically, you counted the number of words and provided a cost.
We have started working in a different way: we look at what is being translated – its structure, quality and accessibility. In particular, we look for patterns in the form of highly similar sentences and/or sub-sentence elements that can be reduced to increase the associated match rates.
We can also isolate differences caused by sizes, units, SKUs, dynamic placeholders and the like.
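To make the idea concrete, here is a minimal sketch of this kind of pattern analysis. The regexes, placeholder tokens and sample sentences are illustrative assumptions, not Lingo24's actual rules: variable elements (sizes, units, SKUs) are masked so that sentences differing only in those details collapse to a single translatable pattern.

```python
import re
from difflib import SequenceMatcher

# Illustrative masking rules -- the tokens and regexes are assumptions,
# not Lingo24's actual implementation.
PATTERNS = [
    (re.compile(r"\b[A-Z]{2,}-\d+\b"), "{SKU}"),                       # e.g. "AB-1234"
    (re.compile(r"\b\d+(\.\d+)?\s?(mm|cm|m|kg|g|ml|l)\b"), "{MEASURE}"),
    (re.compile(r"\b\d+(\.\d+)?\b"), "{NUM}"),
]

def mask(sentence: str) -> str:
    """Replace variable elements with placeholder tokens."""
    for pattern, token in PATTERNS:
        sentence = pattern.sub(token, sentence)
    return sentence

def similarity(a: str, b: str) -> float:
    """Similarity of two sentences after masking (1.0 = same pattern)."""
    return SequenceMatcher(None, mask(a), mask(b)).ratio()

print(mask("Brass hinge, 35 mm, item AB-1234"))
# -> Brass hinge, {MEASURE}, item {SKU}
```

Two catalogue lines that differ only in size and SKU now share one pattern, so the second becomes a full match against the first.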

What is significant about this change is that we front-load this work, using AI tools adapted from our terminology and Machine Translation work to do the heavy lifting. This means that, by the time the content reaches translation, you have maximised the benefits of traditional tools like Translation Memory, and can leverage Neural Adaptive Machine Translation to reinforce the gains, improving speed and reducing cost further.
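A sketch of the leverage this front-loading creates (the sample data and masking rule are illustrative assumptions, not Lingo24's normalisation logic): once variable elements are masked, a catalogue's sentences collapse into far fewer unique patterns, and only those patterns need to pass through Translation Memory or MT before the masked values are restored in the target.

```python
import re
from collections import Counter

def mask(sentence: str) -> str:
    # Illustrative rule: replace measurements with a placeholder token.
    return re.sub(r"\b\d+(\.\d+)?\s?(mm|cm|kg|g)\b", "{MEASURE}", sentence)

catalogue = [
    "Steel screw, 4 mm",
    "Steel screw, 5 mm",
    "Steel screw, 6 mm",
    "Brass hinge, 35 mm",
]

# Four sentences collapse to two unique patterns; only the patterns
# go to TM/MT, and the values are re-inserted afterwards.
patterns = Counter(mask(s) for s in catalogue)
reduction = 1 - len(patterns) / len(catalogue)
print(f"{reduction:.0%} fewer sentences to translate")
```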

On some recent projects the metrics produced are impressive:
• On a replacement-parts product catalogue we were able to reduce the content by 32%, and the associated costs accordingly; as this was a high six-figure project, these gains are significant.
• More recently, an electronics product catalogue was reduced by 11% of the total word count, which equated to 37% of the additional cost per language for the content in its raw state. With an overall word count of 3m words and 11 target languages, the savings are significant.
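As a back-of-envelope check of the second example's figures (3m words, an 11% reduction, 11 target languages), the words removed before translation add up quickly:

```python
# Figures from the electronics-catalogue example above.
total_words = 3_000_000   # overall word count
reduction_pct = 11        # % of word count removed by the analysis
languages = 11            # target languages

words_removed = total_words * reduction_pct // 100   # 330,000 words per language
total_saving = words_removed * languages             # 3,630,000 words across 11 languages
print(words_removed, total_saving)
```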

It is important to stress that these savings come in addition to the usual benefits from repetitions and/or matches against legacy translations.
Indeed, in the case of these two examples we had no existing translations. By transforming the way we analyse data and being granular in maximising matches, customers really can have it all – faster and cheaper, with no quality implications, and with increased consistency from a cleaner data set.

Welcome to the new world! I am very excited about what 2019 will bring as we develop these solutions further.
We have created a page on our site that goes into significantly more detail; if you want to find out more, click here.


*Photo credits: geralt /

Jeremy Clutton, Global Director, eCommerce & Channel Partners, Lingo24

Jeremy leads the London sales operations for Lingo24. He specialises in advising e-commerce businesses on everything related to translation, localisation and global marketing.
Jeremy has extensive experience in the translation business, working with a wide range of blue-chip and SME companies on managing their language needs.
