Bilingual Glossaries

Accelerating terminology development with Lingo24’s TermFinder

Here at Lingo24, the technology team and I are constantly challenged to find new ways to help clients manage their quality expectations.

With a fundamental belief that quality translations and the tools that deliver them should be easily available to everyone, we are always looking for ways to improve our free-to-access tools to meet the needs of our users.

As such, we regularly take the time to explore different areas of the translation and localisation process to see what we can do to improve the experience for our clients and our teams.

Through our work across a range of diverse markets, we have found that there is one main area that is consistently a driver of high-quality translations: a client-specific terminology.

This led us to put our minds to the challenge of making terminology an integral part of a client’s translation journey when they partner with us, and it has produced some new and exciting features in our platform that we feel deliver this.

Let me explain…

Terminology and its impact

For me, terminology is now arguably as important as Translation Memory for mature translation buyers – a view that played out during my conversations at the TAUS Annual Conference.

Having a solid glossary or terminology or TermBank (we have many terms for this in our own industry!) for writers, translators and reviewers to tap into is invaluable in delivering an efficient translation workflow in a number of ways.

When used upstream within the content creation process, it supports the creation of higher quality documentation by producing less variation, more quickly, through the use of validated terms that have already been researched (and hopefully provide additional context).

Following on from that, during the translation and localisation process, having an existing glossary drives more classic Translation Memory matches, as well as shortening both the research and review cycles. This reduces the cost of translation and localisation whilst improving quality through greater consistency.

More importantly, glossaries help ensure your chosen voice, tone and style, as well as core domain context, are captured in a body of knowledge that lives beyond the original author(s), helping new members of the team onboard and deliver more quickly.

All great benefits! Yet in practice we find that many clients who start working with us have no terminology-related materials, and those that do have often struggled to nurture them, feeling they need a lot of work before they can be treated as a canonical source for content creation or translation work.

Why is this so?

Kicking around the topic, we asked ourselves: ‘given the benefits, why do so few clients have good-quality glossary assets?’

We concluded there were two main reasons: the high barrier to entry in creating a high-quality set of glossary assets, and the need to implement robust terminology tooling to ensure their shared use.

It is clear that setting out the process and structure for effective use of terminology requires planning, with practical consideration of the following topics:

  • How do I create my initial TermBanks?
  • How are they segmented?
  • Who is involved in reviewing / approving terms?
  • How does this work across teams and divisions?
  • How do I keep it up to date?

Furthermore, once a high-quality set of glossary assets is available, the effective sharing, application and upkeep of these assets requires centralised (and often expensive) tooling, which is not commonplace in most organisations.

So how do we move beyond that shared, but often forgotten and out-of-date, Excel spreadsheet that is far more common in many organisations?

Putting the power of terms in our clients’ hands

Having considered the problem, we decided to have a go at solving the terminology challenge end to end by working on two separate but linked pieces of technology. The first was the development of our TermFinder tool, which accelerates terminology development by using assets you already have. The second was the Terminology Management and Validation interface in our Coach Translation Platform, which supports the validation and then the ongoing management and application of the terminology assets.

Within our TermFinder tool, we use a refreshed statistical approach, combining techniques that look at the data differently from the traditional frequency-based methods that are often used.

We start by analysing an existing Translation Memory and other assets to identify potential terms, extracting monolingual term sets from both the source and target sides. Then we strip out all “stopwords” (such as “a” and “the”) and compare the frequency of each potential term in the Translation Memory with its frequency in a generic corpus, which gives us its log likelihood. In other words, we look for terms that appear more frequently in the Translation Memory than would be expected given their frequency in generic text.
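To make the log likelihood step a little more concrete, here is a minimal sketch in Python. It is not our production TermFinder code: the stopword list, the function names and the single-word tokenisation are all assumptions made for illustration (a real extractor would also consider multi-word terms), and the score is the standard two-corpus log likelihood statistic.

```python
import math
import re
from collections import Counter

# Illustrative stopword list only; a real list would be much longer.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "on", "is", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def log_likelihood(freq_domain, total_domain, freq_generic, total_generic):
    """Two-corpus log likelihood (G2) score for one candidate term."""
    combined_rate = (freq_domain + freq_generic) / (total_domain + total_generic)
    expected_domain = total_domain * combined_rate
    expected_generic = total_generic * combined_rate
    score = 0.0
    if freq_domain > 0:
        score += freq_domain * math.log(freq_domain / expected_domain)
    if freq_generic > 0:
        score += freq_generic * math.log(freq_generic / expected_generic)
    return 2.0 * score


def rank_candidates(domain_text, generic_text):
    """Rank words that are over-represented in the domain text."""
    domain = Counter(tokenize(domain_text))
    generic = Counter(tokenize(generic_text))
    n_domain, n_generic = sum(domain.values()), sum(generic.values())
    scored = []
    for term, freq in domain.items():
        generic_freq = generic.get(term, 0)
        # Keep only terms relatively more frequent in the domain text.
        if freq / n_domain > generic_freq / n_generic:
            scored.append(
                (term, log_likelihood(freq, n_domain, generic_freq, n_generic))
            )
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(tm_source_text, generic_text)` returns candidate terms with the most domain-specific ones first; a word that is equally common in both texts scores zero and is filtered out.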

The resulting terms are then aligned and ranked by training a phrase-based Statistical Machine Translation engine using the Translation Memory. During this process, we use customised features to identify good terms (e.g. running a DBPedia check) that are more likely to be relevant.

It is important to note that we are not using Machine Translation to generate terms. We build a Machine Translation engine as an ephemeral artefact, used merely to help rank and align potential terms based on commonality and uniqueness.
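Training a phrase-based engine is too involved to show in a few lines, so the sketch below swaps in a much simpler stand-in to illustrate the idea of aligning source and target candidates: it pairs terms by how often they co-occur in the same Translation Memory segments, scored with the Dice coefficient. This is not how TermFinder works internally; the function name `align_terms` and the substring matching are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def align_terms(segment_pairs, source_terms, target_terms):
    """Pair each source term with the target term it co-occurs with most.

    segment_pairs: iterable of (source_segment, target_segment) strings.
    source_terms / target_terms: lowercase candidate terms for each side.
    Returns {source_term: (best_target_term, dice_score)}.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src_seg, tgt_seg in segment_pairs:
        # Naive substring matching; a real system would match on tokens.
        src_found = {t for t in source_terms if t in src_seg.lower()}
        tgt_found = {t for t in target_terms if t in tgt_seg.lower()}
        src_count.update(src_found)
        tgt_count.update(tgt_found)
        joint.update(product(src_found, tgt_found))

    best = {}
    for (src, tgt), both in joint.items():
        # Dice coefficient: 1.0 means the two terms always appear together.
        dice = 2 * both / (src_count[src] + tgt_count[tgt])
        if src not in best or dice > best[src][1]:
            best[src] = (tgt, dice)
    return best
```

Given the segment pairs from a Translation Memory and the two monolingual candidate lists, this returns one proposed translation per source term, together with a score a reviewer could use to decide what to look at first.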

You can learn more about this process by reading our research paper, written with Andy Way from CNGL at Dublin City University, on the topic.

The outcome of this process is a high-quality, focused, bilingual TermBank that can be approved quickly by one or more internal reviewers using our refreshed Terminology Management and Validation interface in Coach, reducing the typical development cycle from months to weeks.


The results so far speak for themselves, with a high accuracy rate on validated source terms.

The Overall Result

We are pleased with the results of our hard work on both tools, and have seen that by bringing high-quality automatically discovered terms to the right people in an organisation, using tailored tasks to validate and approve them (ensuring they are then used appropriately in translations), there is a significant impact on quality for our clients.

And for those looking to use Machine Translation within their translation workflow, we can train engines specifically using this agreed terminology in order to improve the output. But that is a whole other topic, for another day.

If you would like to learn more about the technology behind this, or try it out on your translation memory assets, check out our slides from the TAUS Innovation Award on Speaker Deck or get in touch via our website – the team and I can talk about terminology and technology for hours!

*Photo credits: alejandro dans neergaard /

Dave Meikle

Dave joined Lingo24 as Chief Technology Officer in January 2014, having previously worked as the Head of Digital at Sopra Group. He’s also a Vice President at the Apache Software Foundation. Dave heads up our overall technology strategy, focusing on how we can make customers’ lives easier through innovative tech. Follow Dave on Twitter @dameikle and on LinkedIn.
