by Christof Schneider, 19th October, 2004
Christof Schneider’s article offers a sneak preview of corpus-related computer tools.
Have you ever wanted to sort out the terminology of a translation before you start? Did you ever think how wonderful it would be if you could see all occurrences of one term in its various contexts? If so, then you’re no doubt familiar with the task of sitting down, reading through the source text, finding the “complicated” words and writing them down so you can look them up.
However, if the text in question is available in a suitable file format, why not use an electronic tool to help you find the terminology, and to show you all the occurrences of a single term in all its contexts at the same time? A small and usually inexpensive ($0-300) concordance program can help you carry out your pre-translation text analysis.
Your terminology research can be supported by creating a so-called ‘frequency list’. This list shows every single word (or each string of characters, for that matter) contained in the text being searched, together with the number of times it appears. The list can be sorted by frequency or in alphabetical order. It makes a great base for your terminology research, and you can even create a glossary from it.
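To make the idea concrete, here is a minimal sketch of a frequency list in Python. It is not how any particular concordancer works; the function names, the sample sentence and the simple rule for what counts as a "word" are all assumptions made for illustration.

```python
import re
from collections import Counter


def frequency_list(text):
    """Count how often each word form occurs in the text."""
    # Assumption: a "word" is a run of letters or digits, optionally joined
    # by apostrophes (so "don't" stays in one piece). Case is ignored.
    tokens = re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text.lower())
    return Counter(tokens)


def by_frequency(counts):
    """Most frequent words first; ties are listed alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def alphabetical(counts):
    """Words in alphabetical order, each with its count."""
    return sorted(counts.items())


# Made-up sample text, standing in for a source document.
sample = "The printer driver installs the printer. The driver is small."
counts = frequency_list(sample)

for word, n in by_frequency(counts):
    print(f"{n:3d}  {word}")
```

On a real source text, the rarer words near the bottom of the frequency-sorted list are often the best starting point for a glossary, since the top is usually dominated by words such as "the" and "is".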
Some programmes can even display lemmatised lists, in which the most common forms of a word are shown together. For example, “goes” or “gone” would be listed together with “go” (although such a tool won’t recognise “went” as a form of “to go”). Some of these tools can extract not only lists of single words, but also lists of groups of two or three words.
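Both ideas can be sketched in a few lines of Python. The lookup table `LEMMAS` below is a tiny, hand-made assumption that mirrors the example in the text (it deliberately omits "went"); real tools rely on much larger word lists or rules, and their exact method is not described here. The function names are likewise illustrative only.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, hand-made lookup table mapping word forms to a headword.
# "went" is left out on purpose, as in the example in the text.
LEMMAS = {"goes": "go", "gone": "go", "going": "go"}


def tokenize(text):
    """Split text into lowercase words (runs of letters or digits)."""
    return re.findall(r"[^\W_]+", text.lower())


def lemmatised_list(tokens):
    """Group word forms under their headword and count each form."""
    groups = defaultdict(Counter)
    for token in tokens:
        headword = LEMMAS.get(token, token)
        groups[headword][token] += 1
    return groups


def ngram_counts(tokens, n):
    """Count every run of n consecutive words (n=2 or n=3, for example)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


tokens = tokenize("She goes home. He has gone home. They go home and went out.")

groups = lemmatised_list(tokens)
print(dict(groups["go"]))    # the forms grouped under the headword "go"
print("went" in groups)      # "went" remains a separate entry

bigrams = ngram_counts(tokens, 2)
print(bigrams.most_common(3))
```

Note that this simple version counts word groups across sentence boundaries (for example "home he"); a more careful tool would probably split the text into sentences first.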
But what about the context of a word? Here the second function of the software comes into play: the concordance function. It searches for and displays all occurrences of a particular search term. So if you want to see a term in context, simply search for the concordances of this term. The search result is displayed in a format called KWIC (“key word in context”).
Search results can be ordered in many ways. The context on either side of the search term can be adjusted to show a certain number of words, for example five to the left and/or the right of the term, and the results can be sorted alphabetically by those neighbouring words if you so wish.
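A KWIC display with adjustable context and alphabetical sorting might look roughly like the following Python sketch. The function names, the `width` parameter and the sample text are assumptions for illustration, not the interface of any actual concordancer.

```python
import re


def kwic(text, term, width=5):
    """Return (left words, keyword, right words) for each hit of term."""
    tokens = re.findall(r"[^\W_]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == term.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, token, right))
    return hits


def sort_by_right(hits):
    """Sort hits alphabetically by the words following the keyword."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2]])


def sort_by_left(hits):
    """Sort hits by the words before the keyword, nearest word first."""
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])


def show(hits):
    """Print the hits with the keyword aligned in a single column."""
    for left, key, right in hits:
        print(f"{' '.join(left):>40}  [{key}]  {' '.join(right)}")


# Made-up sample text, standing in for a source document.
sample = ("Restart the computer after installing. The computer hardware is old. "
          "Check the computer software first.")

hits = sort_by_right(kwic(sample, "computer", width=3))
show(hits)
```

Sorting on the right-hand context brings identical continuations next to each other, which makes recurring phrases easy to spot when scanning the column.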
This way you can check for collocations and even detect multi-word patterns. You might find, for example, that “computer” quite often has the ‘right neighbour’ “hardware” or “software” (and often “problem” too).
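Counting ‘right neighbours’ is simple enough to sketch in Python. The function name and the sample sentences below are invented for illustration, and the counts they produce say nothing about real usage.

```python
import re
from collections import Counter


def right_neighbours(text, term):
    """Count the words that directly follow each occurrence of term."""
    tokens = re.findall(r"[^\W_]+", text.lower())
    term = term.lower()
    return Counter(
        following
        for word, following in zip(tokens, tokens[1:])
        if word == term
    )


# Made-up sample text, standing in for a source document.
sample = ("Computer hardware ages quickly. Computer software needs updates. "
          "A computer problem delayed us, and new computer hardware arrived.")

neighbours = right_neighbours(sample, "computer")
for word, n in neighbours.most_common():
    print(f"{n:3d}  computer {word}")
```

The same approach works for left neighbours by swapping the roles of the two words in each pair.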
The software in question is quite useful for translators and relatively inexpensive. A search in Google for “concordancer” will lead you to more information.
© Christof Schneider, May 2004