Automated translation between languages has been a goal for a long time. The British, and even more so the Americans, are notorious for their poor linguistic skills and seem to think that everyone should just learn English, even if the two groups can’t agree on just what English is. Other nations disagree and, although languages are dying off at a substantial rate (about two dozen a year according to some estimates), it doesn’t look like we’ll all be speaking in one tongue for quite some time.
There are two main methods used by computers to translate between languages. The first involves documenting the language’s rules in a machine-readable format, adding in the exceptions (of which there are many; languages are not logical) and incorporating the vocabulary. This method has never worked well. The second method is to compare vast amounts of text with known good translations of it and build up a knowledge of what should be replaced by what when translating. This is the approach used by Google Translate. Google’s arrays of computers have been fed with millions of lines of text from such sources as the United Nations and the European Parliament, which constantly require documents in multiple languages. It’s like using the Rosetta Stone to decipher an unknown tongue. This works much better than the first method, but I recently stumbled upon an instance where its translations were completely wrong.
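To make the second method a little more concrete, here is a minimal sketch in Python. It is a toy under heavy assumptions, not how Google Translate actually works: the four sentence pairs are invented, the names learn_table and translate are my own, and the score (a Dice coefficient on how often two words turn up in the same sentence pair) is just one simple choice among many.

```python
from collections import Counter, defaultdict

# Tiny invented parallel corpus (Spanish, English); purely illustrative.
PARALLEL = [
    ("el perro come", "the dog eats"),
    ("el gato come", "the cat eats"),
    ("el perro duerme", "the dog sleeps"),
    ("el gato duerme", "the cat sleeps"),
]

def learn_table(pairs):
    """Map each source word to the target word it co-occurs with most strongly."""
    cooc = defaultdict(Counter)   # cooc[s][t] = sentence pairs containing both
    src_count = Counter()         # sentence pairs containing source word s
    tgt_count = Counter()         # sentence pairs containing target word t
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        for s in s_words:
            for t in t_words:
                cooc[s][t] += 1
    table = {}
    for s, counter in cooc.items():
        # Dice coefficient: high only when the two words nearly always appear together.
        table[s] = max(
            counter,
            key=lambda t: 2 * counter[t] / (src_count[s] + tgt_count[t]),
        )
    return table

def translate(sentence, table):
    """Word-by-word replacement; unknown words are passed through unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())

TABLE = learn_table(PARALLEL)
print(translate("el gato duerme", TABLE))  # the cat sleeps
```

Nothing in the code knows any Spanish grammar; it only knows which words keep turning up opposite each other, which is also why it can be led astray when the two sides of a pair are not really translations of each other.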
Pangrams are sentences that use all the letters in a specific alphabet. The classic English one is:
The quick brown fox jumps over the lazy dog.
(Note that it’s jumps, not jumped as I often hear; with jumped the sentence has no s and is no longer a pangram.)
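Checking a pangram by eye is tedious, so here is a short Python sketch that does it. The function names missing_letters and is_pangram are my own, and the default alphabet is the 26 unaccented English letters; another alphabet can be passed in as a string.

```python
import string

def missing_letters(sentence, alphabet=string.ascii_lowercase):
    """Return the letters of the alphabet that the sentence does not use."""
    present = set(sentence.lower())
    return sorted(set(alphabet) - present)

def is_pangram(sentence, alphabet=string.ascii_lowercase):
    """A sentence is a pangram when no letter of the alphabet is missing."""
    return not missing_letters(sentence, alphabet)

print(is_pangram("The quick brown fox jumps over the lazy dog."))        # True
print(missing_letters("The quick brown fox jumped over the lazy dog."))  # ['s']
```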
Pangrams are used to display fonts, test handwriting skills and so on. I was looking at some in other languages recently (I know, I should get out more) and came across this one for Spanish:
El veloz murciélago hindú comía feliz cardillo y kiwi, la cigüeña tocaba el saxofón detrás del palenque de paja.
It’s quite long as it has all the diacritical marks (in Spanish ñ is a separate letter, ch and ll were traditionally counted as letters too, and some therefore insist that pangrams must include them). I wanted to translate this, so I pasted the first part into Google. This is the result:
The quick brown fox ate happy golden thistle and kiwi.
Because pangrams are rarely translated literally but often appear alongside their foreign counterparts, Google Translate has decided that El veloz murciélago hindú, in reality the quick Hindu bat, means the quick brown fox. It also decided that the end of the sentence, palenque de paja, meant lazy dog. The actual full translation should read:
The quick Hindu bat happily ate golden thistle and kiwi, the stork played the saxophone behind the straw arena.
Now you know where Dalí got his inspiration. No one said pangrams had to make much sense.
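The fox-for-bat mix-up can be reproduced in miniature. The sketch below is only my guess at the kind of thing that goes wrong, with invented counts: it assumes the Spanish phrase was seen sitting next to the English pangram more often than next to its literal translation, and it simply picks whichever pairing was seen most. The names SEEN_TOGETHER and most_common_pairing are hypothetical.

```python
from collections import Counter, defaultdict

# Invented sentence pairs, standing in for pages where a Spanish pangram
# sits next to the English one rather than next to its literal translation.
SEEN_TOGETHER = [
    ("el veloz murciélago hindú", "the quick brown fox"),
    ("el veloz murciélago hindú", "the quick brown fox"),
    ("el veloz murciélago hindú", "the quick brown fox"),
    ("el veloz murciélago hindú", "the quick Hindu bat"),
]

def most_common_pairing(pairs):
    """For each source phrase, keep the target phrase it was paired with most often."""
    votes = defaultdict(Counter)
    for source, target in pairs:
        votes[source][target] += 1
    return {source: counts.most_common(1)[0][0] for source, counts in votes.items()}

PHRASES = most_common_pairing(SEEN_TOGETHER)
print(PHRASES["el veloz murciélago hindú"])  # the quick brown fox
```

The literal translation is outvoted three to one, so the bat becomes a fox.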
So the majority of interpreters (who do real-time translation) and translators won’t be out of a job just yet. Even with automated tools you still need a fluent speaker to tidy up the results, but for many less formal translations tools like Google Translate are good enough.
For some areas, such as legal and scientific documentation, it will probably take a lot longer for automation to be reliable.
(There is a feature in Google Translate that lets you change the result. If anyone notices that these translations are now correct it might be because Google updates based on these contributions.)
And if you want to test Google even further you can use the old favourite:
Time flies like an arrow, fruit flies like a banana.
It makes a good job of getting fruit flies correct, but fails on the second like.
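To see why that sentence is such a good trap, here is a small Python sketch. It is a toy, not a real parser: the lexicon, the part-of-speech tags (N noun, V verb, P preposition, D determiner) and the two sentence patterns are all my own simplifications. Going by word categories alone, each half of the sentence fits both readings.

```python
from itertools import product

# Hypothetical mini-lexicon: each word with the parts of speech it may take.
LEXICON = {
    "time": ("N",), "fruit": ("N",),
    "flies": ("N", "V"),   # the insects, or the verb "to fly"
    "like": ("V", "P"),    # "to be fond of", or "in the manner of"
    "an": ("D",), "a": ("D",),
    "arrow": ("N",), "banana": ("N",),
}

# Two simplified sentence patterns and what each one would mean.
READINGS = {
    ("N", "V", "P", "D", "N"): "X flies (verb) in the manner of Y",
    ("N", "N", "V", "D", "N"): "X-flies (insects) are fond of Y",
}

def readings(sentence):
    """List every reading whose tag pattern the sentence's words allow."""
    words = sentence.lower().replace(",", "").split()
    found = []
    for tags in product(*(LEXICON[word] for word in words)):
        if tags in READINGS:
            found.append(READINGS[tags])
    return found

print(len(readings("Time flies like an arrow")))    # 2
print(len(readings("Fruit flies like a banana")))   # 2
```

Only knowledge of the world (fruit flies exist, time flies do not) tells a reader which reading to pick in each half, and that is exactly the knowledge a translator working from word patterns lacks.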