Steward zum Kapitän:
„Herr Kapitän, wir haben einen blinden Passagier an Bord. Was sollen wir mit ihm machen?“
Kapitän: „Sperrt ihn ein.“
Ca. 10 Minuten später kommt der Steward wieder zum Kapitän:
„Und was machen wir jetzt mit dem Hund?“
Steward to the captain:
“Captain, we have a stowaway on board. What should we do with him?”
Captain: “Lock him up.”
About 10 minutes later the steward comes back to the captain:
“And what do we do with the dog now?”
Machine translation is everywhere: you get it for free on Facebook, you can use it with no fuss in Word and other Microsoft services, and – of course – there is Google Translate in all flavours (like Google Lens). If you are more into text and documents, you may know of some MT services that work for your language better than Google: for example, Baidu in China, DeepL in Europe, or Yandex in Russia.
You have certainly been impressed by how well MT can translate: you may easily get an output document that not only has all titles, images and footnotes perfectly in place, but also reads smoothly and makes a lot of sense.
…Or does it really?
Machine Translation AD 2022 can model language really well. For a source text written correctly and in fairly standard language, the MT output will very likely be fluent and understandable. The biggest risk, however, is that the sense and meaning can be lost: distorted, negated, or changed to something totally irrelevant. Or embarrassing.
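The German-to-English MT quoted above shows that DeepL can handle an idiom: it correctly renders “blinder Passagier” as “stowaway”. What it cannot know is that the whole text is untranslatable as it stands: the joke depends on the literal meaning, “blind passenger”, so the text needs to be localized or transcreated. Let’s have a look at more serious examples involving Polish and English, mostly from the legal domain.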
Polish to English – distorted meaning
Source: Przeprowadzenie testu proporcjonalności naruszenia prawa własności przez sąd nie może zostać oderwane od prawidłowych ustaleń faktycznych dokonanych w rzetelnym postępowaniu, w którym strona może brać aktywny udział i korzystać ze wszystkich uprawnień procesowych wynikających z prawa do obrony.
Machine translation: Conducting the proportionality test of a court’s infringement of a property right cannot be divorced from the correct findings of fact made in a fair proceeding in which a party can actively participate and exercise all procedural rights under the right of defence.
The source text is about a proportionality test conducted by the court; the test concerns an infringement of a property right.
The translated text is about a proportionality test of an infringement of a property right committed by the court.
Not the same thing! Would you immediately know that anything is wrong if you only had the machine translation available?
This is the worst kind of MT error:
- the machine-translated text makes sense,
- there is nothing suspicious, at least not at first glance,
- however, the sense of the translation is not what’s meant in the source.
English to Polish – wrong key term
Source: Where relationship and respect are emphasized in personal education classes (as they are in Holland), the rate of unprotected sex in teens is low, the age of first intercourse is higher, and fewer teens report that their first sexual experience resulted from coercion.
Machine translation: Tam, gdzie na lekcjach wychowania fizycznego kładzie się nacisk na relacje i szacunek (tak jak w Holandii), wskaźnik uprawiania seksu bez zabezpieczenia wśród nastolatków jest niski, wiek pierwszego stosunku wyższy, a mniejsza liczba nastolatków przyznaje, że ich pierwsze doświadczenie seksualne było wynikiem przymusu.
The source text is about personal education classes.
The translated text is about physical education classes.
This is almost as bad as the previous case – almost, because it’s only one wrong term:
- the translated text makes sense,
- even without understanding any English, a reader of the Polish text may wonder how physical education might affect the sex life of teens,
- but without reading the source, it’s hard to figure out which class was actually meant, so the machine translation is hardly usable.
Polish to English – just embarrassing
Source: Zakład Postępowania Administracyjnego i Sądowoadministracyjnego
Machine Translation: Plant of Administrative and Administrative Judicial Procedure
The source text is about a Department of Administrative and Judicial-Administrative Proceedings.
The translation turned it into a plant.
This mistake is rather obvious and easy to detect even without reading the English text closely, because in a legal text you expect a “department” rather than a “plant” (the Polish word “zakład” may mean either, depending on the context). However, such an oversight can be very embarrassing if the machine-translated text reaches its audience without a human touch.
Now let’s rephrase the subject of this article: What are the risks of using machine translation, if no one verifies its correctness against the source text?
- The biggest risk is misleading your target audience. If a correct understanding of the text is critical to human life, health, political decisions or large financial gain or loss, then the meaning must be conveyed correctly. The more mission-critical the text, the more necessary it is to have the MT professionally validated against the source text before the translation reaches its target audience.
- The second big risk is damage to your reputation. Whether the machine-translated text carries an unintended obscene meaning, contains an involuntary joke caused by translating a person’s or company’s name, or just says loud and clear “I am machine-translated and never verified”, the same logic applies: the more important the text is to your profile, the more essential it is to have it verified by a professional who has access to, and an understanding of, the source text.
And where can you use raw, unverified machine translation (rather) safely? For example in these areas:
- Non-critical content like chats, user comments on a webpage, descriptions of low-value goods in online shops, or press news with limited impact. In each of these use cases, before publishing raw MT, you need to assess the risk of a misleading or offensive translation. (Would any customer refrain from buying a nicely priced shirt online just because the description is a bit off, like in the picture provided?)
- Gisting – getting an overall sense of the content (e.g. patents, legal or audit documentation) to decide which parts should be translated or professionally post-edited, or what to translate first. In this use case, the risk is minimal, because raw MT is never to be published without thorough human validation.
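For readers who like to see a rule of thumb written down, the guidance above can be sketched as a small decision function. This is only an illustration under stated assumptions: the class `Text`, its three yes/no fields and the function `triage` are hypothetical names invented here, and a real risk assessment is a human judgement that rarely reduces to three flags.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    RAW_MT = "raw MT is acceptable"
    HUMAN_REVIEW = "verify MT against the source before release"


@dataclass
class Text:
    # All three fields are illustrative assumptions, not an established scheme.
    mission_critical: bool      # life, health, political decisions, large financial gain or loss
    reputation_sensitive: bool  # important to how the author or company is perceived
    published: bool             # False for gisting: the raw MT never leaves the team


def triage(text: Text) -> Handling:
    # Gisting: raw MT is only read internally to decide what to translate properly.
    if not text.published:
        return Handling.RAW_MT
    # Published text that is critical or reputation-sensitive needs a professional
    # who can compare the translation with the source.
    if text.mission_critical or text.reputation_sensitive:
        return Handling.HUMAN_REVIEW
    # Non-critical published content (chats, user comments, cheap product
    # descriptions); the risk of an offensive result still has to be weighed.
    return Handling.RAW_MT


if __name__ == "__main__":
    court_ruling = Text(mission_critical=True, reputation_sensitive=True, published=True)
    shirt_listing = Text(mission_critical=False, reputation_sensitive=False, published=True)
    print(triage(court_ruling).value)
    print(triage(shirt_listing).value)
```

With these assumed flags, a published court ruling lands in the human-review branch, while a cheap shirt listing or an internal gisting pass does not.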
Polish to English examples provided by DeepL and by Anna Setkowicz-Ryszka, who also contributed to the content and acted as the first reviewer.
German to English examples provided by DeepL and by Bianca Blüchel.
Disclaimer: We have chosen examples from DeepL not because it is a bad MT engine – on the contrary, it is one of the best generic, publicly available MT engines for both the German<>English and Polish<>English language pairs. And precisely because its output is so good, fluent and natural-sounding, it is very risky in mission-critical use cases. As DeepL themselves warn in their Terms and Conditions:
“Customer may use the Products solely for the purpose agreed between the Parties. In particular, Customer may not, and will not allow third parties (including Internal Users and End Users) to use the Products, translations created using the Products, Documentation or other data, information or service provided by DeepL unless expressly authorised by DeepL in written form
a) in connection with or for the purpose of operating critical infrastructure such as electrical power stations, military or defence equipment, medical appliances or other equipment whose failure or impairment would result in unforeseeable economical or physical damages, including but not limited to critical infrastructure in terms of the European Directive 2008/114/EC”