MT news

AD 2024: MT or PE?

A client has asked about using machine translation for a large document. They don't even know yet whether they want it post-edited :-) Here are a few tips, assuming that the language pair is English and Dutch, German or Polish.

1) You and your client may find a how-to on preparing text for MT useful, whether the machine translation will be published as-is or you will post-edit it.

2) For your language pair, DeepL still seems to be the best choice.

3) To ensure proper protection of your client’s intellectual property, as well as to enable a few more advanced options, I recommend that you use a DeepL Pro account.

4) The feature with the best effort-to-performance ratio, which I highly recommend even if you only need to deliver MT without post-editing it, is DeepL's Glossary function, available both in the browser and in the CAT tool plug-in. A glossary lets you apply your preferred terminology consistently throughout the machine translation. You can also use it to exclude do-not-translate terms, such as company and person names, from translation.

  1. Skim the source document for key terminology
  2. Prepare a glossary with translations of these key terms as a comma-separated (CSV) file
  3. Upload the file to DeepL
  4. The machine translation will then pick up your preferred terminology.
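If the term list is long, step 2 can be scripted. Below is a minimal Python sketch, assuming a plain two-column CSV (source term, target term, no header row); DeepL's documentation describes the exact CSV format it accepts, so check it before uploading. The function name and the sample terms are hypothetical. Do-not-translate names are simply mapped to themselves.

```python
import csv
import io

def build_glossary_csv(entries):
    """Turn (source, target) term pairs into two-column CSV text.

    Assumption: the glossary tool accepts 'source,target' rows without a
    header; verify this against the DeepL documentation before uploading.
    """
    seen = set()
    buffer = io.StringIO()
    # csv.writer quotes a term automatically if it contains a comma.
    writer = csv.writer(buffer, lineterminator="\n")
    for source, target in entries:
        source, target = source.strip(), target.strip()
        if not source or not target:
            raise ValueError(f"empty term in entry {source!r} -> {target!r}")
        if source in seen:
            # One source term should map to exactly one preferred translation.
            raise ValueError(f"duplicate source term: {source!r}")
        seen.add(source)
        writer.writerow([source, target])
    return buffer.getvalue()

# Hypothetical English -> German terms; the company name maps to itself,
# so it stays untranslated.
terms = [
    ("purchase order", "Bestellung"),
    ("invoice", "Rechnung"),
    ("Acme GmbH", "Acme GmbH"),
]

csv_text = build_glossary_csv(terms)
print(csv_text)
```

Save the printed text as a UTF-8 .csv file and upload it in step 3. The duplicate check is there because a glossary that maps one source term to two targets cannot enforce consistent terminology.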

5) After you prepare the MT for the client (using the hints above), let the client decide, for example by evaluating samples of the machine translation, whether they want to publish it as-is or order a full post-editing service from you. I once wrote down a few hints on how to decide about post-editing: it’s less about the quality of MT and more about the role of the published text!
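Samples tell the client more when they are not hand-picked. A minimal Python sketch, assuming the document has already been split into aligned (source, MT) segment pairs, for example exported from a CAT tool; the function name, the seed and the segments are hypothetical. A fixed seed makes the sample reproducible, so you and the client review the same segments.

```python
import random

def pick_evaluation_sample(segments, sample_size, seed=2024):
    """Return a reproducible random sample of segments, in document order."""
    if sample_size >= len(segments):
        return list(segments)
    rng = random.Random(seed)
    # Sample positions, then sort them so the reviewer reads segments in order.
    positions = sorted(rng.sample(range(len(segments)), sample_size))
    return [segments[i] for i in positions]

# Hypothetical aligned segment pairs: (source, machine translation).
segments = [(f"source sentence {n}", f"MT sentence {n}") for n in range(1, 201)]

sample = pick_evaluation_sample(segments, sample_size=20)
for source, mt in sample:
    print(source, "|", mt)
```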

6) If the client decides to publish a machine-translated text without full post-editing, it is good practice to clearly mark it as such 🙂

The risks of using Machine Translation

Source (German):

Steward zum Kapitän:

„Herr Kapitän, wir haben einen blinden Passagier an Bord. Was sollen wir mit ihm machen?

Kapitän: „Sperrt ihn ein.“

Ca. 10 Minuten später kommt der Steward wieder zum Kapitän:

„Und was machen wir jetzt mit dem Hund?“

Machine translation (DeepL):

Steward to the captain:
“Captain, we have a stowaway on board. What should we do with him?
Captain: “Lock him up.”
About 10 minutes later the steward comes back to the captain:
“And what do we do with the dog now?”

Machine translation is everywhere: you get it for free on Facebook, you can use it with no fuss in Word and other Microsoft services, and, of course, there is Google Translate in all flavours (like Google Lens). If you work more with text and documents, you may know of some MT services that work better than Google for your language: for example, Baidu in China, DeepL in Europe, or Yandex in Russia.

You have certainly been impressed by how well MT can translate: you may easily get an output document that not only has all titles, images and footnotes perfectly in place, but also reads smoothly and makes a lot of sense.

…Or does it really?

Machine Translation AD 2022 can model language really well. For a source text written correctly and in fairly standard language, the MT output will very likely be fluent and understandable. The biggest risk, however, is that the sense and meaning can be lost: distorted, negated, or changed to something totally irrelevant. Or embarrassing.

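The MT from German to English quoted above shows that DeepL can handle an idiom: „blinder Passagier“ (literally “blind passenger”) is correctly rendered as “stowaway”. But DeepL cannot know that the whole text is not translatable as such, because the punchline about the dog only works with the literal reading; a text like this needs to be localized or transcreated. Let’s have a look at more serious examples between Polish and English, mostly from the legal domain.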

Polish to English – distorted meaning

Source: Przeprowadzenie testu proporcjonalności naruszenia prawa własności przez sąd nie może zostać oderwane od prawidłowych ustaleń faktycznych dokonanych w rzetelnym postępowaniu, w którym strona może brać aktywny udział i korzystać ze wszystkich uprawnień procesowych wynikających z prawa do obrony.

Machine translation: Conducting the proportionality test of a court’s infringement of a property right cannot be divorced from the correct findings of fact made in a fair proceeding in which a party can actively participate and exercise all procedural rights under the right of defence.

The source text is about a proportionality test conducted by the court, assessing an infringement of a property right.
The translated text is about a proportionality test of an infringement of a property right committed by the court.
Not the same thing! Would you immediately know that something is wrong if you only had the machine translation?

This is the worst case of MT errors:

  • the machine translated text makes sense,
  • there is nothing suspicious, at least not at first glance,
  • however, the sense of the translation is not what’s meant in the source.

English to Polish – wrong key term

Source: Where relationship and respect are emphasized in personal education classes (as they are in Holland), the rate of unprotected sex in teens is low, the age of first intercourse is higher, and fewer teens report that their first sexual experience resulted from coercion.

Machine translation: Tam, gdzie na lekcjach wychowania fizycznego kładzie się nacisk na relacje i szacunek (tak jak w Holandii), wskaźnik uprawiania seksu bez zabezpieczenia wśród nastolatków jest niski, wiek pierwszego stosunku wyższy, a mniejsza liczba nastolatków przyznaje, że ich pierwsze doświadczenie seksualne było wynikiem przymusu.

The source text is about personal education classes.
The translated text is about physical education classes.

This is almost as bad as the previous case – almost, because it’s only one wrong term:

  • the translated text makes sense,
  • even without understanding any English, a reader of the Polish text may wonder how physical education could affect the sex life of teens,
  • but without reading the source, it’s hard to figure out what class was actually in question, so machine translation is hardly usable.

Polish to English – just embarrassing

Source: Zakład Postępowania Administracyjnego i Sądowoadministracyjnego

Machine Translation: Plant of Administrative and Administrative Judicial Procedure

The source text refers to a Department of Administrative and Judicial-Administrative Proceedings.
The translation turned it into a plant.

This mistake is rather obvious and easy to detect even without reading the Polish source, because in a legal text you expect a “department” rather than a “plant” (the Polish word “zakład” may mean either, depending on the context). However, such a mistake can be very embarrassing if the machine-translated text reaches its audience without a human touch.

Now let’s rephrase the subject of this article: What are the risks of using machine translation, if no one verifies its correctness against the source text?

  • The biggest risk is misleading your target audience. If proper understanding of the text is critical for human life, health, political decisions or large financial gains or losses, the meaning must be conveyed correctly. The more mission-critical the text, the more necessary it is to have a professional validate the MT against the source text before the translation reaches its target audience.
  • The second big risk is damage to your reputation. Whether the machine-translated text has an unintended obscene meaning, contains an involuntary joke caused by translating a person or company name, or just says loud and clear “I am machine-translated and never verified”, the same logic applies: the more important the text is to your profile, the more essential it is to have it verified by a professional who has access to, and understands, the source text.

And where can you use raw, unverified machine translation (rather) safely? For example in these areas:

  • Non-critical content like chats, user comments on a webpage, descriptions of low-value goods in online shops, or press news with limited impact. In each of these use cases, before publishing raw MT, you need to assess the risk of a misleading or offensive translation. (Would any customer refrain from buying a nicely priced shirt online just because the description is a bit off, like in the picture provided?)
  • Gisting – getting an overall sense of content (e.g. patents, legal or audit documentation) to decide which parts should be translated or professionally post-edited, or what to translate first. In this use case, the risk is minimal, because raw MT is never to be published without a thorough human validation.
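The rule of thumb from the two lists above can be condensed into a few lines. This is a deliberately simplified Python sketch, assuming each text can be described by three yes/no questions; the function and parameter names are hypothetical, and a real decision still needs the risk assessment described above.

```python
def mt_recommendation(gisting_only: bool, critical: bool, reputation_sensitive: bool) -> str:
    """Simplified rule of thumb for raw vs. verified machine translation."""
    if gisting_only:
        # Read internally to decide what to translate; never published as-is.
        return "raw MT for gisting"
    if critical or reputation_sensitive:
        # Human life, health, political decisions, large financial impact,
        # or a text that matters to your profile.
        return "professional validation against the source"
    # Non-critical content such as chats or low-value product descriptions.
    return "raw MT after a risk assessment"

print(mt_recommendation(gisting_only=False, critical=True, reputation_sensitive=False))
```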

Polish/English examples provided by DeepL and by Anna Setkowicz-Ryszka, who also contributed to the content and acted as the first reviewer.

German to English examples provided by DeepL and by Bianca Blüchel.

Disclaimer: We have chosen examples from DeepL not because it is bad MT; on the contrary, it is one of the best generic, publicly available MT engines for both German<>English and Polish<>English. And precisely because it is so good, fluent and natural-sounding, it is very risky in mission-critical use cases. As DeepL themselves warn in their Terms and Conditions:

“Customer may use the Products solely for the purpose agreed between the Parties. In particular, Customer may not, and will not allow third parties (including Internal Users and End Users) to use the Products, translations created using the Products, Documentation or other data, information or service provided by DeepL unless expressly authorised by DeepL in written form

a) in connection with or for the purpose of operating critical infrastructure such as electrical power stations, military or defence equipment, medical appliances or other equipment whose failure or impairment would result in unforeseeable economical or physical damages, including but not limited to critical infrastructure in terms of the European Directive 2008/114/EC”

Ask Your Mentor Anything

On February 19, 2021, Virginia Katsimpiri hosted me in her Business Mentoring for Translators session. The recording is available on YouTube. We talked about:

  • How I got into the translation and localization business (back in the 90s…)
  • Why machine translation post-editing is a thing for translators (now – in 2021)
  • What this service looks like from a technical perspective (in a CAT tool)
  • How to evaluate whether a price for MT PE is fair (with paper, pencil and clock!)
  • What else there is for us to use besides Google Translate 🙂
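The “paper, pencil and clock” check boils down to simple arithmetic. One common way to do it, assumed here rather than taken from the session: time yourself post-editing a representative sample, convert the offered per-word rate into an effective hourly rate, and compare it with the hourly rate you aim for. A minimal Python sketch; all names and numbers are hypothetical.

```python
def effective_hourly_rate(words: int, minutes: float, rate_per_word: float) -> float:
    """Hourly earnings implied by a per-word rate and your measured speed."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    words_per_hour = words * 60 / minutes
    return words_per_hour * rate_per_word

# Hypothetical measurement: 1,000 words post-edited in 2 hours,
# offered rate 0.04 per word, target rate 25 per hour (same currency).
rate = effective_hourly_rate(words=1000, minutes=120, rate_per_word=0.04)
target = 25.0
print(f"effective rate: {rate:.2f} per hour; meets your target: {rate >= target}")
```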