Google’s neural networks are translating gibberish into vaguely coherent passages

Abner Li | Jul 20 2018 - 12:56 pm PT

Back in 2016, Google Translate began leveraging Neural Machine Translation (NMT) to drastically improve the quality of translations. Just last month, Google announced that NMT now works completely offline on mobile. The service is now exhibiting an odd behavior: entering gibberish can produce readable passages.

As reported by Motherboard, Google Translate seems to recognize repeated nonsense words as a foreign language and then converts them into a readable sentence. Oddly, some inputs are recognized as belonging to specific languages, and some of those results consistently take on a religious theme.

For example, entering the word “ag” repeatedly, which Translate detects as Irish, produces the following output depending on the number of repetitions:

10: “And its length was one hundred bits at one end”
21: “As a result, the total number of the members of the tribe of the sons of Gershon was one hundred fifty thousand”
25: “As the name of the LORD was written in the Hebrew language, it was written in the language of the Hebrew Nation.”

This is common enough that a Reddit community has formed to highlight the phenomenon. Motherboard spoke to computer scientists and machine learning experts, who suggested that the materials used to train Neural Machine Translation algorithms might be at fault.

NMT involves comparing the same texts written in different languages to create rules, or a model, for mapping between them. For its earliest machine translation efforts in the early 2000s, Google used United Nations documents that were “skillfully translated” as sources.
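To make the idea of learning from matching texts concrete, here is a minimal sketch in Python. It is not how Google Translate or any neural system works internally: it only counts which words appear together in a tiny, invented set of sentence pairs. The corpus and the function names `build_table` and `best_match` are assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus (English, Spanish); not real training data.
PARALLEL = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
]

def build_table(pairs):
    """Count how often each source word co-occurs with each target word."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                table[s][t] += 1
    return table

def best_match(table, word):
    """Return the target word seen most often alongside `word`, or None if unseen."""
    if word not in table:
        return None
    return table[word].most_common(1)[0][0]

table = build_table(PARALLEL)
print(best_match(table, "house"))  # casa
print(best_match(table, "book"))   # libro
```

Even this crude counter picks up that “house” goes with “casa” because the two keep showing up in the same pairs. The quality of what it learns depends entirely on which paired texts it is given.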

More recently, NMT might have been trained on the Bible, given that it is the best-selling book in history and has been translated into a significant number of languages. This could explain why the religious-themed responses are more common in languages with little other translated material. When given “nonsense inputs,” the system could “hallucinate” these strange phrases while trying to give the user a fluent response, according to another researcher Motherboard spoke to.
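The “hallucination” explanation can be illustrated with a toy decoder, sketched below in Python. This is an assumption-laden simplification and not Google's system: the training sentences, the two-word lexicon (`zor`, `mek`), and the function names are all invented for this example. The decoder always tries to produce a likely-sounding sentence from its training text, and uses the input only when it recognizes a word.

```python
from collections import Counter, defaultdict

# Hypothetical training sentences, invented for illustration only.
TRAINING_TEXT = [
    "and the name was written in stone",
    "and the tribe was numbered in full",
    "and the name was written in full",
]

# Hypothetical source-language words and their target-language meanings.
LEXICON = {"zor": "tribe", "mek": "stone"}

def train_bigrams(sentences):
    """Count which word follows which, with <s> and </s> as sentence markers."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def decode(source_words, lexicon, follows, max_len=10):
    """Greedy decoder: prefer a next word hinted by the input, else the most frequent one."""
    hints = {lexicon[w] for w in source_words if w in lexicon}
    out, prev = [], "<s>"
    for _ in range(max_len):
        candidates = follows[prev]
        if not candidates:
            break
        hinted = [w for w in candidates if w in hints]
        pool = hinted if hinted else list(candidates)
        nxt = max(pool, key=candidates.get)
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return " ".join(out)

follows = train_bigrams(TRAINING_TEXT)
print(decode(["ag"] * 10, LEXICON, follows))  # and the name was written in full
print(decode(["zor"], LEXICON, follows))      # and the tribe was written in full
```

The gibberish input contains nothing the decoder recognizes, so the output comes entirely from the training text, and it still reads like a sentence. If that training text were mostly scripture, the fallback output would sound like scripture too.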

Google has removed the translation examples Motherboard provided to the company, but did not specify what source text it uses for training.

With the increased presence of AI in consumer products, Google has noted in the past that it wants to make sure the new technology is capable of explaining the results and determinations it arrives at. For example, in the case of medicine, an AI algorithm could explain why a particular recommendation was given, so that machine learning is not a “black box” of answers.
