Unsupervised machine translation at Facebook: Researchers at Facebook have been honing a technique that may help the social media platform learn to translate between rare pairs of languages in the future.
Translation between popular language pairs such as English and French or German and Spanish works better because plenty of parallel data already exists. Far less is available for pairs like Vietnamese and Welsh or Maori and Tamil.
For these rare language pairs, developers can’t really rely on supervised machine learning. Supervised models learn to translate from one-to-one mappings, where the same sentence is written in both languages, and too few of those exist for rare pairs. Instead, the researchers have been exploring unsupervised machine learning methods.
The system is fed texts in various languages and maps each word to a vector, known as a word embedding. The idea is that the distance between two vectors describes how closely related the words are. So, for example, the word embedding for ‘dog’ should be closer to ‘animal’ than to ‘skyscraper’. These patterns should be pretty similar in other languages too.
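To make the ‘dog’ versus ‘skyscraper’ idea concrete, here is a minimal sketch in Python. The three-dimensional vectors are made-up values chosen for illustration, not real embeddings from any model, and cosine similarity is just one common way to measure how close two vectors are.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "animal": [0.8, 0.9, 0.2],
    "skyscraper": [0.1, 0.2, 0.9],
}

sim_animal = cosine_similarity(embeddings["dog"], embeddings["animal"])
sim_skyscraper = cosine_similarity(embeddings["dog"], embeddings["skyscraper"])

# With these toy values, 'dog' scores higher against 'animal' than 'skyscraper'.
print(f"dog vs animal:     {sim_animal:.3f}")
print(f"dog vs skyscraper: {sim_skyscraper:.3f}")
```

A higher score means the two words point in a more similar direction in the vector space, which is the sense of "closer" used above.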
“Because of those similarities, we proposed having the system learn a rotation of the word embeddings in one language to match the word embeddings in the other language, using a combination of various new and old techniques, such as adversarial training. With that information, we can infer a fairly accurate bilingual dictionary without access to any translation and essentially perform word-by-word translation,” Facebook said in a blog post.
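The rotation idea can be illustrated with a toy two-dimensional example. This is only a sketch under loud assumptions: the embeddings are invented, and the rotation is fitted from two known word pairs using a closed-form least-squares solution, which is a supervised shortcut. The approach Facebook describes finds the rotation without any such pairs, using adversarial training among other techniques. The function names `fit_rotation`, `rotate` and `translate` are hypothetical.

```python
import math

def fit_rotation(source, target):
    """Angle (radians) of the 2D rotation that best maps source vectors onto target vectors."""
    sin_sum = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(source, target))
    cos_sum = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(source, target))
    return math.atan2(sin_sum, cos_sum)

def rotate(v, theta):
    """Rotate a 2D vector counter-clockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Hypothetical embeddings: the French space is the English one turned by 90 degrees.
english = {"dog": [1.0, 0.0], "cat": [0.8, 0.6], "house": [0.0, 1.0]}
french = {"chien": [0.0, 1.0], "chat": [-0.6, 0.8], "maison": [-1.0, 0.0]}

# Assumption: a tiny seed dictionary stands in for the adversarial step.
seed_pairs = [("dog", "chien"), ("house", "maison")]
theta = fit_rotation(
    [english[en] for en, _ in seed_pairs],
    [french[fr] for _, fr in seed_pairs],
)

def translate(word):
    """Rotate the English vector into the French space and return the nearest French word."""
    mapped = rotate(english[word], theta)
    return max(french, key=lambda fr: cosine_similarity(mapped, french[fr]))

# 'cat' was not in the seed pairs; its translation is inferred from the rotation alone.
print(translate("cat"))
```

Once the two spaces are aligned, looking up the nearest neighbor of each rotated vector is what yields the word-by-word bilingual dictionary mentioned in the quote.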
Researchers used this to translate between English and Russian, English and Romanian, and the even rarer case of English and Urdu. It’s pretty impressive, but don’t get too excited, as the results are nowhere near human translation standards. Facebook’s model did seem to do better than other unsupervised learning techniques, but wasn’t always better than traditional systems that use supervised machine learning.
This shows that although AI and machine translation are improving and making exciting leaps, human translation is still essential for localizing content to different cultures.