This introductory text to statistical machine translation (SMT) provides all of the theories and methods needed to build a statistical machine translator, such as Google Language Tools and Babelfish. In general, statistical techniques allow automatic translation systems to be built quickly for any language pair using only translated texts and generic software. With increasing globalization, statistical machine translation will be central to communication and commerce. Based on courses and tutorials, and classroom-tested globally, it is ideal for instruction or self-study, for advanced undergraduates and graduate students in computer science and computational linguistics, and for researchers in natural language processing. The companion website provides open-source corpora and toolkits.
This book gives a good one-volume summary of statistical machine translation (SMT), the technique that powers Google Translate and similar applications. Philipp Koehn is one of the best-known people in the field, and is very active on both the theoretical and the practical side. The open-source Moses engine, which he and his group at Edinburgh University have developed over the last few years, has now become more or less the de facto standard toolkit for SMT. So: an authoritative, well-informed account of a new field.
The basic idea of SMT is shockingly simple, and, when the first papers started coming out in the early 90s, people in the language-processing community were indeed shocked. Suppose you're translating from French into English. All you do is take a large amount of bilingual text - the first experiments were done with the proceedings of the Canadian Parliament - line it up, and extract tables which list apparent correspondences between French phrases and English phrases and their relative frequencies. You then analyze the English text and produce a second set of tables which give the relative frequencies of English phrases on their own.
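The "extract tables of relative frequencies" step can be sketched in a few lines. This is a toy illustration, not the book's actual estimation pipeline: the aligned phrase pairs and counts below are invented, and a real system would first have to discover the alignments themselves from the parallel text.

```python
from collections import Counter, defaultdict

# Invented toy data: (French phrase, English phrase) pairs, as if already
# extracted from aligned bilingual text such as the Canadian Hansard.
aligned_pairs = [
    ("la maison", "the house"), ("la maison", "the house"),
    ("la maison", "the home"),
    ("bleue", "blue"), ("bleue", "blue"), ("bleue", "blue"),
]

def relative_frequencies(pairs):
    """Estimate P(english | french) by relative frequency:
    count each pair, then normalise by the French phrase's total count."""
    pair_counts = Counter(pairs)
    french_totals = Counter(f for f, _ in pairs)
    table = defaultdict(dict)
    for (f, e), count in pair_counts.items():
        table[f][e] = count / french_totals[f]
    return dict(table)

table = relative_frequencies(aligned_pairs)
# table["la maison"] maps "the house" to 2/3 and "the home" to 1/3.
print(table["la maison"])
```

The same counting trick, applied to the English side alone, yields the second set of tables (the language model) mentioned above.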
To translate, you take a French sentence, find bits of it that match French/English table entries, write down the associated frequencies both for the translation rules and for the resulting English phrases, and pick the combination that gives you the best score. There are two main reasons why it's not completely straightforward. First, there are millions of possible combinations. Most words can be translated in several ways; for instance, à can be "on", "in" or "for", or, to choose a more interesting example that Not recently drew to my attention, branlette can be either "sugar shaker" or "hand job". The possibilities, needless to say, multiply out. Second, and at least as seriously, the English words will often be in a different order from the French words, so you need to take account of that in some way; here, the basic solution is for the translation algorithm to impose a penalty for changing the order, with big changes costing more than small ones.
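The "pick the combination that gives you the best score" step can be sketched as a brute-force toy decoder. Everything here is invented for illustration: the phrase table, the bigram probabilities, and the penalty value are made-up numbers, and a real decoder uses beam search rather than enumerating every permutation, which would be hopelessly slow on real sentences.

```python
import math
from itertools import product, permutations

# Invented toy phrase table: French phrase -> [(English phrase, P(e|f)), ...]
phrase_table = {
    "la":     [("the", 0.9)],
    "maison": [("house", 0.8), ("home", 0.2)],
    "bleue":  [("blue", 0.9), ("sad", 0.1)],
}

# Invented toy bigram language model over English words; unseen bigrams
# get a small floor probability.
bigram = {
    ("<s>", "the"): 0.5, ("the", "blue"): 0.2, ("blue", "house"): 0.4,
    ("the", "house"): 0.3, ("house", "blue"): 0.01,
    ("the", "home"): 0.1, ("blue", "home"): 0.1, ("the", "sad"): 0.01,
}
FLOOR = 1e-4
DISTORTION = 0.5  # penalty factor per position of reordering

def lm_logprob(words):
    """Log-probability of an English word sequence under the bigram model."""
    lp, prev = 0.0, "<s>"
    for w in words:
        lp += math.log(bigram.get((prev, w), FLOOR))
        prev = w
    return lp

def decode(french_phrases):
    """Try every translation choice and every phrase ordering; score each
    by translation probabilities + language model + reordering penalty."""
    best_score, best_output = float("-inf"), None
    options = [phrase_table[f] for f in french_phrases]
    for choice in product(*options):                    # translation choices
        for order in permutations(range(len(choice))):  # phrase reorderings
            words = []
            for src_idx in order:
                words.extend(choice[src_idx][0].split())
            score = sum(math.log(p) for _, p in choice)
            score += lm_logprob(words)
            # Big reorderings cost more than small ones.
            jumps = sum(abs(src_idx - tgt_pos)
                        for tgt_pos, src_idx in enumerate(order))
            score += jumps * math.log(DISTORTION)
            if score > best_score:
                best_score, best_output = score, " ".join(words)
    return best_output

print(decode(["la", "maison", "bleue"]))  # -> "the blue house"
```

Note how the language model outvotes the reordering penalty here: "the house blue" pays nothing for reordering, but the bigram "house blue" is so improbable that swapping the last two phrases wins anyway.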
But surely there must be more to translation than just looking things up in huge tables and picking the highest-scoring combo? Indeed there is: the fact of the matter, however, is that, with our present level of understanding, this is the method that works best. At the end of the book, there is a chapter briefly describing smarter methods that pay some attention to grammar; but they're not that much smarter, they're much more challenging to implement, and the gains are modest.
I am irresistibly reminded of the discussions of Ptolemaic astronomy in Laplace's wonderful Exposition du système du monde. When you don't really understand planetary motion, you use the best model you can come up with and try to make it fit the data as well as you can. It is hard to believe that the ancient Greek astronomers really thought that the planets moved on invisible crystal spheres attached to other invisible crystal spheres, but you can make it work quite well as a predictive theory if you're prepared to do the necessary number-crunching. As Laplace says, this turned out to be a far more fruitful research direction than imaginative armchair theorizing. People developed the system of equants, deferents and epicycles as far as it would go, and, by carefully studying what went wrong, they eventually found something that was genuinely better. In Machine Translation, we haven't yet reached the Newtonian stage. But if you want to know the details of how those crystal spheres work, Koehn's book is the one to buy.
Here's a cute experiment I just heard about from one of Philipp Koehn's colleagues. Go to Google Translate and try translating the two sentences "I saw few people" and "I saw a few people" into various languages. In some cases, the results will, as you'd expect, be different; in others, they'll be the same.
I suppose there might be some languages where they actually should be the same, but it's definitely getting it wrong in Swedish and I'm almost sure it's wrong in Russian too. It's definitely right in French, and I think in Norwegian. Basically, statistical machine translation contains a strong element of randomness.
If you speak a non-English language fluently, feel free to tell the rest of us what happens in your language!
This is a heavy book. It was quite difficult to extract the theoretical bits from the statistical equations that run all through it. I suppose it is quite a good start for those interested in how machine translation works. If anything, it removed the magic from the process.
A very detailed textbook on SMT. Shame that SMT has become basically obsolete since around 2015, when Neural Machine Translation (e.g., LSTM / Transformer models) took over...!
I'll definitely come back to this one - it was a good intro. Very heavy on equations and some theory that I'm unfamiliar with, but great for getting a good idea of the topic and where to research further.