The MIT Technology Review recently posted an article touting the success of machine learning in the translation of long-lost languages. Think Linear-A or Linear-B. These are languages found on ancient tablets, from the Minoan civilizations. Like many ancient languages, they were untranslated for a long time, almost two millennia, in this case.
In the non-machine-assisted manner, you have a man of a different culture and a different time looking at the language pattern and what is known of the culture, and interpreting the symbols in a manner that is considered translation. Language IS culture, so if you don’t know the culture, you probably cannot really translate the language. If you think of English today, with its different varieties and jargons, you can see that you might have words you recognize but, in context, have no idea what they mean. And this is in a world you largely recognize, though maybe not.
I often point to the images of the Sumerian gods when asked about translation, and when asked about machine translation, I did the same. Take the following image, and imagine, if you will, that this person lived on Earth, had wings, carried a pail, wore a great skirt, and had two magical flower watches. Place yourself within your six-year-old mind and the stories you could tell about what this person could do. Now imagine you are a British gentleman from the 1800s. What stories would he tell? And as you are, today, now? We’ve thrown out so many possible interpretations of the image, because we don’t believe in things now that may have been believed then. And we certainly don’t all believe the same things, today.
So back to machine translation, and machine learning. Someone has to input the culture, or, alternately, ignore that there may be a variance. This article is about statistical analysis of a language structure, and makes the assumption that it can apply to more than one, and that this is meaningful, and that the output may approximate a reality we cannot know.
Another way to think about this: I translate something for you from English to French, and I tell you it is true in both languages, because I know it to be true, because it matches the pattern.
Scholarly article is here. They speak of ‘decipherment of lost languages’ in which they are looking for cognates. They seem to assume success with their correct translation of 63.7% of the cognates. Last year I did some linguistic research for the McCain Institute, on the meaning of human rights. In our research, done in the US, across demographics, people couldn’t agree on the meaning of human rights, equality, equity, justice, freedom, and other such words. Yes, these are concepts that can be complex, but the degree of variation in meaning was surprising, and I’ve been doing this kind of research for a long time. Part of the issue seems to be that the meanings are changing very fast right now, combined with the media and fake news and all these other pressures on language and on culture. A corpus analysis of usage in the media shows enormous variation in the past few years, in usage, structure, and domains. So what I am saying is that we can’t do this even for English, from 2019 back to 2009.
Yet here is research saying they’ve derived grammar from probabilities, so here we have a statistical approach to meaning. And as anyone who ever reads this knows, I don’t agree that we can model culture mathematically, nor that we can understand or know the past when there are no living speakers to tell us what concepts meant. Our view is reductive, especially post-Enlightenment.
Here’s my fellow, crossing the space-time barrier with his fancy beard and his magic pine cone.