TAUS, “data-driven translation” and the future

A few weeks ago my attention was drawn to an article at http://www.translationautomation.com/perspectives/translation-leaks.html. In the associated discussion, I said that “it would be easy to pick holes in his largely spurious arguments if I had time”. Someone commented that, “suggesting that you have the capability but not the time to pick apart somebody else’s cogent arguments is a bit like those people who sometimes offer to black one’s eye over the Internet,” so now I have a little time, here are my thoughts. I reproduce the article in full, so that my comments, which are in red, have the full context.

Tuesday, 07 December 2010 15:00 Jaap van der Meer

You trusted your bank. You trusted your currency. You trusted your government. You trusted your translations.

So what happens now? Your certainties are being unraveled one after another. The system you trusted is leaking. It is unsettling. And even scary… But then you realize: trust is good, and knowing the facts is always the best policy.

You trusted your translations, your carefully chosen terminology, and your translation memories built up with great care, and well protected against unauthorized use. And now, you realize that your translations are as good or as bad as any others, that your customers rarely read your translations and often rely on machine-translated texts – machine translations that, annoyingly, are sometimes surprisingly good. (Certainly some, and probably most of my clients do read my texts, in some cases quite carefully. If they don’t read yours, you must be a different branch of the industry.) You realize that others are sharing translation memories, perhaps even your translation memories that you have nurtured as your own assets. You realize now that your secure world of translation is leaking. You are losing control. The model isn’t working anymore. It is upsetting, but it is better to face this new reality. Let go of the illusion of control. (I never did have an illusion of control, thank you. I maintain my translation memories as carefully as I can, but I have sold my translations – it is open, as it always was, to my clients to do with them whatever they want.)

New reality

The world is changing and the translation industry is lagging behind. The industry still operates with a 20th century, western world mindset: the developed world exporting its goods and spreading its civilization to customers rich enough to pay. (That’s what I call business.) Translation is largely one-directional – from English into the major languages – and one translation per language fits all customers. (I don’t have the figures to challenge this, but it strikes me as a surprisingly narrow and extraordinarily Anglo-centric view to be held by somebody who works in the translation field. English may, indeed, be at present one of the world’s most important technical and commercial languages, but surely, by that very token, a huge slice of the translation business involves translating texts into English, and I am quite sure that another huge slice is between languages where neither is English.) Translation is priced by the word (in many cases) and managed in projects (that’s the way business is done). A project is traditionally product documentation, or instructions for use, or the user interface. Each project in principle is meant to increase sales in new markets and is measured by ROI. Efficiencies come from an overly simplistic (I take it that “overly simplistic” is an overblown way – should I say “overlyblown” – of saying “very simple”) technology called translation memory that was invented in the 80’s of the last century and has hardly advanced since then. (I am at a total loss trying to understand why the author regards software of this sort as in some way excessively simple. If he believes that it has not advanced over the last 20 years, I suspect he has not been using it. It has advanced hugely, as any user must surely know.)

Translation in the 21st century requires a very different vision. (I believe we are getting to his main commercial pitch.) Western hegemony is over. Products and services are developed, manufactured and marketed everywhere and anywhere. Customers are more self-confident and don’t read manuals anymore. They read blogs and peer reviews and pull information from customer support sites when they need it. In fact new generation users don’t need user instructions at all and if new products are well designed, they’re completely intuitive and let the users ‘plug and play’. But new generation customers are also more discerning when they buy a product.

In this new regime, translation is multidirectional, from any language into any language. (I think it always was – see above.) Quality requirements are different for different users and different usages. Machine translation is good enough for the largest volumes of dynamic web content (I wonder which language he has tried this out on. Recently I have had occasion to make considerable use of Google Translate for Italian, as I now live in Italy without speaking the language. Italian may be less important than English, but it is scarcely an obscure language. The plain fact is that in most cases the reader is lucky if the sense rises above gibberish. Worse still than being incomprehensible, it is not uncommon for it to be entirely wrong, failing for instance to see negatives or inserting negatives where they do not exist.), whereas pre-sales texts require a step-up in quality from the current one-translation-fits-all policy. (Who uses this policy? I suspect most of my clients have always had a higher standard.) Tuning in to the style and sub-culture of niche customer groups makes all the difference in an increasingly global marketplace. Word-based pricing and ROI measurement do not make much sense in this new economic reality. Translation memory software still serves its goals in the shrinking business of manual translations (Is there any evidence that the market is actually shrinking? I somehow doubt it), but it is totally inadequate for the growing volumes of dynamic content.

A Lesson for Translation in the 21st Century

If you can see even half way through this new reality, your concerns over translation leaks will begin to give way to a growing sense of excitement. Translation is coming out of the dusty libraries. (I think that happened some while ago!) Translation is gaining in relevance and significance as a worldwide service industry with billions of customers. Translation as a feature on every web site and every mobile device is the key to a vital global economy.

It is still unsettling of course when you realize that 90% of the translated words will be generated by machine translation engines, probably at no charge to the end-user. But considering there is a non-stop stream of multimedia information, the translation market will certainly innovate and assert its value in different ways. Naturally there is no need for every business to change and everyone to automate. In fact there will be a growing need for high-quality, tailored translations. But if you are tempted to join the innovation wave, I am sure you will figure out a way to prosper in this rapidly changing environment.

One aspect of the future of translation, however, is easily overlooked: the importance of ‘data’. ‘Data’ replaces the role of ‘translation memories’ as the key to efficiency. A jet engine with a thousand times the power of those 1980s propellers. Data drive  translation engines. Data will control the quality and the efficiency of translation in the future. (This is a highly contentious point, and there is no substantial evidence yet that data-driven translation engines can do an even adequate job, let alone a good one, of anything but the most restricted, narrow types of text with highly concrete references. Perhaps the directions for how to get from Wimbledon to Andover might be translated this way, although I would want to be sure that the source didn’t contain anything “challenging” like “don’t turn left here, even though the sign tells you to”. I concede that data may indeed control the quality of this kind of translation – it is likely to keep it at a low level.) Whoever has access to the data controls the future of translation. Privileged or monopolized access to data will jeopardize the blossoming of a 21st century translation industry. (Jeopardize the author’s project?) Ownership of translation memories – translation data – is therefore an important and sensitive topic of debate. The legal argument will not help us much longer in this age of translation leaks. (I take it that he is trying to preemptively prepare the ground in the hope of deflecting the criticism that much of what TAUS plans may well be in breach of copyright.) Data are mined, scraped, masked, shared and used by everyone from individual translators to large global corporations. Attempting to make a legal case against the unauthorized use of translation data will probably not work. The practical argument is all that counts (he hopes), and once translations are published, there is no way to control the leaks (he hopes). And to be honest, wouldn’t you rather turn the whole argument round? 
If your translations are not confidential, why not simply share these data with everyone who can use them to improve the efficiency and quality of translation as a whole. What stops you from doing this? (What stops me? Professional discretion, for a start, even if I haven’t signed a nondisclosure agreement (NDA). I don’t know quite how typical my position is, but I do know that it is not unusual. As a freelance translator, I have sold my translations. I feel that I have the right to recall my past translations, and to use technology to assist me in that task, and in that way to improve my future translations. But it seems to me quite clear that, at least in the vast majority of cases, copyright was assigned to the client. Suppose that a German nut manufacturer wrote some brilliant advertising copy, and imagine that, after my attentions, the English version was still brilliant. Let’s imagine that the two were published side by side on the client’s website. Now let’s suppose that a German bolt manufacturer sees this and thinks “Wunderbar, ve kann kopy zis text, und verr ze nut makers haff written “nut”, ve kann write “bolt”, und ve haff ze über-brilliant advertising kopy for ten cents only und ve kann sell our bolts to ze British and ze Amerikans!”. Clearly there would be a problem, and it seems to me obvious that it would be the nut-makers who would have a case against the bolt makers for breach of copyright. My translation was sold, and it was up to them to do whatever they like with it. Now over the years I have worked for over 50 clients, and since many of those have been agencies, there must have been several hundred end-clients. If I were to share my translation memories with TAUS I would therefore need to get written permission from hundreds of owners, many of whom, for obvious commercial reasons, are unknown to me. And that is true even for those where I have not signed any sort of NDA. 
I wonder to what extent TAUS is simply hoping that nobody will ever notice how much copyright material is present in the database they are assembling.

Oh, yes, there is another reason that stops me from doing this. Believe it or not, rather than paying me a substantial sum for joining them and sharing my translation memories, they want me to pay them. Work that one out!)

Wishing you Wisdom (I think he means “Wishing you will come over to my point of view”)

Here we are, at the start of the second decade of a new millennium, facing some real dilemmas. I know it’s hard to take such a radical step from one world into another. But be aware that while you are puzzling over which direction to go, your translations are leaking.

We wish you wisdom and success in 2011 and the decade ahead. We at TAUS are here to help as the industry think tank (or so they want us to believe), your innovation partner and a safe harbor for sharing your translation memories.


8am PDT / 5pm CET
Wednesday 15 December
45-minutes in duration

We share insights on the future of the translation industry. Content is based on a number of market and collective intelligence exercises undertaken by TAUS during 2010. This includes continuous review of the market, ideation sessions with major translation decision makers, and discussion with leading scientists, amongst others

(I included this advert from the bottom of the webpage, as an example of the language used by these language professionals. “Ideation sessions.” Good grief.)

What do people think about translators?

A (very good) agency for whom I work has sent me a job from a client I shall, of course, not name. It consists of 60 or 70 picture captions for a magazine. The end-client does not seem to think it worth giving the translators access to the actual pictures. After all, all we do in this business is retype what is there, but we just do it in a different language, don’t we?

Machine translation and human clangers

The New York Times has recently run a comparison of Web translation tools. Over the last few weeks I have had reason to look at some Italian sites, and the existence of these tools has made me realise that there cannot be much money left in the business of “gist” translations. What has reassured me, however, as someone who makes a living from translation, is that even now these tools often fail even to give a “gist”. Of course, we can expect the results to improve, and possibly to do so quite fast, but in many cases, especially if the source is at all complex, the result is near gibberish. Consider, for instance, the word “provanti”, which does indeed seem to be some kind of Italian word. But what does it mean? Attempts to use the Google translator on phrases containing this word yield gibberish: “Casa Provanti”, for instance, is yielded as “home hard to deal with”, and other trials show that at the moment the translation engine believes that “provanti” means “hard to deal with”. No, no, no, no, no. It is a participle associated with a verb for attempting or trying, as in “I am trying to please my guests”. The engine seems to have fallen for what I recall as a schoolboy joke:

“Can’t you do that with a bit more effort?”
“Sorry, I’m trying.”
“Yes you are, very!”
(Boom, boom!)

But humans make mistakes too. Consider, for instance, a site to which I was recently referred, in the “languagering”, offering “Training in writting”, and telling us that “writing can be a tedious tasks”. It would appear that for that writer, checking spelling and proofreading were just too tedious by far!

And here is another, although the author of this one can be perhaps forgiven, as the text was submitted for proofreading and correction. I will not name the source, as it is a client who pays me. It illustrates the mistake that can be made by some Germans (and no doubt those with other native tongues) who plan to save money by doing the “translation” themselves, then paying a native of the target for the proofreading only. The problem is, of course, that the difficulties created by this process mean that the “proofreading and correction” may demand more time and money than simply translating in the first place. The source text was “Bei Werkstattmontage gebohrt”. For those who don’t know German, “bei” is related to the English “by”, and carries meanings like “in association with” or “at the same time as”, as well as “next to” and so on. It is not used, however, to convey agency in the same way as in “I was knocked down by a car”. “Werkstattmontage” is a simple example of a German compound noun, and can uncontroversially be translated as “workshop assembly”. “Bohren” is to make a hole, as in “bore into a piece of leather”; in engineering contexts it is most often translatable as to “drill”, and a “Bohrung” is a drilled hole. So our phrase is a comment on a hole, and tells us that it is “Drilled during workshop assembly”. Our German “translator”, however, had rendered it as (wait for it…): “Bored by workshop assembly”. They probably were!

Disclaimer: having criticised the spelling and command of language of others, it is a cosmic law (WIP – Wilding’s Inevitability Principle) that I have made at least one silly mistake in this article. Don’t blame me!


Now that I have lived in Sydney for a little while, I have naturally started to notice Australianisms. It’s the subtle ones that I find most interesting. Australians are well aware that things about the way they speak are particularly Australian, and there are plenty of books available on Australian slang. Many are now just corny cliches – I’m not sure if anybody still seriously refers to their mates as “cobber”. I imagine that most Australians who speak of putting “snags on the barbie this arvo” know that this is Australian. I was not sure if I needed to explain that it means “sausages on the barbecue this afternoon”, but since my spell checker wanted to capitalize barbie, presumably in the belief that I was referring to a plastic doll, I thought perhaps I should.

I am more interested in the kind of Australianisms that a reasonably well-educated person would use when speaking more or less formally and perhaps not realize that an English (or for that matter Scots, American or what have you) English-speaker would find the expression odd, and perhaps detect that the speaker was Australian. As an example from elsewhere, I noticed in Ireland how the word “avail” was used in an extremely un-English way. For a start, the Irish say “avail” quite commonly, where in English English it is relatively rare, and rather more formal than Irish usage. What grated on my sensitivities when I first heard it, until I realized that it simply is the normal Irish way of speaking English, is that they do not use it reflexively. So where the English person might be judged to be pompous, but correct, to say “I availed myself of the opportunity to enter the dwelling”, an Irish grocer might put up a notice saying “Just collect coupons to avail of our half-price offer”. To see “to avail yourself of” in that context would be a surprise.

Inevitably I have now forgotten most of the subtle Australianisms that I have noticed until now, so this post is an opportunity to collect them over time. So far I have these:

  • Trifecta – something like a hat-trick. Scarcely known in English English (well I, at least, had never heard it before) but not at all uncommon here.
  • Identity – used in a context where English English might say “figure”: one hears, for instance, of “underworld identity Bruce Smith” rather than “underworld figure Bruce Smith”.
  • Bash – whereas I would think of this as a somewhat colloquial word for a blow or series of blows, as when one bashes a nail into the woodwork, or even bashes somebody in the eye, in Australian English a bashing is used quite formally (again, I’m thinking of television news, for instance) to refer to somebody being mugged, beaten up or seriously assaulted. In the 19th century the word was sometimes used for a flogging – perhaps that is the origin of this usage?

And I know that I’ve noticed more, but what were they?