Jörg Porsiel is Project Manager of machine translation at Volkswagen AG in Wolfsburg, Germany. He is a translation graduate and has been working in translation, terminology management and foreign language corporate communication since 1990.
joerg.porsiel(at)volkswagen.de
Machine Translation and data security
Most people know Machine Translation (MT) from the internet. Providers such as Google Translate, Yahoo! Babel Fish, and others supply online translations within seconds “from every language into every language”, as it were. And all this free of charge! Naturally, decision makers in companies wonder why they should spend any money on erratic and eccentric human translators who produce between 10 and 15 pages on a good day, when the whole lot can be done much faster, at any time of the day, 365 days a year, and (almost) free of charge.
“… Words are trivial … meaningless and forgettable … Words are very unnecessary. They can only do harm.”
(Depeche Mode: Enjoy the Silence)
There is no way of avoiding MT
There is no longer any way of avoiding MT. CPU speed and throughput capability, prices of computers, computing time, as well as MT software quality no longer impose the same restrictions they used to. MT certainly does not provide the solution to all translation problems. It can, however, at least provide a great deal of assistance in accomplishing certain tasks (provided it is used properly!). The only factors that determine how, or whether, it makes sense to use MT in the future are text type, purpose (e.g. gisting), target group, and language combination. There are situations which lend themselves to the use of MT and others which do not, and will not for the foreseeable future. Either way, this requires changes to the approach and process of translation. Depending on what we specifically wish to achieve, the future role of the professional translator will be different to the one known to most of us today. There is already considerable demand for a task known as pre- and post-editing for which, as far as I am aware, there is still no concrete definition or job profile, let alone dedicated courses at university level.
For many years, MT has been discussed by many people and, as I perceive it, especially by those who lack an appropriate level of expert knowledge on the subject. The reason is that many decision makers see MT as the solution to all of their translation problems at a time of severe cost-cutting policies within companies. Many allow themselves to be influenced or even deceived by the figures on paper that forecast relatively short-term potential for savings on their in-house translation employees (if indeed there are any), as well as considerable reductions in the cost of using translation agencies. They also see potential for achieving shorter turnaround times and greater translation volumes. Many believe that an unprecedented ROI in the area of translation could be achieved by simply installing a CD and going through (very) short orientation and introductory phases.
These developments have made me think a lot recently about how much gossip columns have in common with MT. Apparently, nobody pays money to read gossip columns and everybody knows that they are of poor quality. Yet somehow, millions of people are talking about and are entertained by Lindsay Lohan's latest battle with the courts, or how much Jessica Simpson weighs... this week. This is similar to MT. Apparently, nobody makes use of the services because everybody knows that they produce poor-quality translations. People talk about and are entertained by the absurd translation results; nevertheless, millions and millions of people visit these websites on a daily basis, maybe even to have their gossip columns translated. Google & Co. are now free and widely available and suddenly everybody has an expert opinion.
The success of Google Translate has virtually gone through the roof in terms of the number of users and the number of language pairs offered. First of all, this means that the demand exists, that it is unimaginably high, and that it will continue to grow. From the point of view of the user, the whole situation is ideal insofar as everything is free (at least it seems to be free, but more on this later). As users do not have to spend even one cent on this service, they are inclined to accept poorer quality, and in return feel entitled to laugh at the results (some of the time). But, and it is a big BUT, users usually underestimate or are completely unaware of what actually happens to the data they have allowed to be translated somewhere online. The users of this supposedly free service actually pay a high price for their machine translations: their (highly) personal data, unwittingly uploaded onto a server, have been revealed to strangers and can be used and abused by third parties.
Some unsolved mysteries
In these times of harvesting, phishing, social engineering and cyber attacks à la Stuxnet, internet users need to think about the fact that the number of free online MT service providers is steadily increasing. Apart from the “main players” mentioned above, there is an increasing number of providers serving niche markets with less widely spoken languages such as Afrikaans.
When faced with a foreign-language text, employees only need to be under enough pressure to produce a fast translation before they search for an MT tool online. When they find one (there are plenty of ways of bypassing URL blocks), the text is uploaded, they receive their translation back, and done! Or is it? The users of the range of translation services available online should know the answers to the following questions before uploading anything: Who is the MT provider? In which country is the provider based? Where is the server? Who has access to it? Why are they offering this service (free of charge)? What happens with my data from a technical perspective? If there are no answers to these questions, the service should be avoided. Hence, before making use of these services, potential users should consider whether it is the right decision, and if so, for what type of text/content.
Possible scenario
Picture the following MT scenario for any of the online providers:
You receive an email with an encrypted attachment. You briefly wonder why it is encrypted at all. Does it contain confidential or classified information, aimed only at certain people? That was, of course, a rhetorical question. You decrypt the text by means of a password and find out that it is in a language which you do not understand very well. However, you know the sender and know that this is an important subject (otherwise it would not have been sent as an encrypted document). You do not have the time or the means to translate the document yourself (or your pride will not let you inform colleagues or supervisors that you do not speak the language). You go online to search for MT. You upload the decrypted confidential text (decrypted, of course, or else it could not be processed!) via an open data connection through unknown routes to the tool, and within seconds you receive back a translation. And done! Is it really?
Worst case scenario
When using internet translation, the majority of users probably do not give a thought to the technical side of what occurs or could occur with their data. Many consider the process the same as creating and saving a completely normal file. If two files are saved under the same name, the current file overwrites the previous one. Therefore, many assume that the translated file or text received back from an MT tool is the original file or data which has just been sent back in a different form. Hence, users think they have not lost any data or left any trail of data behind. At best they probably think that the source text has “evaporated” somewhere in Internet Nirvana while being processed.
But this is not the case! The source text has most probably been saved on the provider's server along with the time stamp, the IP address and the company URL of the sender. In the same way, the internet translation is saved along with the time stamp and IP address of the receiver. In addition, it is conceivable that at least one more file is saved: a file containing terms not previously known to the MT system. All of this data can be linked together easily and freely.
Depending on the motives and the criminal intent of the provider, it is likewise conceivable that the data gathered is evaluated by special software, collected and put together, e.g. according to particular search words, IP ranges or URLs. Depending on the content of the source text, subject matters can be analyzed in conjunction with the IP address of the sender/receiver, and with relatively little effort from an IT perspective an entire personal profile could even be created. Such profiles are possible because the source text may contain personal information or obtainable data such as names, addresses, email addresses, company ID numbers, bank details, contract details, deadlines, payment details, currencies used for payment, etc.
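To give an idea of how little effort such linking would take, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the log format, the names LOG, EMAIL, KEYWORDS and build_profiles, and all sample values are invented, and nothing here describes what any actual provider does. It simply groups stored requests by sender IP address and collects email addresses and search words found in the submitted texts.

```python
import re
from collections import defaultdict

# Hypothetical log records a provider could keep for each request:
# (time stamp, sender IP address, submitted source text). All values are invented.
LOG = [
    ("2011-03-01T09:15:00", "192.0.2.10",
     "Please transfer EUR 5000 to Hans Muster, hans.muster@example.com, by 15 March."),
    ("2011-03-01T09:40:00", "192.0.2.10",
     "Contract 4711 deadline moved; contact erika@example.org."),
    ("2011-03-02T14:05:00", "198.51.100.7",
     "Lunch menu for next week."),
]

# Simple pattern for email addresses and a hypothetical list of search words.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
KEYWORDS = ("contract", "transfer", "deadline")


def build_profiles(log):
    """Group log records by IP address and collect what the texts reveal."""
    profiles = defaultdict(
        lambda: {"requests": 0, "timestamps": [], "emails": set(), "keywords": set()}
    )
    for timestamp, ip, text in log:
        profile = profiles[ip]
        profile["requests"] += 1
        profile["timestamps"].append(timestamp)
        profile["emails"].update(EMAIL.findall(text))
        lowered = text.lower()
        profile["keywords"].update(k for k in KEYWORDS if k in lowered)
    return dict(profiles)


profiles = build_profiles(LOG)
for ip, profile in profiles.items():
    print(ip, profile["requests"], sorted(profile["emails"]), sorted(profile["keywords"]))
```

A few dozen lines are enough to tie two unrelated uploads from the same IP address to the same names and topics. Software written for this purpose would presumably go much further.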
So much for "free" service; in return for more or less poor-quality translation, the user gives the provider potentially priceless personal or financial information, without being aware or even having suspicions that they have done so. This data can always be used at a later date in any way, and for any purpose.
Implementing data protection standards
In order to use MT on a large corporate scale, data security and protection against industrial espionage should certainly be goals of paramount importance from a business point of view. Therefore, MT should ideally be implemented only behind the company's own firewall and, to be extra cautious, on a separate, possibly encrypted server, reachable via a secure data connection, and looked after by specially trained personnel (from the company's own workforce).
Source text quality
As every experienced translator knows, the quality of the source text is crucial in determining the quality of the end product. This is especially relevant and vital to the quality of MT output because the system will not process what it does not know or recognize (or will process it "wrongly"), even if it is just a word spelled incorrectly. The Garbage In, Garbage Out (GIGO) principle comes into play in this regard. In a productive environment, the use of MT in conjunction with translation memories and pre- and post-editing for the generation of High Quality Machine Translations (HQMT) of predetermined text types using company-specific terminology requires a high degree of planning, commitment, and expertise on the part of the employees responsible for the process. The requirements for a good outcome and the success of a company’s MT project are that the quality of the source text is suitable, that all employees combine their efforts, and that all interested parties recognize the overall benefit.
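The GIGO principle can be illustrated with a very simple pre-editing check, sketched here in Python. This is only a toy under stated assumptions: the word list KNOWN_WORDS and the function unknown_words are hypothetical, and a real MT system and a real terminology database are far more complex. The sketch merely flags words in a source sentence that are not in a list of words the system is assumed to know, so that they can be corrected before translation.

```python
import re

# Hypothetical, tiny list of words the MT system's lexicon is assumed to know.
KNOWN_WORDS = {"the", "school", "offers", "education", "for", "all"}


def unknown_words(source_text, known_words=KNOWN_WORDS):
    """Return words not found in known_words, in order of first appearance."""
    flagged = []
    for token in re.findall(r"[A-Za-z]+", source_text.lower()):
        if token not in known_words and token not in flagged:
            flagged.append(token)
    return flagged


# A sentence with two misspellings, checked before it is sent for translation.
flagged = unknown_words("The scool offers educatoin for all.")
print(flagged)
```

Run on the correctly spelled sentence, the same check flags nothing, which is the point: the cheapest quality improvement happens in the source text, before the MT system is involved at all.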
Let me conclude by sharing an email with you that I received recently and which I found quite entertaining. Occasionally users complain about the terrible quality of MT output, often even providing examples. The email I received contained one such example. Somebody was complaining to me that the tool was "bad" because it could not even translate the simple word, “scool” [sic!]. Just before I had a chance to react, I received a second email from the same sender with a further item of information; the tool neither knew nor translated the simple word, “educatoin” [sic!]. And this cost money, whereas Google & Co. would have been free and would have done a better job. Outrageous! I keep wondering if the sender was trying to have gossip about Lindsay or Jessica translated.