The Rise of the Machines

… when will they get around to it?

Recently I have been reviewing new entrants (here and here) to the Computer-Assisted Translation market. If you read those posts, you will have noticed a jaded and dissatisfied air to my comments. The truth is, I have been waiting for some time to be made at least quasi-redundant by a CAT package that is to translation what the word processor was to typing. No such luck. Translation still relies almost entirely on the neurons inside my head, which is a shamefully inefficient way to get things done. I remain, like both Siddhartha Gotama and Mick Jagger, dissatisfied.

Then, a bit more than a week ago, an acquaintance from the big bad world of patent translation, who I hadn’t spoken with in years, contacted me to let me know that she was doing some market research for another new player in the CAT field. This is Kilgray Translation Technologies, an outfit I had never heard of before. My friend was asking translators and agencies what their wish list for a CAT tool would be. Rather than making things simple, and just answering the good lady’s question, I decided to blog it.

My CAT Wish List

  1. Runs in MS Word (or perfect mimic)
  2. Does automatic pre-look-ups in both dictionaries and concordance
  3. Stores glossary/dictionary word choice
  4. Semi-automatically adds to glossary
  5. Capable of MT pre-processing entire document
  6. Able to recreate related documents
  7. Capable of running in dedicated editing mode
  8. Runs in the cloud
  9. Provides assisted consistency checker
  10. Runs automatic errors and omissions check

Boy, now that I look at it in black and white, I can see that I am greedy. The awful thing is, I held back. There are a few ideas that I feel too possessive about to describe in a public blog, and several others that I left out because I didn’t want you to think I was spoiled and lazy.

In more detail:

1. Runs in MS Word (or perfect mimic)
I have macros that involve several hundred lines of code. My translation environment needs to be able to run them. If the CAT tool is an independent environment, it has to run VBA. I also want to be able to format as I edit, and I want to be able to edit in my translation environment. (See 7.)

2. Does automatic pre-look-ups in both dictionaries and concordance
In this day and age, it is shameful that I should still have to copy and paste a word into a dictionary, and I don't want to wait more than 100 milliseconds for my CAT program to give me answers. CAT programs should scour all available sources for all the words and phrases in the upcoming sentences and store the results in working memory. When I select a word in the source text, I should see pop-ups not only for the CAT glossary entries but also for any external dictionaries and concordance searches.
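To make the idea concrete, here is a rough Python sketch of pre-look-up, resting on a few assumptions. Each "source" is simply a function that takes a word and returns a list of hits. These functions are hypothetical stand-ins for a glossary, an external dictionary or a concordance search. Words are split on non-word characters, which works for space-delimited languages; Japanese would need a proper morphological analyzer. A real tool would also run `prefetch` in the background instead of on demand. `PrefetchingLookup` is an invented name, not any real product's API.

```python
import re


class PrefetchingLookup:
    """Looks up the words of upcoming segments before the translator asks.

    Hypothetical sketch: `sources` is a list of callables (glossary,
    dictionary, concordance stand-ins), each mapping a word to a list of hits.
    """

    def __init__(self, sources, lookahead=3):
        self.sources = sources
        self.lookahead = lookahead  # how many segments past the current one to pre-scan
        self.cache = {}  # word -> combined hits from every source

    def prefetch(self, segments, current_index):
        """Fill the cache for the current segment and the next few."""
        upcoming = segments[current_index:current_index + self.lookahead + 1]
        for segment in upcoming:
            # Naive tokenization: assumes space-delimited source text.
            for word in re.findall(r"\w+", segment.lower()):
                if word not in self.cache:
                    self.cache[word] = [hit for source in self.sources
                                        for hit in source(word)]

    def on_select(self, word):
        """Answer a selection straight from memory, no lookup round trip."""
        return list(self.cache.get(word.lower(), []))


# Toy glossary and dictionary, for illustration only.
glossary = {"port": ["door"]}
dictionary = {"port": ["harbor"], "close": ["shut"]}
lookup = PrefetchingLookup([
    lambda w: glossary.get(w, []),
    lambda w: dictionary.get(w, []),
])
segments = ["The port is open.", "Close the port."]
lookup.prefetch(segments, 0)
print(lookup.on_select("Port"))  # ['door', 'harbor']
```

The lookups happen while I'm still reading the current sentence, so by the time I click on a word, the answer is just a dictionary read.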

3. Stores glossary/dictionary word choice
The program should be automatically monitoring and storing my word choices. If I translate “port” as “door” in one segment, the first choice offered to me when I select that word in a subsequent segment should be “door.” And, please, I don’t want to have to do anything to make that happen. If the tool can’t figure out what word in the source corresponds to what word in the target, it’s not much of a tool.
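Here is a minimal sketch of what "remember my word choice" could look like. It assumes the hard part is already done: some word aligner has paired source words with target words in each confirmed segment. `WordChoiceMemory` and its methods are made-up names for illustration, not any real CAT tool's API.

```python
from collections import defaultdict


class WordChoiceMemory:
    """Remembers which target word the translator chose for each source word.

    Hypothetical sketch: assumes (source, target) word pairs come from an
    aligner that runs whenever a segment is confirmed.
    """

    def __init__(self):
        # source word (lowercased) -> target choices, most recent first
        self._choices = defaultdict(list)

    def record_segment(self, aligned_pairs):
        """Store every (source, target) word pair from a confirmed segment."""
        for source, target in aligned_pairs:
            choices = self._choices[source.lower()]
            if target in choices:
                choices.remove(target)
            choices.insert(0, target)  # the latest choice moves to the front

    def suggestions(self, source):
        """Return remembered translations, latest choice first."""
        return list(self._choices.get(source.lower(), []))


memory = WordChoiceMemory()
memory.record_segment([("port", "harbor")])
memory.record_segment([("port", "door"), ("valve", "valve")])
print(memory.suggestions("port"))  # ['door', 'harbor']
```

The translator does nothing extra here. Confirming the segment is the only input.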

4. Semi-automatically adds to glossary
In keeping with the above, if I make a glossary/concordance/dictionary inquiry for a term in the source, and it is not in the CAT glossary, or my chosen translation is not in the CAT glossary, I should be given the chance to add my translation to the CAT glossary with a single keystroke.

5. Capable of MT pre-processing entire document
It should be easy to ask the program to pre-segment the entire document and then run MT on the pre-segmented text. That way, I can use search and replace to forward-edit the MT results.

6. Able to recreate related documents
When I open a document, the program should analyze the entire document (in the background, please; I don't ever want to be asked to wait while the CAT tool thinks) and determine whether similar documents have been translated in the past. If so (that is, if the number of 100% matches attributable to some other source document exceeds a threshold value), it should recreate those documents from the TM and load them in a separate window so that I can look at them. This would be like concordance, just on a bigger scale.
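The threshold logic is simple enough to sketch. This assumes that each TM unit records which past document it came from and its position in that document, which is a hypothetical TM layout. `TMUnit`, `related_documents` and `recreate_document` are illustrative names, and the 0.3 default threshold is an arbitrary placeholder.

```python
from collections import Counter, namedtuple

# Hypothetical TM record: the segment pair, plus the past document it came
# from and its position in that document.
TMUnit = namedtuple("TMUnit", "doc_id position source target")


def related_documents(new_segments, tm_units, threshold=0.3):
    """Return (doc_id, share) for past documents whose 100% matches cover at
    least `threshold` of the new document's distinct segments."""
    distinct = set(new_segments)
    # Index the TM: source segment -> set of past documents containing it.
    docs_per_source = {}
    for unit in tm_units:
        docs_per_source.setdefault(unit.source, set()).add(unit.doc_id)
    # Count exact (100%) matches attributable to each past document.
    hits = Counter()
    for segment in distinct:
        for doc_id in docs_per_source.get(segment, ()):
            hits[doc_id] += 1
    results = []
    for doc_id, count in hits.most_common():
        share = count / len(distinct)
        if share >= threshold:
            results.append((doc_id, share))
    return results


def recreate_document(doc_id, tm_units):
    """Reassemble a past bilingual document from its TM units, in order."""
    units = sorted((u for u in tm_units if u.doc_id == doc_id),
                   key=lambda u: u.position)
    return [(u.source, u.target) for u in units]


# Toy data for illustration only.
tm = [
    TMUnit("patent-A", 1, "source 2", "target 2"),
    TMUnit("patent-A", 0, "source 1", "target 1"),
    TMUnit("patent-A", 2, "source 3", "target 3"),
    TMUnit("patent-B", 0, "source 1", "target 1"),
    TMUnit("patent-B", 1, "source 7", "target 7"),
]
new_doc = ["source 1", "source 2", "source 3", "source 9"]
print(related_documents(new_doc, tm, threshold=0.5))  # [('patent-A', 0.75)]
print(recreate_document("patent-A", tm))
```

Three of the four new segments are exact matches from patent-A, so it clears the bar. Patent-B, with only one match, does not.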

7. Capable of running in dedicated editing mode
This one is key. In the translation of patents, the matching function of CAT tools is actually not that useful. Sometimes, when one is translating a family of patents (in-house translators will see a lot of this), you will get a lot of hits, and of course the headings are all hits, but otherwise matches of 75% or more are rare. So the main value of CAT tools for organizations like PTI is the clear presentation of the source and the target in discrete units. This makes it less likely that the translator will overlook any portion of the text. It allows the translator to easily and accurately reference the source when reviewing their own work. It allows the checker to visually compare the source and target without being distracted by the surrounding text. And it allows the translator to instantly verify the legitimacy of changes made by the checker.

That said, most CAT tools suck as editing platforms. For example, the Track Changes function is essential to meaningful collaborative editing, but in standalone CAT platforms track changes functionality is either unavailable or requires a separate comparison step that is prone to errors. When working within MS Word, you have to activate and deactivate Track Changes every time you advance to the next segment. We have a macro to do this automatically, but it is hard to get freelance checkers and editors to use macros. There is also a problem with segment markers, which editors can easily damage.

A good CAT tool will be a robust editing environment, allowing for editing segment by segment, with completely transparent segment marker protection, track changes functionality, commenting and other advanced bilingual editing. I’m not even going to get started on what I mean by other advanced bilingual editing.

8. Runs in the cloud
It is awful to have to deploy software on another person’s machine. It doesn’t work right, or they have a different version, or they don’t know how to use it and can’t explain why. Collaborative solutions have to run in the cloud.

9. Provides assisted consistency checker
There are various tools for checking terminology consistency in CAT platforms. The problem is that they are all active solutions. You have to set rules and/or run checks. A good program would provide passive consistency checking by noticing terminology usage and, at the moment that an inconsistency arose within one document, alerting the translator. A really good program would automate or semi-automate the correction.
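Passive checking mostly means listening rather than waiting to be asked. Here is a rough sketch, assuming the CAT tool passes in the aligned (source term, target term) pairs of each segment the moment the translator confirms it. The class name and the alert wording are invented for illustration. The sketch covers only the alerting, not the automatic or semi-automatic correction.

```python
class ConsistencyWatcher:
    """Passive terminology watcher for a single document (hypothetical sketch).

    Assumes the CAT tool calls on_segment_confirmed() with the aligned
    (source term, target term) pairs each time a segment is confirmed.
    """

    def __init__(self):
        # source term -> (target term used first, segment number of that use)
        self.first_use = {}

    def on_segment_confirmed(self, segment_no, term_pairs):
        """Record term usage and return alerts for any new inconsistency."""
        alerts = []
        for source, target in term_pairs:
            if source not in self.first_use:
                self.first_use[source] = (target, segment_no)
                continue
            earlier_target, earlier_segment = self.first_use[source]
            if target != earlier_target:
                alerts.append(
                    f"Segment {segment_no}: '{source}' rendered as '{target}', "
                    f"but as '{earlier_target}' in segment {earlier_segment}"
                )
        return alerts


watcher = ConsistencyWatcher()
watcher.on_segment_confirmed(1, [("port", "door")])
for alert in watcher.on_segment_confirmed(7, [("port", "opening"), ("valve", "valve")]):
    print(alert)  # Segment 7: 'port' rendered as 'opening', but as 'door' in segment 1
```

No rules to set and no check to run. The alert simply appears when the inconsistency does.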

Is that too much to ask?

Yeah, probably. A lot of this functionality requires that the CAT tool provide word-level analysis while operating at the segment level. It's totally doable. I know this because I have written macros that provide that sort of functionality within the Wordfast environment. But, with the notable exception of Snowball, most CAT tools don't even want to go there. If it is on the word level, they want you to manually indicate your choices. In my opinion, that is precisely the sort of drudgery that computers were sent here to eliminate.

My ideal CAT program also requires a simple, robust, unobtrusive and intuitive user interface (think Mac), and unfortunately, CAT tool developers have got it into their heads that translators are fine with fiddly, complex interfaces with eighteen windows and nine pre-processing steps. And it's true, many of us are nerds. But if it takes more than ten minutes to get up and running on something, it cannot really be deployed across the board in the sort of short-term collaborations that are bog-standard in the translation industry.

Hmmm, I set out to write a Wish List but, coming to the end, I’ve got to say the tone is closer to Manifesto. Oh well, it can’t be helped, comrades.


6 thoughts on “The Rise of the Machines”

  1. You must translate patents from a language written logically. I translate from what appears to be Japanese, so CAT software would be of limited value. My wish is for a writing course for Japanese patent attorneys.

  2. David,
    I know what you mean. I have to keep reminding myself that the author is my buddy, not my adversary. That helps in lowering my blood pressure, but sometimes it still ends up being like this:

    I usually have PDFs OCRed, but for the ones where the image quality is so bad that proofreading the OCR output is just not worth the cost, I have a trick. I copy and paste each paragraph as a graphic, right into Word. Long paragraphs I split (usually at sentence breaks). If you use something like PDFXChange Viewer, the selection remains on screen so that when you come back for your next chunk, you know where you left off. It doesn't help at the word level, but it does prevent skipping any sizable chunks of text.

  3. I would put a special emphasis on consistency checks, which are a recurrent source of trouble.
    If CAT tools could evolve to help translators avoid such mistakes, that would be just perfect.
