Category archive: CAT-COMPARISONS

OT ver MmQ – Chapter-3: Leveraging Legacy

Very often, your projects are completely unique, so you start from scratch and begin chugging away.
Sometimes, however, you have a TM from a previous, similar project, so you can just plug it in and start translating.
And sometimes a client provides you with previously translated documents that you may be able to use if you can convert them into a TM.
This is a time-consuming effort, so make sure that it is worth your time.

OmegaT does not have a native aligner, so we will use another CAT tool called Felix.

OT ver MmQ – Chapter-1: Setting up a project

This is the first in a series of tutorials dealing with translation workflows using the OmegaT and the MemoQ CAT tools.

In this project I will translate a Hebrew document into English, using both tools in parallel.

Stage 1:

Looking at the document

At the first stage, we just look at the document. It's a .docx file, so both CAT tools shouldn't have a problem with it, and the formatting is pretty neat. We would like to have the same formatting in the target file.

Firing up a project

We now open a project in OmegaT and a project in MemoQ in parallel.

Both tools require us to establish a directory for the project files, define the source and target languages, and load the documents.

 

Looking at the file structure

Now we look at the OmegaT and MemoQ file structure. OmegaT has created a complete directory and sub-directory structure for all aspects of the project:

These sub-directories are named after their contents, and it is now up to the translator to place the appropriate files within them: dictionaries and glossaries, source documents and legacy TMs. OmegaT will eventually fill the "target" directory with the translated document and the "omegat" directory with the project's TM.
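As a quick way to see what has been placed where, here is a minimal Python sketch that counts the files in each project sub-directory. The folder names in EXPECTED_DIRS are an assumption based on a default OmegaT project layout, and describe_project is a hypothetical helper name, not part of OmegaT itself.

```python
from pathlib import Path

# Assumed default OmegaT project folders; adjust if your project differs.
EXPECTED_DIRS = ["dictionary", "glossary", "omegat", "source", "target", "tm"]


def describe_project(project_dir):
    """Map each expected folder to its file count, or None if it is missing."""
    root = Path(project_dir)
    report = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        if folder.is_dir():
            # Count files at any depth below the folder.
            report[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            report[name] = None
    return report
```

Calling describe_project on the project directory right after setup should show which folders are still empty and which, if any, are missing.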

MemoQ, on the other hand, created two directories, one for its TM and one for the project, with a sub-directory for translation documents. Note that both tools created copies of the original document.

 

As regards comparison:


Feature | OmegaT | MemoQ
Tag forest | Large, problematic tag forest | Sparse tag forest
File structure | Complete and obvious | Partial and hidden

 

Dealing with the OmegaT tag-forest problem

When we load the .docx into OmegaT we see a large tag "forest" that makes translating difficult.

A little browsing on the internet shows that this problem is common to a number of translation tools and has to do with the structure of the .docx file, so much so that David Turner has created a tool called CodeZapper just to get rid of hidden .docx tags. He describes his program thus:

"CodeZapper" is a set of Word VBA macros designed to “clean up” Word files before being imported into a standalone translation environment (DVX, memoQ, SDL Studio, TagEditor, Swordfish, OmegaT, Wordfast Pro, etc.).
Word documents are often strewn with “rogue codes” or junk tags (so-called “smart tags”, language tags, track changes tags, spellchecker tags, soft hyphenations, scaling and spacing changes, redundant bookmarks, etc.).
This tagged information shows up in the translation grid as spurious codes{1}around{2}, or even in the mid{3}dle of, words, making sentences difficult to read and translate and generally negating many of the productivity benefits of the program.
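To get a rough feel for how fragmented a .docx file is before importing it, one can count the text runs in each paragraph. The sketch below is only a proxy: it assumes that heavily split paragraphs (many runs) are the ones likely to show up as tag forests, and runs_per_paragraph is a hypothetical helper name. It relies on a .docx being a zip archive whose main text lives in word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def runs_per_paragraph(docx_path):
    """Return the number of runs (<w:r>) found in each paragraph (<w:p>)."""
    with zipfile.ZipFile(docx_path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    counts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Includes runs nested inside hyperlinks and similar wrappers.
        counts.append(sum(1 for _ in paragraph.iter(f"{{{W_NS}}}r")))
    return counts
```

Running this on the same document before and after a clean-up gives a simple before/after comparison: a short sentence that is split into a dozen runs is a likely candidate for spurious tags.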

So we will now try to "cure" the .docx file in such a way as to enable its import into OmegaT without tags. However, since the combination of Hebrew and English requires certain bidi codes, we had best tread carefully and try more than one way of preparing the same document.
The methods we have used are:
1. The .odt transformation method, where we first saved the .docx file as a 2003 .doc file, then opened the .doc file with OpenOffice, then saved it as .odt. We loaded the .odt file into OmegaT and most of the tag forest was gone.
2. The CodeZapper method, where we ran David Turner's CodeZapper macro on the file and saved it again as .docx. This file too, when loaded, did not display many tags in OmegaT.
So it seems that both methods solve the problem. However, the proof of the pudding is in the eating, and we will not know whether either method works for files with bidi codes that intermix English and Hebrew until we complete the translation.
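Since the open question is how mixed Hebrew and English segments survive the clean-up, it may help to flag such segments in advance. The following is a minimal sketch, not a full implementation of the Unicode bidirectional algorithm: it only looks at the bidi class of each character, and direction_mix and count_bidi_controls are hypothetical helper names.

```python
import unicodedata

# Explicit directional marks, embeddings and overrides that may appear in bidi text.
BIDI_CONTROLS = {"\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}


def direction_mix(segment):
    """Classify a segment as 'rtl', 'ltr', 'mixed' or 'neutral' by its strong characters."""
    has_rtl = has_ltr = False
    for ch in segment:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            has_rtl = True
        elif bidi_class == "L":
            has_ltr = True
    if has_rtl and has_ltr:
        return "mixed"
    if has_rtl:
        return "rtl"
    if has_ltr:
        return "ltr"
    return "neutral"


def count_bidi_controls(segment):
    """Count explicit bidi control characters in a segment."""
    return sum(1 for ch in segment if ch in BIDI_CONTROLS)
```

Segments classified as "mixed", or containing explicit control characters, are the ones worth checking by eye in the target file once the translation is complete.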