CAT-COMPARISONS | Peled – Translations

This is the first in a series of tutorials on translation workflows with the OmegaT and MemoQ CAT tools.

In this project I will translate a Hebrew document into English, using both tools in parallel.

Stage 1:

Looking at the document

In the first stage, we simply look at the document. It is a .docx file, so neither CAT tool should have a problem with it, and the formatting is fairly neat. We would like the target file to keep the same formatting.

Firing up a project

We now open a project in OmegaT and a project in MemoQ in parallel.

Both tools require us to establish a directory for the project files, define the source and target languages and load the documents.

Looking at the file structure

Now we look at the OmegaT and MemoQ file structure. OmegaT has created a complete directory and sub-directory structure for all aspects of the project.

These sub-directories are named after their contents, and it is now up to the translator to place the appropriate files in them: dictionaries, glossaries, source documents and legacy TMs. OmegaT will eventually fill the "target" directory with the translated document and the "omegat" directory with the project's TM.
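
As a rough illustration of the layout described above, the following Python sketch checks a project folder for the sub-directories we would expect. The folder names are assumed from a default OmegaT project and may differ in your version, and the helper name `missing_project_dirs` is made up for this example.

```python
from pathlib import Path

# Folder names assumed from a default OmegaT project; adjust if yours differ.
EXPECTED_DIRS = ("dictionary", "glossary", "omegat", "source", "target", "tm")


def missing_project_dirs(project_root):
    """Return the expected sub-directories that are absent under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    # Use the folder given on the command line, or the current folder.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_project_dirs(root)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run against a freshly created project folder, it should report nothing missing. A missing folder usually just means the project was created elsewhere or the folder was renamed.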

MemoQ, on the other hand, created two directories, one for its TM and one for the project, with a translation documents sub-directory. Note that both tools created copies of the original document.

As regards comparison:

Feature	OmegaT	MemoQ
Tag forest	Large, problematic tag forest	Sparse tag forest
File structure	Complete and obvious	Partial and hidden

Dealing with the OmegaT tag-forest problem

When we load the .docx into OmegaT we see a large tag "forest" that makes translating difficult.

A search online shows that this problem is common to a number of translation tools and stems from the structure of the .docx file, so much so that David Turner created CodeZapper specifically to get rid of hidden .docx tags. He describes his program thus:

"CodeZapper" is a set of Word VBA macros designed to “clean up” Word files before being imported into a standalone translation environment (DVX, memoQ, SDL Studio, TagEditor, Swordfish, OmegaT, Wordfast Pro, etc.).
Word documents are often strewn with “rogue codes” or junk tags (so-called “smart tags”, language tags, track changes tags, spellchecker tags, soft hyphenations, scaling and spacing changes, redundant bookmarks, etc.).
This tagged information shows up in the translation grid as spurious codes{1}around{2}, or even in the mid{3}dle of, words, making sentences difficult to read and translate and generally negating many of the productivity benefits of the program.
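
To get a rough feel for how fragmented a given .docx is before importing it, one can count the text runs in each paragraph. The sketch below relies only on the fact that a .docx is a zip archive whose main text lives in word/document.xml. The function name `runs_per_paragraph` is made up for this example. That a high run count predicts a dense tag forest is our assumption, since each CAT tool's filter decides for itself which run boundaries become tags.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_per_paragraph(docx_path):
    """Return (run_count, text) for every non-empty paragraph of the main body.

    Rough heuristic only: headers, footnotes and text boxes are not handled specially.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for paragraph in root.iter(W + "p"):
        # Count only runs that actually carry text.
        runs = [r for r in paragraph.iter(W + "r") if r.find(W + "t") is not None]
        text = "".join(t.text or "" for r in runs for t in r.findall(W + "t"))
        if text:
            result.append((len(runs), text))
    return result


if __name__ == "__main__":
    import sys

    for count, text in runs_per_paragraph(sys.argv[1]):
        print(count, text[:60])
```

A short sentence split into a dozen runs is a likely candidate for the mid{3}dle-of-word tags quoted above. Running the script on the file before and after cleanup gives a simple way to see whether the cleanup merged any runs.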

So we will now try to "cure" the .docx file in such a way as to enable its import into OmegaT without tags. However, since the combination of Hebrew and English requires certain bidi codes, we had best tread carefully and try more than one way of preparing the same document.
The methods we have used are:
1. An .odt conversion method: we first saved the .docx file as a 2003 .doc file, then opened the .doc file with OpenOffice and saved it as .odt. We loaded the .odt file into OmegaT and most of the tag forest was gone.
2. The CodeZapper method: we ran David Turner's CodeZapper macro on the file and saved it again as .docx. This file too, when loaded, did not display many tags in OmegaT.
So it seems both methods solve the problem. However, the proof of the pudding is in the eating, and we will not know whether these methods are applicable to files with bidi codes that intermix English and Hebrew until we complete the translation.
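
One way to keep an eye on the bidi question in the meantime is to inspect individual segments for explicit directional control characters and for mixed-direction text. The Python sketch below uses only the standard `unicodedata` module. The function name `bidi_report` and the idea of comparing a segment before and after cleanup are our own assumptions, and nothing here shows that either cleanup method actually removes such characters.

```python
import unicodedata

# Explicit directional formatting characters defined by Unicode.
BIDI_CONTROLS = {
    "\u200e", "\u200f",                                # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def bidi_report(text):
    """List the bidi control characters in text and flag mixed LTR/RTL content."""
    controls = [unicodedata.name(ch) for ch in text if ch in BIDI_CONTROLS]
    # Ignore the controls themselves when classifying the visible characters.
    classes = {unicodedata.bidirectional(ch) for ch in text if ch not in BIDI_CONTROLS}
    mixed = "L" in classes and bool(classes & {"R", "AL"})
    return {"controls": controls, "mixed": mixed}


if __name__ == "__main__":
    # Hebrew word preceded by a right-to-left mark, next to an English word.
    print(bidi_report("OmegaT \u200f\u05e9\u05dc\u05d5\u05dd"))
```

If the same segment reports different control characters before and after cleanup, that segment is worth checking by eye in the finished target document.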
