This is the first in a series of tutorials dealing with translation workflows using the OmegaT and the MemoQ CAT tools.
In this project I will translate a Hebrew document into English, using both tools in parallel.
Stage 1:
Looking at the document
At the first stage, we just look at the document. It's a .docx file, so neither CAT tool should have a problem with it, and the formatting is pretty neat. We would like the target file to keep the same formatting.
Firing up a project
We now open a project in OmegaT and a project in MemoQ in parallel.
Both tools require us to establish a directory for the project files, define the source and target languages and load the documents.
Looking at the file structure
Now we look at the OmegaT and MemoQ file structures. OmegaT has created a complete directory and sub-directory structure for all aspects of the project.
These sub-directories are named after their contents, and it is now up to the translator to place the appropriate files in them: dictionaries, glossaries, source documents and legacy TMs. OmegaT will eventually fill the "target" directory with the translated document and the "omegat" directory with the project's TM.
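As a quick sanity check on the layout described above, the following minimal Python sketch reports which sub-directories are missing from a project folder. The folder names are OmegaT's defaults as far as I know them; they can be changed in the project properties, so treat the list as an assumption. The function name `missing_subdirs` is made up for this illustration.

```python
from pathlib import Path

# Assumed default folder names of a freshly created OmegaT project.
# A project whose folders were renamed in the project properties will differ.
DEFAULT_SUBDIRS = ("dictionary", "glossary", "omegat", "source", "target", "tm")


def missing_subdirs(project_root):
    """Return the default sub-directories that are absent from project_root."""
    root = Path(project_root)
    return [name for name in DEFAULT_SUBDIRS if not (root / name).is_dir()]
```

An empty list means every expected folder is in place; anything else names the folders that still have to be created or were renamed.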
MemoQ, on the other hand, created two directories, one for its TM and one for the project, the latter with a sub-directory for translation documents. Both tools created copies of the original document.
The two tools compare as follows:
| Feature | OmegaT | MemoQ |
|---|---|---|
| Tag Forest | Large problematic tag forest | Sparse tag forest |
| File structure | Complete and obvious | Partial and hidden |
Dealing with the OmegaT tag-forest problem
When we load the .docx into OmegaT we see a large tag "forest" that makes translating difficult.
Browsing the internet, it turns out that this problem is common to a number of translation tools and has to do with the structure of the .docx file. It is common enough that David Turner has created CodeZapper just to get rid of hidden .docx tags. He describes his program thus:
"CodeZapper" is a set of Word VBA macros designed to “clean up” Word files before being imported into a standalone translation environment (DVX, memoQ, SDL Studio, TagEditor, Swordfish, OmegaT, Wordfast Pro, etc.).
Word documents are often strewn with “rogue codes” or junk tags (so-called “smart tags”, language tags, track changes tags, spellchecker tags, soft hyphenations, scaling and spacing changes, redundant bookmarks, etc.).
This tagged information shows up in the translation grid as spurious codes{1}around{2}, or even in the mid{3}dle of, words, making sentences difficult to read and translate and generally negating many of the productivity benefits of the program.
So we will now try to "cure" the .docx file in such a way as to enable its import into OmegaT without tags. However, since the combination of Hebrew and English requires certain bidi codes, we had best tread carefully and try more than one way of preparing the same document.
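One rough way to compare differently prepared copies of the document is to count the text runs in each paragraph. A .docx file is a zip archive whose main text lives in `word/document.xml`, and every formatting or language change inside a paragraph starts a new run (`w:r`) there. The sketch below counts those runs. It is only a proxy: which run boundaries actually show up as tags is decided by the CAT tool's own filter, so the numbers will not match OmegaT's tag count exactly. The function name `runs_per_paragraph` is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_per_paragraph(docx_path):
    """Return the number of text runs (w:r) found in each paragraph (w:p)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Count every run nested anywhere inside each paragraph element.
    return [len(list(p.iter(W + "r"))) for p in root.iter(W + "p")]
```

A short sentence that reports a dozen runs is a likely candidate for a tag forest, and running the function on the file before and after cleaning gives a crude measure of how much was removed.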
The methods we have used are:
1. The .odt transformation method, where we first saved the .docx file as a Word 2003 .doc file, then opened the .doc file in OpenOffice and saved it as .odt. We loaded the .odt file into OmegaT and most of the tag forest was gone.
2. The CodeZapper method, where we ran David Turner's CodeZapper macros on the file and saved it again as .docx. This file, too, displayed few tags when loaded into OmegaT.
So it seems both methods solve the problem. However, the proof of the pudding is in the eating, and we will not know whether either method works for files whose bidi codes intermix English and Hebrew until we complete the translation.
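In this project the .doc to .odt step of the first method was done by hand in the OpenOffice window. For anyone who has to repeat it on many files, it could probably be scripted. The sketch below is an assumption on my part and not what was done here: it relies on LibreOffice's `soffice` command-line converter being installed and on the search path, and the function names are made up for this illustration. The earlier step of saving the .docx as .doc in Word is not covered.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to .odt."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "odt",   # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_to_odt(doc_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the .odt is expected."""
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".odt")
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the command before letting it touch any files.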