Conversions: file gymnastics

With so many tools and formats available, the format used to translate the content and the format used to deliver it may or may not match. Conversion becomes an issue when the source text format is not conducive to translation or distribution, or when the translation work takes place in a format different from that of the original. These circumstances require converting files from one format to another, which is often a challenging task.

If you contract with translators or agencies that use translation software, you will minimize the need for conversions, as they will return usable files to you. For example, source language html files will come back as translated html files, as will PowerPoint presentations, Word files, and many other types of documents. Even complex desktop publishing files (InDesign, for example) can be restored to their original format after translation.

If the members of your team are not familiar with translation software, you may have to extract or convert the source text to have it translated, and then reinsert the translation into the original format. This can be quite complex and time-consuming.

So, what are your options?

  • First, try to avoid the need to convert files. If you and your translators can work with the same tools, or with highly compatible tools, you will reduce the need for conversion. For example, if your translators can work with your PowerPoint files, or work with translation software, everything will likely go well, and only minor formatting may be required after translation.
  • Second, strategize how to effectively minimize the need for conversion. If the source language files are not in the format that ultimately will be used, logically you should convert them into the final format before having them translated.
  • Third, when conversions are unavoidable, try to make them as painless as possible by choosing the most compatible tools.

As you might imagine, there is no single solution for file conversions; each case will likely be different and require its own approach. Here are some ideas on how to tackle your own file gymnastics.


Educational materials are commonly distributed online in html format, but often they are first developed in a word processor, such as MS Word, and you may have access to both sets of source language files.

If you have such files, and some of the people involved in the translation effort can only work in Word, you could have the Word files translated and then rebuild the html files. In this case, you could even allow collaboration using Google Docs: converting Word files to that format and back is quite easy, although complex Word formatting may be lost, and the final file may require additional formatting, especially if re-exported to Word.

Another aspect to keep in mind is whether the Word files are the most up-to-date version: because it is easier to make last-minute changes directly in the html files, it is not uncommon for the original Word files to be left unaltered. In this case, asking your team to use an html editor instead might be a better option, since it may eliminate the need to identify the differences or convert the files.
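
If you do need to identify the differences, a short script can give a first indication of whether the Word text still matches the html. The following is a minimal sketch in Python, assuming the Word file has already been saved as plain text; the names `BlockTextExtractor` and `changed_lines` are hypothetical, and the list of block-level tags is an assumption you may need to adjust for your own files.

```python
import difflib
from html.parser import HTMLParser

# Tags assumed to start or end a paragraph-like block of text.
BLOCK_TAGS = {"p", "div", "li", "br", "tr", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockTextExtractor(HTMLParser):
    """Collect the text of an HTML document, one entry per block."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buffer = []

    def _flush(self):
        # Join the buffered text and normalize whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def changed_lines(word_text: str, html: str) -> list[str]:
    """Return the lines that differ between the Word text and the html text."""
    word_lines = [" ".join(line.split())
                  for line in word_text.splitlines() if line.strip()]
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    diff = difflib.ndiff(word_lines, parser.lines)
    # Keep only removed ("- ") and added ("+ ") lines.
    return [d for d in diff if d.startswith(("- ", "+ "))]
```

An empty result suggests the two versions carry the same text; any lines returned point to places where the html was probably edited after the Word file was last saved. Text inside script or style elements is not filtered out in this sketch.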

Word-corrupted html files

Suppose you send some translated html files out for final review, and they come back to you clearly contaminated with html code from Word. Depending on the number of files and the complexity of the content, you may have to resort to other means to generate the final translated product.

Manual cleanup

While Word does corrupt the files, they can often be cleaned up satisfactorily in an html editor. For example, Dreamweaver's Clean Up Word HTML option removes a lot of Word-specific markup, and fixes various types of formatting and tag issues. While the cleanup is not perfect, it will improve the files substantially; other changes will become more manageable, and can often be made via search and replace. Corrupted links will probably have to be rebuilt manually.
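
When no html editor with a Word cleanup feature is available, part of the same cleanup can be approximated with search-and-replace patterns. The sketch below is a rough Python illustration, not a reproduction of what Dreamweaver does: the function name `clean_word_html` is hypothetical, and the four patterns only cover some common Word markup (conditional comments, Office namespace tags such as `<o:p>`, `Mso` class names, and `mso-` inline styles).

```python
import re


def clean_word_html(html: str) -> str:
    """Strip some common Word-specific markup from an HTML string (heuristic)."""
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Office namespace tags, e.g. <o:p> and </o:p>; any text inside is kept.
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes that use Word's Mso* class names.
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w*)", "", html,
                  flags=re.IGNORECASE)
    # Inline style attributes containing mso-* declarations.
    # Note: this drops the whole style attribute, including non-Word rules.
    html = re.sub(r"\s+style=\"[^\"]*mso-[^\"]*\"", "", html,
                  flags=re.IGNORECASE)
    return html
```

Always run this kind of cleanup on a copy of the file and compare the result in a browser, since regular expressions cannot account for every variation of Word's output.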


Manually cutting and pasting the translated text into a copy of the original file is one possibility. The cut/paste method is less time-consuming than one might think, especially for small files, though it can take considerable time for heavily coded files. This method requires the translator to review the final product carefully.

TM through file alignment

If you have access to a translation program, one possibility is to align the source (original) and target (translated) files to create a translation memory (TM) that can then be processed through the translation software to generate the finished product. The TM approach requires at least one copy of the tool, and at least one person capable of using it. In addition, file alignment involves matching the content segment by segment, so it requires a certain level of comprehension of the source and target languages. This method also requires that a translator review the final product carefully.
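
To make the idea of alignment concrete, the following Python sketch pairs source and target segments by position. It is only an illustration under simplifying assumptions: each paragraph is treated as one segment, both files are assumed to contain the same paragraphs in the same order, and the names `align_by_position` and `max_ratio` are hypothetical. Alignment features in translation software are considerably more sophisticated.

```python
def split_segments(text: str) -> list[str]:
    """Split text into paragraph-level segments, dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def align_by_position(source_text: str, target_text: str,
                      max_ratio: float = 2.5) -> list[dict]:
    """Pair source and target segments one to one, in order."""
    src = split_segments(source_text)
    tgt = split_segments(target_text)
    if len(src) != len(tgt):
        # A count mismatch means segments were merged, split, or dropped.
        raise ValueError(
            f"Segment counts differ ({len(src)} vs {len(tgt)}); align manually."
        )
    memory = []
    for s, t in zip(src, tgt):
        longer, shorter = max(len(s), len(t)), min(len(s), len(t))
        # Very different lengths hint at a misaligned pair (assumed threshold).
        memory.append({
            "source": s,
            "target": t,
            "needs_review": longer / shorter > max_ratio,
        })
    return memory
```

The `needs_review` flag is a crude length check; it does not replace the bilingual review described above, because a pair can have similar lengths and still be mismatched.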

PDF files

PDF files offer a convenient way to publish or distribute materials, but PDF is not a good format for translation. Depending on how they are made, PDFs can be flat (like a graphic, in which nothing can be selected) or protected, so that you can't extract the text even if you are able to select it. PDFs are always generated from a source file, be it Word, InDesign or any other file.

You can send PDFs for translation, and translation software environments can convert many PDFs, but the conversions are often imperfect, and can make translation laborious because of the many codes that are introduced into the working files. If you send a PDF, some translators will ask you for the source file, and this is actually for your benefit as much as theirs.

Imagine the rather extreme case of a flat PDF file made from a PowerPoint presentation. No text would be accessible, which means the translator won't be able to use translation software or native PowerPoint software; what you will likely get in return is a completely unformatted text file. If you locate the source file, you'll probably get back a formatted file; at worst, you'll have to adjust the formatting rather than start from scratch.

If the source file is missing or lost, depending on its length, format, and complexity you may decide to try to extract the text, perhaps by OCR, and then format the material before proceeding with its translation. Files produced in this manner need to be checked carefully for OCR errors, such as misplaced words, missing paragraph marks, etc., before being sent to translation.
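
Part of that check can be supported by a simple script that points a reviewer to suspicious lines. The Python sketch below is an illustration only: the function name `flag_ocr_suspects` is hypothetical, and its three heuristics (a line ending in a hyphen, a digit inside a word, sentence punctuation followed directly by a capital letter) are assumptions that will miss many errors and raise some false alarms, for example on product names that legitimately contain digits.

```python
import re


def flag_ocr_suspects(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines worth a manual look."""
    flags = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # A trailing hyphen often means a word was split across lines.
        if stripped.endswith("-"):
            flags.append((number, "word may be split across lines"))
        # A digit between letters often means a misread character (0/o, 1/l).
        if re.search(r"[A-Za-z]\d[A-Za-z]", stripped):
            flags.append((number, "digit inside a word"))
        # "end.Next" suggests a lost space or paragraph mark.
        if re.search(r"[a-z][.!?][A-Z]", stripped):
            flags.append((number, "missing space or paragraph break"))
    return flags
```

A script like this only narrows down where to look; the extracted text still needs to be read against the PDF before it goes to the translators.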

Once the text is extracted and cleaned up, you will have a source file that you can format before sending it for translation. At this point, it's worth remembering that you can either reproduce the original format or reformat the document to better suit your needs.

Last modified: Wednesday, March 1, 2017, 10:22 AM