Given the constant competitive pressure on executives to expedite product time-to-market, many developers are given tight deadlines to deliver functional software. This software is often geared for localization once the source language version is ready for release.
With these pressures in mind, developers should apply basic internationalization (i18n) principles throughout development. Doing so facilitates localization and helps meet time-to-market requirements for all the required languages, not just the source.
Here are 12 internationalization (i18n) do’s and don’ts that all developers should read and apply in their work:
- Do externalize messages in Message Catalogs, resource files, and configuration files. Messages are textual objects that are translatable components. These catalogs or files, such as Java resource bundle message files or Microsoft resource files, are installed in a locale-specific location or named with a locale-specific suffix.
This i18n practice will facilitate the localization process, since localizers can work on these resource bundles without the need to modify source code. It will also permit the use of a single source code for all languages, where only the resource bundles will have different language flavors.
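As a minimal sketch of this practice in Python, assuming hypothetical in-memory catalogs in place of real resource files, the code below looks up a message by ID and falls back to the source language when a translation is missing. The `CATALOGS` table, the message IDs, the `get_message` helper, and the German text are illustrative assumptions, not part of any particular library.

```python
# Hypothetical catalogs. In a real product each one would live in its own
# locale-specific resource file rather than in the source code.
CATALOGS = {
    "en": {"device.select": "Select Device", "file.open": "Open File"},
    "de": {"device.select": "Gerät auswählen"},
}

SOURCE_LOCALE = "en"


def get_message(message_id: str, locale: str) -> str:
    """Return the message for `locale`, falling back to the source language."""
    catalog = CATALOGS.get(locale, {})
    if message_id in catalog:
        return catalog[message_id]
    return CATALOGS[SOURCE_LOCALE][message_id]


print(get_message("device.select", "de"))
print(get_message("file.open", "de"))  # no German entry, so English is used
```

The calling code only ever refers to message IDs, so adding a language means adding a catalog, not touching the source.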
- Don’t internationalize fixed textual objects. These are objects that should not be translated, such as comments, commands, and configuration settings. Only externalize the strings needing translation. If these objects appear in resource or configuration files, they should be marked “NOT_FOR_TRANSLATION.”
Here are some examples of fixed textual objects not requiring i18n:
- User names, group names, and passwords
- System or host names
- Names of terminals (/dev/tty*), printers, and special devices
- Shell variables and environment variable names
- Message queues, semaphores, and shared memory labels
- UNIX commands and command line options (e.g., ls -l is still ls -l in all locales)
- Commands such as /usr/bin/dos2unix and /usr/ccs/bin/gprof
- XPG4-compliant commands (such as /usr/xpg4/bin/vi) and their non-XPG4 equivalents, which are not fully internationalized. For example, /usr/bin/vi does not process non-EUC codesets, but /usr/xpg4/bin/vi is fully internationalized and can process characters in any locale.
- Some GUI textual components, such as keyboard mnemonics and keyboard accelerators
- Do allow for text expansion in messages (especially for GUI items).
Here are some Microsoft translations into German that show how much short English terms can expand:
- bullet –> Aufzählungszeichen
- bundle –> Einzelvorgangsbündel
- Link –> Verknüpfung
- Login –> Anmeldung
- Update –> Aktualisierung
- Undo –> Rückgängig (machen)
- Geschäftsaktivitätsüberwachung replaces the acronym BAM (Business Activity Monitoring)!
Apply the following expansion rules when possible during i18n. Depending on the length of the source text, allow for this much expansion:
- 0 – 10 characters: 101 – 200%
- 11 – 20 characters: 81 – 100%
- 21 – 30 characters: 61 – 80%
- 31 – 50 characters: 41 – 60%
- 51 – 70 characters: 31 – 40%
- Over 70 characters: 30%
Also keep source strings well below your string length limit (usually 254 characters) to leave room for the extra characters the translation needs.
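The expansion rules above can be sketched as a small Python helper. This is only an illustration under stated assumptions: it reserves the upper end of each range, it assigns a 50-character string to the 31 to 50 band, and it takes 254 characters as the limit mentioned in the text. The function names and the `STRING_LIMIT` constant are hypothetical.

```python
STRING_LIMIT = 254  # the "usual" limit cited in the text; adjust for your platform


def max_expansion_percent(source_length: int) -> int:
    """Return the upper end of the expansion range for a source string length."""
    if source_length <= 10:
        return 200
    if source_length <= 20:
        return 100
    if source_length <= 30:
        return 80
    if source_length <= 50:
        return 60
    if source_length <= 70:
        return 40
    return 30


def reserved_length(source_text: str) -> int:
    """Characters to reserve for a translation of `source_text`."""
    length = len(source_text)
    # Integer arithmetic that rounds up, so the budget is never too small.
    extra = -(-length * max_expansion_percent(length) // 100)
    return length + extra


def fits_limit(source_text: str) -> bool:
    """True if the source text plus its expansion allowance stays within the limit."""
    return reserved_length(source_text) <= STRING_LIMIT


print(reserved_length("Login"))  # 5 source characters plus 200% allowance
```

A check like `fits_limit` could run over every externalized string at build time to flag source strings that leave too little room for translation.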
Try to place labels above the controls, not beside them. An expanded label can make the form wider than the expected screen resolution, which forces horizontal scroll bars or causes truncation. Placing labels above controls also simplifies localizing applications into bidirectional languages, such as Arabic and Hebrew, which are written right to left (RTL) but can embed left-to-right (LTR) text.
- Don’t use variables when you can avoid them. A variable leaves the translator guessing the gender of the term that will be substituted, which makes it difficult to translate the surrounding sentence correctly. If variables must be used, provide a list of the possible replacements, and allow for gender and plural variations in the translation of the sentences that incorporate the variable. For example:
<% ' Problematic: the noun is substituted into a shared sentence
If err = 400 Then
  errtext = "server"
Else
  errtext = "connection"
End If %>
<P>The <%=errtext%> is currently unavailable.</P>
While this displays grammatically correct sentences in English, the translation in French will be problematic. In French, the word for “server” (serveur) is masculine, while the word for “connection” (connexion) is feminine. Because the article “the” must agree with the gender of the noun, the translator cannot choose a single article that is correct for both substitutions. The code should instead be:
<% If err = 400 Then %>
<P>The server is currently unavailable.</P>
<% Else %>
<P>The connection is currently unavailable.</P>
<% End If %>
At the same time and for similar reasons, don’t use composite strings. A composite string is an error message or other text that is dynamically generated from partial sentence segments and presented to the user in full sentence form. Use complete sentences instead, even at the expense of repeating segments. This will ensure the accuracy of the translation, regardless of gender, plurality, conjugation, or sentence structure.
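To illustrate the complete-sentence approach, here is a minimal Python sketch (Python is used only for illustration; it is not the language of the snippet above). It assumes a hypothetical `MESSAGES` catalog with one full sentence per error condition. The key names, the `unavailable_message` helper, and the French sentences are illustrative assumptions.

```python
# Hypothetical catalogs: one complete sentence per error condition, so each
# translation can use the article and agreement its own grammar requires.
MESSAGES = {
    "en": {
        "server_unavailable": "The server is currently unavailable.",
        "connection_unavailable": "The connection is currently unavailable.",
    },
    "fr": {
        "server_unavailable": "Le serveur est actuellement indisponible.",
        "connection_unavailable": "La connexion est actuellement indisponible.",
    },
}


def unavailable_message(err: int, locale: str) -> str:
    """Pick a whole sentence by error code instead of assembling fragments."""
    key = "server_unavailable" if err == 400 else "connection_unavailable"
    return MESSAGES[locale][key]


print(unavailable_message(400, "fr"))
print(unavailable_message(500, "fr"))
```

The English segments repeat, but each French sentence carries its own article ("Le" or "La"), which a shared template with a substituted noun could not express.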
Also, avoid using identical placeholders when a string contains multiple variables, since word order changes between languages. For example, a string displayed as “Total 5, 1 of 5” in the source might read “5 of 1, Total 5” in the translated text if the translation reorders the sentence while the values are still substituted in their original order. Instead, use numbered placeholders (e.g., “Total %1, %2 of %3”).
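The difference can be demonstrated with a short Python sketch. Python writes numbered placeholders as `{0}`, `{1}`, `{2}` rather than `%1`, `%2`, `%3`, and the reordered strings below are hypothetical stand-ins for a translation whose word order differs from the source.

```python
args = (5, 1, 5)  # total, current item, item count

# Sequential placeholders are filled strictly in argument order, so a
# translation that reorders the sentence receives the wrong values.
sequential_source = "Total %d, %d of %d"
sequential_reordered = "%d of %d, total %d"
print(sequential_source % args)     # Total 5, 1 of 5
print(sequential_reordered % args)  # 5 of 1, total 5

# Numbered placeholders keep each value tied to its meaning, wherever the
# translator moves it in the sentence.
numbered_source = "Total {0}, {1} of {2}"
numbered_reordered = "{1} of {2}, total {0}"
print(numbered_source.format(*args))     # Total 5, 1 of 5
print(numbered_reordered.format(*args))  # 1 of 5, total 5
```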
- Do perform pseudo-translation. Pseudo-translation is the process of replacing or adding characters to your software strings to detect character encoding issues and hard-coded text remaining in the source files. Here’s an example of a few strings from a C resource file, with their respective pseudo-translations in Japanese:
IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Sイlイct Dイvウcイ本日"
In these strings, Japanese characters replace the vowels in all English words. After compilation, testers can easily detect corrupt characters (junk characters replacing the Japanese characters) or strings that remain fully in English (source strings still embedded in the code).
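A pseudo-translation pass like the one described can be sketched in a few lines of Python. The mapping for "e" and "i" and the leading and trailing markers are taken from the sample string above. The mappings for "a", "o", and "u", and the `pseudo_translate` and `looks_untranslated` helpers, are assumptions added for illustration. Only lowercase vowels are replaced in this sketch.

```python
# Vowel map and markers chosen to mirror the sample above; the entries for
# "a", "o", and "u" are assumed, since the sample contains only "e" and "i".
VOWEL_MAP = str.maketrans({"a": "ア", "e": "イ", "i": "ウ", "o": "エ", "u": "オ"})
PREFIX = "日本"
SUFFIX = "本日"


def pseudo_translate(text: str) -> str:
    """Replace lowercase vowels and wrap the string in visible markers."""
    return PREFIX + text.translate(VOWEL_MAP) + SUFFIX


def looks_untranslated(displayed: str) -> bool:
    """A string shown without both markers was probably hard-coded."""
    return not (displayed.startswith(PREFIX) and displayed.endswith(SUFFIX))


print(pseudo_translate("Select Device"))
```

The markers at both ends also make truncation visible: if the trailing marker is missing on screen, the control was too narrow for the expanded string.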
- Don’t use IF conditions or rely on a sort order in your code to evaluate a string value. For example, avoid IF Gender = “Male” THEN, since the string “Male” will differ once it is translated. Always depend on enumerations or unique IDs instead.
- Do use Unicode functions and methods to support all scripts. Applications that store and retrieve text data need to accept and display the characters from any given language. Using Unicode encoding solves the problem of unsupported character sets and the display of junk characters.
- Don’t insert hard carriage returns in the middle of sentences. Translation memory tools key off hard returns and assume that the sentence has ended. Inserting them in the middle of a sentence leads to incomplete sentences in the translation database and corrupts the sentence structure in the target language files. Instead, replace hard returns with soft returns (or better yet, use a break tag of some sort, such as <BR>). Also be aware that sentence structures change in different languages, as well as the length of sentence parts. So, additional breaks may be needed in target languages.
- Do choose your third-party software provider carefully. Insist they support Unicode and comply with the above internationalization (i18n) practices. Often problems are encountered with third-party software, and the fact that you don’t have control over their code to fix the problems makes the localization tasks particularly difficult.
- Don’t use text in icons and bitmaps. The translated text may be too long to fit. Also, avoid using symbols with cultural connotations and locale-specific idioms.
- Do use long dates or month abbreviations instead of numbers when identifying dates. The order of month and day varies around the world (e.g., mm/dd/yy in the US; dd/mm/yy in Europe), so a numeric date such as 03/04/21 is ambiguous.
- Don’t alphabetically sort strings in string tables and resource bundles. Keep them in their logical order, since neighboring strings help show where and how each one is used. Try to offer as much context as you can with the externalized strings. This will help the translator better adapt the translation to that context. If context is non-existent, run-time QA will take much longer to correct the translations.
For example: “Update” could be the action (to update) or the software update itself. “Check” in financial software could be an action (as a noun or a verb) or the payment instrument. “Email” could be a verb or a noun.
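One way to carry context alongside a string is to make it part of the lookup key. The Python sketch below assumes a hypothetical catalog keyed by a (context, source text) pair. The context labels, the `translate` helper, and the choice of German renderings are illustrative assumptions.

```python
# Hypothetical German catalog keyed by (context, source text), so the same
# English word can receive a different translation in each place it is used.
CATALOG_DE = {
    ("button.action", "Update"): "Aktualisieren",   # the action: to update
    ("download.item", "Update"): "Aktualisierung",  # the update itself
}


def translate(context: str, text: str) -> str:
    """Look up a translation by context; fall back to the source text."""
    return CATALOG_DE.get((context, text), text)


print(translate("button.action", "Update"))
print(translate("download.item", "Update"))
```

Even when the lookup mechanism differs, recording a context label or comment next to each externalized string gives the translator the same information.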
Following these simple internationalization (i18n) principles will expedite product localization and reduce testing, rework, and quality assurance costs – ultimately allowing you to meet the strict time-to-market requirements expected from companies selling products worldwide.
To get proactive assistance in addressing the above software i18n issues during product localization as well as any technical translation services, contact our localization experts.
About the Author
Nabil Freij is the author of Enabling Globalization and the president, founder, and owner of GlobalVision International, Inc. (www.globalvis.com), a Software Localization and Translation specialist. He is trilingual and holds an MSEE from Brown University and an MBA from Bryant University. Freij has worked for 25 years in the hardware, software, and localization industries. He has traveled the world and lived in five countries. He is frequently published and quoted. Nabil is married and has two children. He currently resides in Palmetto, FL. Mr. Freij can be reached at firstname.lastname@example.org . You can read his blog at: http://blog.globalvis.com.