iLib Blog

The most comprehensive library of Javascript i18n classes available

Brought to you by: ehoogerbeets1

Externalized Strings vs. Extracted Strings

There are two basic internationalization philosophies when it comes to string translation. You could externalize all strings into a resource file manually, come up with unique keys for each one, then modify your source code to load the string using that key. Alternatively, you could leave the source string in the source code, wrap it in a function call to load the translation, and then extract it later using a tool.
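To make the contrast concrete, here is a minimal sketch of the two styles side by side. Everything in it (the resource table, the key name, and both helper functions) is made up for illustration and is not iLib code.

```javascript
// Style 1: manually externalized. The code carries only an invented key,
// and the English text lives in a separate resource table.
const resources = { "login.button.label": "Sign in" }; // hypothetical resource file contents
function getByKey(key) {
  return resources[key];
}

// Style 2: extracted. The source string stays in the code and doubles as the key.
const translations = {}; // hypothetical; empty until a translator fills it in
function getString(source) {
  return Object.prototype.hasOwnProperty.call(translations, source)
    ? translations[source]
    : source; // nothing translated yet, so the source string is used
}

const externalizedLabel = getByKey("login.button.label");
const extractedLabel = getString("Sign in");
```

Both calls produce the same label, but only the second one tells the reader what will actually appear on the screen.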

iLib uses the latter method. The ResBundle class has a getString() method which takes a string written in the source language as a parameter. Optionally, you can instead pass it an object that contains a manual key and a source string. This method has a number of advantages:

The code is much more readable to engineers because the string is right there in the code.
Searching the code for a string you see in the UI is one step. With externalized strings, it is two or more:
1. Search the resource files for your string.
2. Get the key for the resource that matches. Or worse, if there are multiple resources that match, get the list of keys.
3. For each key, search the source code for it.
Getting the core engineers (i.e. the non-i18n specialists) to externalize all their strings to a properties, or res, or yml or whatever format file is like trying to get a toddler to brush his or her teeth. They just don't care about doing it. (Well, at least that's how it was for the engineers I've worked with and for my twin toddlers. ;-) What's worse, the code is less readable to them and it is more work to externalize the strings and to search for them (see the first two points above), so there is a negative incentive for them to do it. If you tell them all they need to do is wrap their strings with a getString call and check in their code (no fiddling with other files), they will be more inclined to follow that guideline.
Source strings can be extracted or used in many different ways. One client I had externalized exception strings in the server with this method. The logger would use the source string as-is in English because they knew their English-speaking tech support people would need to read it, and the extracted strings were also sent to the UI side as a resource file which was then translated. The UI could then look up error messages from the server and display the translations instead. The UI people were also guaranteed that all error strings that the server could possibly return were in that resource file so they didn't have to worry about mysterious, untranslatable errors. Additionally, the values of substitution parameters were also sent to the UI so they could be formatted into the localized error string.
If you cut-and-paste a snippet of code around, you don't have to move the localizations with it. When you cut-and-paste code containing manually externalized strings, you have to remember to find the keys of all externalized strings in that snippet of code, then cut-and-paste the string resources to go with it. Even worse, you have to get core engineers who don't really know much or care much about i18n to cut-and-paste them properly. Good luck with that! With the extracted string method, you don't have to tell the core engineers anything. Let them cut-and-paste as much as they need.
Pseudo-localization is easier when you have the source string available. getString() can just pseudo-localize its source string parameter and return it. Pseudo-localization is still possible when strings are externalized, but you have to first look up the source string in the resources by key, then pseudo-localize the value, and return the results.
When developing, engineers can add new UI elements and widgets with new strings and just test their code. No translation necessary. No externalization to resource files. Very little "extra" work is needed other than wrapping their string with getString(), which means that their code-test cycle is not impeded. Yes, when they run their new code in a different locale, the new widgets will appear in the source language, but things will always work because at the very least the source string is available as the string of last resort.
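A few of the points above (the optional explicit key, pseudo-localization, and the source string as the string of last resort) can be sketched in a few lines. This is only a rough illustration under my own assumptions: SketchBundle and pseudoLocalize are hypothetical names, the accent mapping is arbitrary, and none of this is the actual ResBundle implementation.

```javascript
// Hypothetical stand-in for a resource bundle; not iLib's real ResBundle.
class SketchBundle {
  constructor(translations = {}, pseudo = false) {
    this.translations = translations; // maps key (or source string) to translation
    this.pseudo = pseudo;             // true when running in a pseudo-locale
  }

  // Accepts a source string, or an object { key, source } carrying a manual key.
  getString(arg) {
    const source = typeof arg === "string" ? arg : arg.source;
    const key = typeof arg === "string" ? arg : arg.key;
    if (this.pseudo) {
      return pseudoLocalize(source); // no resource lookup needed at all
    }
    if (Object.prototype.hasOwnProperty.call(this.translations, key)) {
      return this.translations[key];
    }
    return source; // string of last resort
  }
}

// Arbitrary illustrative mapping: accent the vowels and bracket the result.
const ACCENTED = { a: "à", e: "é", i: "í", o: "ô", u: "ü" };
function pseudoLocalize(source) {
  const mapped = Array.from(source, (ch) => ACCENTED[ch] || ch).join("");
  return "[" + mapped + "]";
}

const de = new SketchBundle({ "Save": "Speichern", "menu.file": "Datei" });
const translated = de.getString("Save");                            // "Speichern"
const untranslated = de.getString("Cancel");                        // "Cancel"
const keyed = de.getString({ key: "menu.file", source: "File" });   // "Datei"
const pseudo = new SketchBundle({}, true).getString("Save");        // "[Sàvé]"
```

Note that the untranslated "Cancel" still comes back as something displayable, which is why a half-translated build keeps working during development.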

Of course, there are some disadvantages as well:

The source string is the key, so if you change the source string, your translations will no longer match and you have to re-translate. This is not a valid criticism, however, because if the string was externalized and you changed the source language, you would have to re-translate too, so what's the difference?
You might counter that if there is no explicit key, then the key can change and therefore you can't even re-use old translations because there is nothing to key off of. Well, the way the translator workbench tools work, they don't really use the key to do fuzzy matching of strings in the translation memory. The old translation will be one of the possibilities that the translation tool will offer to the translator during the leveraging step. This is not really an issue.
The key may become horrendously long. This is rare, and should be avoided anyway. If your strings are that long, then any change whatsoever anywhere in the string, even to fix whitespace or a simple spelling error, will trigger a re-translation of the whole string. This is true whether you manually externalize or you extract. I tried to get my core engineers to limit their strings to a paragraph or less to minimize the "collateral damage" of fixing that one little spelling error.
You can't differentiate two strings with the same source but different meanings when they are used in different contexts. That is a problem, yes, but fortunately a rare one as well. iLib's getString() handles it by allowing you to optionally specify an explicit key as well as the source string so that you can make sure manually that the keys are unique.
With no fixed keys, it's hard to track when strings are deleted. That is true, but... so what? Maybe you want to clean up the old strings from your resource files? Well, usually that is not a big problem if you use an extractor tool. The only strings that get into your resource files are the ones that actually exist in your source code.
How do you externalize the strings? Doesn't that still have to be done by someone? Of course not! It is easy for a tool to pick out all the strings. In fact, for one client, it only took me two hours to write a tool based on grep, awk, and sed scripts that picked all of the strings from their Java files. Perl or even JavaScript on Node.js would work too.
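To give a feel for how little such a tool needs to do, here is a rough extractor sketch in JavaScript. It is not the grep/awk/sed tool mentioned above and not the upcoming JEDLSoft tool. The function name extractStrings and the regular expression are my own assumptions, and it only handles getString calls whose single argument is a quoted string literal (no object-with-key form, no concatenation).

```javascript
// Matches getString("...") or getString('...') with one literal argument.
// Group 1 is the quote character, group 2 is the raw string body.
const CALL = /getString\(\s*(["'])((?:\\.|(?!\1)[^\\])*)\1\s*\)/g;

// Returns an object mapping each distinct source string to itself,
// ready to be handed to translators as a resource file.
function extractStrings(code) {
  const found = new Map();
  for (const match of code.matchAll(CALL)) {
    // Undo simple escapes such as \' and \" (escapes like \n are not handled here).
    const source = match[2].replace(/\\(.)/g, "$1");
    found.set(source, source);
  }
  return Object.fromEntries(found);
}

// Made-up sample input standing in for a real source file.
const sample = `
  title.text = rb.getString("Sign in");
  hint.text = rb.getString('Don\\'t have an account?');
  again.text = rb.getString("Sign in");
`;
const extracted = extractStrings(sample);
```

Because the resource file is regenerated from the code each time, a string that is deleted from the source simply stops appearing in the output, and the duplicate "Sign in" above is collected only once.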

JEDLSoft is currently working on a tool that helps you with extraction which will be more sophisticated than a set of simple scripts. Stay tuned for more details in the coming months.

Edwin

Posted by 2012-10-24