iLib Blog

The most comprehensive library of Javascript i18n classes available

Brought to you by: ehoogerbeets1

Externalized Strings vs. Extracted Strings

There are two basic internationalization philosophies when it comes to string translation. You could externalize all strings into a resource file manually, come up with unique keys for each one, then modify your source code to load the string using that key. Alternatively, you could leave the source string in the source code, wrap it in a function call to load the translation, and then extract it later using a tool.

iLib uses the latter method. The ResBundle class has a getString() method which takes a string written in the source language as a parameter. Optionally, you can instead pass it an object that contains a key property and a value property where you put the source string.
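
To make the two calling styles concrete, here is a minimal sketch. MiniBundle is a hypothetical stand-in written for this illustration, not iLib's ResBundle, and its constructor argument and lookup logic are assumptions; only the idea of passing either a source string or an object with key and value properties comes from the description above.

```javascript
// Hypothetical stand-in for a resource bundle; NOT iLib's actual ResBundle.
class MiniBundle {
  constructor(translations) {
    // Plain object mapping a key (by default, the source string) to its translation.
    this.translations = translations || {};
  }

  // Accepts either a source string or an object of the form { key, value }.
  getString(source) {
    const isPlain = typeof source === "string";
    const key = isPlain ? source : source.key;
    const fallback = isPlain ? source : source.value;
    // When no translation exists, return the source string itself.
    return Object.prototype.hasOwnProperty.call(this.translations, key)
      ? this.translations[key]
      : fallback;
  }
}

const de = new MiniBundle({
  "Save": "Speichern",
  "open.verb": "Öffnen",
  "open.adjective": "Geöffnet"
});

console.log(de.getString("Save"));                                   // Speichern
console.log(de.getString({ key: "open.verb", value: "Open" }));      // Öffnen
console.log(de.getString({ key: "open.adjective", value: "Open" })); // Geöffnet
console.log(de.getString("Cancel"));                                 // Cancel
```

The object form only matters when the same source text needs two different translations; everywhere else the plain string is enough.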

In most small-to-medium-sized companies where I have worked, I have either been the lone i18n engineer or one of a small set of them. This means I had to rely on the core engineers to play nice and write i18n-friendly code on their own. This took much teaching and many brown bags, and once I even resorted to publishing an "i18n fail of the week" email of the most interesting or humorous problems I found. This of course was only half tongue-in-cheek. The perpetrators of the offending code were anonymous in the email, but they knew who they were, and the semi-public shaming seemed to have a positive effect in the long run. Even today, ex-coworkers of mine tell me things like, "Hey Edwin, you'd be proud. I still think about international today when I code because of you."

In order to turn the core engineers into allies like that, I had to make their lives as easy as possible by removing annoying tasks and making the things I asked them to do as painless as possible. I18n was not their main task, so I was asking for their help. Using the extraction method of localization made this a lot easier.

Here is a list of the advantages of the extraction method:

The code is much more readable to engineers because the string is right there in the code.
Searching the code for a string you see in the UI only takes one step. Just search for the string. Done. With externalized strings, it is two or more steps:
1. Search the resource files for your string
2. Get the key for the resource that matches. Or worse, if there are multiple resources that match, get the list of keys.
3. For each key you find, search the source code for it. Hm. Tedious.
Getting the core engineers to externalize all their strings to a properties, res, yml, or whatever-format file is like trying to get a toddler to brush his or her teeth. They just don't care about doing it. (Well, at least that's how it was for the engineers I've worked with and for my twin toddlers. ;-) What's worse, the code is less readable to them and strings are harder to search for (see the first two points above), and it is more work to externalize them, so there is a negative incentive for them to do it. If you tell them all they need to do is wrap their strings with a getString() call and check in their code (no fiddling with other files), they will be more inclined to follow that guideline.
Source strings can be extracted or used in many different ways. One client I had externalized exception strings in the server with this method. The logger would use the source string as-is in English because they knew their English-speaking tech support people would need to read it. Then, the extracted strings were also sent to the UI side as a resource file which was then translated. The UI could then look up error messages from the server and display the translations instead. The UI people were also guaranteed that all error strings that the server could possibly return were in that resource file so they didn't have to worry about mysterious, untranslatable errors. Additionally, the values of substitution parameters were also sent to the UI so they could be formatted into the localized error string.
If you cut-and-paste a snippet of code around, you don't have to move the localizations with it. When you cut-and-paste code containing manually externalized strings, you have to remember to find the keys of all externalized strings in that snippet of code, then cut-and-paste the string resources to go with it. Even worse, you have to get core engineers who don't really know much or care much about i18n to cut-and-paste them properly. Good luck with that! With the extracted string method, you don't have to tell the core engineers anything. Let them cut-and-paste as much as they want.
Pseudo-localization is slightly easier when you have the source string available. getString() can just pseudo-localize its source string parameter and return it. Pseudo-localization is still possible when strings are externalized, but you have to first look up the source string in the resources by key, then pseudo-localize the value, and return the results.
When developing, engineers can add new UI elements and widgets with new strings and just test their code. No translation necessary. No externalization to resource files. Very little "extra" work is needed other than wrapping their string with getString(), which means that their code-test cycle is not impeded. This removes yet another negative incentive to helping you localize strings. Yes, when they run their new code in a different locale, the new widgets will appear in the source language. However, the code will always work because at the very least the source string is available as the string of last resort.
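
The last two points, pseudo-localizing the source string directly and falling back to it as the string of last resort, can be sketched in a few lines. Everything here is hypothetical and written for illustration: the ACCENTED map, the pseudoLocalize() helper, the free-standing getString() function, and the {name}-style placeholder syntax are all assumptions, not iLib's implementation.

```javascript
// Hypothetical vowel map; a real pseudo-localizer would usually cover far more characters.
const ACCENTED = { a: "à", e: "é", i: "î", o: "ö", u: "ü", A: "À", E: "É", I: "Î", O: "Ö", U: "Ü" };

// Accent the vowels but leave {placeholders} untouched so substitution still works.
function pseudoLocalize(source) {
  // Splitting on a capture group puts the placeholders at the odd indices.
  const parts = source.split(/(\{[^}]*\})/);
  const mangled = parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/[aeiouAEIOU]/g, (ch) => ACCENTED[ch])
    )
    .join("");
  return "[" + mangled + "]";
}

// Assumed signature: the source string doubles as the lookup key.
function getString(source, translations, pseudo) {
  if (pseudo) {
    return pseudoLocalize(source); // no resource lookup needed at all
  }
  return Object.prototype.hasOwnProperty.call(translations, source)
    ? translations[source]
    : source; // string of last resort
}

const fr = { "Save": "Enregistrer" };

console.log(getString("Save", fr, false));             // Enregistrer
console.log(getString("Brand new widget", fr, false)); // Brand new widget
console.log(getString("Hello {name}", fr, true));      // [Héllö {name}]
```

Note that the pseudo-localized path never touches the resources, which is the step an externalized-key scheme cannot skip.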

Of course, there are some disadvantages as well. Here are the objections and, of course, my counter-arguments:

The source string is the key, so if you change the source string, your translations will no longer match and you have to re-translate. This is not a valid criticism, however, because if the string was externalized and you changed the string in the source language, you would have to re-translate too, so what's the difference?
You might counter that if there is no explicit key, then the key can change and therefore you can't even re-use old translations because there is nothing to key off of. Well, the way the translator workbench tools work, they don't really use the key to do fuzzy matching of strings in the translation memory. The old translation will be one of the possibilities that the translation tool will offer to the translator during the leveraging step. This is not really an issue.
The key may become horrendously long. This is rare, and should be avoided anyway. If your strings are that long, then any change whatsoever anywhere in the string, even to fix whitespace or a simple spelling error, will trigger a re-translation of the whole string. This is true whether you manually externalize or you extract. I tried to get my core engineers to limit their strings to a paragraph or less to minimize the "collateral damage" of fixing that one little spelling, formatting, or whitespace problem.
You can't differentiate two strings with the same source but different meanings when they are used in different contexts. That is a problem, yes, but fortunately a rare one as well. iLib's getString() handles it by allowing you to optionally specify an explicit key along with the source string so that you can make sure manually that the keys are unique.
If a string has no fixed key, then it's hard to track when strings are deleted. That is true, but... so what? Maybe you want to clean up the old strings from your resource files? Well, usually that is not a big problem if you use an extractor tool. The only strings that get into your resource files are the ones that actually exist in your source code.
How do you externalize the strings? Doesn't that still have to be done by someone? Of course not! It is easy for a tool to pick out all the strings. In fact, for one client, it only took me two hours to write a shell script based on grep, awk, and sed that picked all of the strings from their Java files. Perl or even JavaScript on Node.js would work too.
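
As a rough idea of how little such an extractor needs, here is a sketch in JavaScript for Node.js. It is not the shell script mentioned above; the extractStrings() function, its regular expression, and the sample input are all assumptions made for illustration. It only recognizes a single quoted literal as the first argument of getString(), so concatenated strings, variables, and the object form with an explicit key would need extra handling.

```javascript
// Collect every string literal passed as the first argument of getString().
// Hypothetical helper: returns an object whose keys and values are the source strings.
function extractStrings(code) {
  // Opening quote, then escaped characters or anything that is not the quote, then the same quote.
  const pattern = /getString\(\s*(["'])((?:\\.|(?!\1)[^\\])*)\1/g;
  const resources = {};
  for (const match of code.matchAll(pattern)) {
    // Simplistic unescaping: only escaped quotes are handled here.
    const source = match[2].replace(/\\(["'])/g, "$1");
    resources[source] = source; // duplicates collapse into a single entry
  }
  return resources;
}

// Made-up input resembling calls in a source file.
const sample = `
  title.setText(rb.getString("Settings"));
  button.setText(rb.getString('Save'));
  label.setText(rb.getString("Say \\"hi\\""));
  other.setText(rb.getString("Save"));
`;

console.log(Object.keys(extractStrings(sample)));
// [ 'Settings', 'Save', 'Say "hi"' ]
```

Because the output is rebuilt from the source code on every run, strings that have been deleted from the code simply stop appearing in it.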

JEDLSoft is currently working on a tool that helps you with extraction which will be more sophisticated than a set of simple scripts. Stay tuned for more details in the coming months.

Edwin

Posted by 2012-10-23