Language, Country, and Script

There is an inherent mismatch between the way most natural language tasks use language, country, and script, and the way that fonts use them. For tasks like hyphenation and spell-check, the most important piece of information is the language. If more precision is needed, the country is added, to distinguish between, for example, "color" in the United States and "colour" in Great Britain -- the same word, just spelled differently. In some cases, a script might also be needed to distinguish between two writing systems used for the same language. Vietnamese, for example, is written today in the Latn script, but has historically also been written in the Hani script. Again, the words are the same, but the spelling uses an entirely different set of characters.

Fonts, on the other hand, need to know the script first, then possibly the language if additional clarity is needed.

It is useful for both hyphenation and font features to have all three of these, although the country code is not really important for examining font features. I have wrestled a bit with how to encapsulate the three so that the encapsulated object can be used as a key to a map. In Java 6, the class java.util.Locale encapsulates the language and country code, but not the script. Java 7 adds the script, and the ICU4J library supports it as well. Both aXSL and FOray use Java 6 in current development, and I would like to avoid the move to 7 for as long as possible. So for now, we will use the ICU4J library, which seems better than writing a custom interface and class to do the same thing.

The only real downside is that these classes seem to be oriented toward telling Java about the user's locale, not the locales of the content being processed. The instances appear to be immutable, but that isn't documented, and I am a bit mystified at the moment about why the class exposes constructors instead of factory methods, which would avoid creating a large number of identical objects. We may need to do something about that on the implementation side.
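As a rough illustration of the "key to a map" idea, here is a minimal sketch. It uses java.util.Locale from Java 7 or later rather than ICU4J, purely so that it runs without an external dependency; that is an assumption of the example, not what aXSL or FOray actually do. The class name `LocaleKeyExample`, the helper `localeOf`, and the pattern-set names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleKeyExample {

    /**
     * Hypothetical helper: builds a locale from an ISO 639 language code,
     * an ISO 15924 script code, and an ISO 3166 country code.
     * Requires Java 7+, where Locale.Builder and script support were added.
     */
    static Locale localeOf(String language, String script, String country) {
        return new Locale.Builder()
                .setLanguage(language)
                .setScript(script)
                .setRegion(country)
                .build();
    }

    public static void main(String[] args) {
        // Locale implements equals() and hashCode() by value,
        // so separately built instances work as the same map key.
        Map<Locale, String> hyphenationPatterns = new HashMap<>();
        hyphenationPatterns.put(localeOf("en", "Latn", "US"), "patterns-en-US");
        hyphenationPatterns.put(localeOf("en", "Latn", "GB"), "patterns-en-GB");

        // A freshly built key with the same three parts finds the entry.
        Locale lookup = localeOf("en", "Latn", "GB");
        System.out.println(hyphenationPatterns.get(lookup));

        // A font-oriented consumer can read the script first, then the language.
        System.out.println(lookup.getScript() + " / " + lookup.getLanguage());
    }
}
```

Because lookups depend on value equality rather than object identity, duplicate instances are a memory concern and not a correctness one. A small cache behind a factory method like `localeOf` would be one way to address the duplication.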

Posted by Victor Mote 2017-01-03 Labels: language country script font hyphenation locale icu4j
