This is a partial proposal for adding script support to ICU locale IDs. A
separate proposal covering ICU4J support and a locale ID conversion API for
currently registered RFC 3066 language tags will follow once that API has
been thought out a bit more.
Specifically, this proposal adds script support to ICU locale identifiers.
The new scheme is an extension of the current one and does not break
compatibility with existing ICU locale IDs.
Currently the locale ID scheme looks something like this:
locale = language [ _COUNTRY ]? [ _VARIANT ]*
where
language = i-language | x-language | ISO-639 code (2-3 characters)
COUNTRY = ISO-3166 code (2-3 characters)
VARIANT = An unspecified number of characters
I'd like ICU to support the following scheme:
locale = language [ _Script ]? [ _COUNTRY ]? [ _VARIANT ]*
where
language = i-language | x-language | ISO-639 code (2-3 characters)
Script = ISO-15924 code (exactly 4 characters)
COUNTRY = ISO-3166 code (2-3 characters)
VARIANT = An unspecified number of characters
(The dash and underscore are still considered the same identifier
separator)
This will make it easier to specify things like Simplified Chinese
(zh_Hans), Traditional Chinese (zh_Hant), Serbian in Cyrillic (sr_Cyrl),
Serbian in Latin (sr_Latn) or any other locales where a language can be
represented by more than one script.
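To make the extended grammar concrete, here is a minimal C sketch of how a locale ID could be split into its fields. `ParsedLocale` and `parse_locale_id` are hypothetical names for illustration only, not proposed ICU API. The sketch assumes that a second field of exactly four letters is always a script (it cannot be mistaken for a 2-3 character COUNTRY), treats the dash and underscore alike, and keeps an "i-" or "x-" prefix as part of the language.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result type for this sketch; field sizes are arbitrary. */
typedef struct {
    char language[12];
    char script[6];    /* 4 letters + NUL, as with ULOC_SCRIPT_CAPACITY */
    char country[4];   /* ISO-3166: 2-3 characters + NUL */
    char variant[32];
} ParsedLocale;

/* The dash and the underscore are both identifier separators. */
static int is_sep(char c) { return c == '_' || c == '-'; }

/* Length of the field starting at p, up to the next separator or the end. */
static size_t field_len(const char *p) {
    size_t n = 0;
    while (p[n] != '\0' && !is_sep(p[n])) n++;
    return n;
}

/* Copies len characters (truncated to fit cap) and NUL-terminates. */
static void copy_field(char *dst, size_t cap, const char *src, size_t len) {
    assert(cap > 0);
    if (len >= cap) len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

static int all_alpha(const char *p, size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (!isalpha((unsigned char)p[i])) return 0;
    return 1;
}

/* Splits id into language [_Script]? [_COUNTRY]? [_VARIANT]*. */
void parse_locale_id(const char *id, ParsedLocale *out) {
    const char *p = id;
    size_t n;

    memset(out, 0, sizeof *out);

    /* language: "i-xxx" and "x-xxx" keep their prefix */
    n = field_len(p);
    if (n == 1 && (tolower((unsigned char)p[0]) == 'i' ||
                   tolower((unsigned char)p[0]) == 'x') && is_sep(p[1]))
        n = 2 + field_len(p + 2);
    copy_field(out->language, sizeof out->language, p, n);
    p += n;
    if (!is_sep(*p)) return;
    p++;

    /* optional Script: exactly four letters (a COUNTRY is only 2-3) */
    n = field_len(p);
    if (n == 4 && all_alpha(p, n)) {
        copy_field(out->script, sizeof out->script, p, n);
        p += n;
        if (!is_sep(*p)) return;
        p++;
        n = field_len(p);
    }

    /* COUNTRY slot: the next field, possibly empty as in "en__POSIX" */
    copy_field(out->country, sizeof out->country, p, n);
    p += n;
    if (!is_sep(*p)) return;
    p++;

    /* everything that remains is the variant part */
    copy_field(out->variant, sizeof out->variant, p, strlen(p));
}
```

With this sketch, "sr_Latn_YU" and "sr-Latn-YU" both yield language "sr", script "Latn" and country "YU", while an existing ID such as "en_US_POSIX" yields an empty script and the same country and variant as before, which is the compatibility property described above.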
Some of these locale identifiers are already registered in the IANA language tag registry:
http://www.iana.org/assignments/language-tags
Here is the API that I'd like to add.
locid.h
/**
* Returns the locale's ISO-15924 abbreviation script code.
* @return An alias to the code
* @see uscript_getShortName
* @see uscript_getCode
* @draft ICU 2.8
*/
inline const char * getScript( ) const;
/**
* Fills in "dispScript" with the name of this locale's script in a format
* suitable for user display in the default locale. For example, if the
* locale's script code is "Latn" and the default locale's language code is
* "en", this function would set dispScript to "Latin".
* @param dispScript Receives the script's display name.
* @return A reference to "dispScript".
* @draft ICU 2.8
*/
UnicodeString& getDisplayScript( UnicodeString& dispScript)
const;
/**
* Fills in "dispScript" with the name of this locale's script in a format
* suitable for user display in the locale specified by "displayLocale". For
* example, if the locale's script code is "Latn" and displayLocale's
* language code is "en", this function would set dispScript to "Latin".
* @param displayLocale Specifies the locale to be used to display the name.
* In other words, if the locale's script code is "Latn", passing
* Locale::getFrench() for displayLocale would result in "", while passing
* Locale::getGerman() for displayLocale would result in "".
* @param dispScript Receives the script's display name.
* @return A reference to "dispScript".
* @draft ICU 2.8
*/
UnicodeString& getDisplayScript( const Locale& displayLocale,
UnicodeString& dispScript)
const;
uloc.h
/**
* Useful constant for the maximum size of the script part of a locale ID
* (including the terminating NULL).
* @draft ICU 2.8
*/
#define ULOC_SCRIPT_CAPACITY 6
/**
* Gets the script code for the specified locale.
*
* @param localeID the locale to get the ISO-15924 script code with
* @param script the script code for localeID
* @param scriptCapacity the size of the script buffer to store the
* script code with
* @param err error information if retrieving the script code failed
* @return the actual buffer size needed for the script code. If it's
* greater than scriptCapacity, the returned script code will be truncated.
* @draft ICU 2.8
*/
U_CAPI int32_t U_EXPORT2
uloc_getScript(const char* localeID,
char* script,
int32_t scriptCapacity,
UErrorCode* err);
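The capacity contract documented above (report the full length, truncate when the buffer is too small) can be illustrated with a self-contained C sketch that does not link against ICU. `sketch_get_script`, `SketchStatus` and its values are hypothetical stand-ins for `uloc_getScript` and `UErrorCode`. The sketch assumes, following ICU's usual preflighting convention, that the return value counts characters without the terminating NUL, so a caller may pass a NULL buffer with zero capacity to learn the required size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UErrorCode values uloc_getScript may set. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_NOT_TERMINATED,   /* result fit exactly, no room for the NUL */
    SKETCH_BUFFER_OVERFLOW   /* result was truncated */
} SketchStatus;

/* Mirrors the proposed ULOC_SCRIPT_CAPACITY. */
#define SKETCH_SCRIPT_CAPACITY 6

static int is_sep(char c) { return c == '_' || c == '-'; }

/* Returns the script code length (excluding NUL), or 0 if there is none. */
int32_t sketch_get_script(const char *localeID, char *script,
                          int32_t scriptCapacity, SketchStatus *status) {
    const char *p = localeID;
    const char *start;
    int32_t len, i;

    assert(localeID != NULL && status != NULL);
    assert(scriptCapacity == 0 || script != NULL);
    *status = SKETCH_OK;   /* this sketch simply resets the status */

    /* Skip the language field; "i-" and "x-" belong to the language. */
    if ((p[0] == 'i' || p[0] == 'x' || p[0] == 'I' || p[0] == 'X') &&
        is_sep(p[1]))
        p += 2;
    while (*p != '\0' && !is_sep(*p)) p++;
    if (*p != '\0') p++;   /* step over the separator */

    /* The second field is a script only if it is exactly four letters. */
    start = p;
    while (*p != '\0' && !is_sep(*p)) p++;
    len = (int32_t)(p - start);
    if (len == 4) {
        for (i = 0; i < 4; i++)
            if (!isalpha((unsigned char)start[i])) { len = 0; break; }
    } else {
        len = 0;
    }

    /* Copy as much as fits; terminate only if there is room. */
    for (i = 0; i < len && i < scriptCapacity; i++)
        script[i] = start[i];
    if (len < scriptCapacity)
        script[len] = '\0';
    else if (len == scriptCapacity)
        *status = SKETCH_NOT_TERMINATED;
    else
        *status = SKETCH_BUFFER_OVERFLOW;
    return len;
}
```

Because a script code is exactly four characters, a buffer of SKETCH_SCRIPT_CAPACITY characters is always large enough, so callers rarely need the preflight call in practice.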
/**
* Gets the script name suitable for display for the specified locale.
*
* @param locale the locale to get the displayable script name with. NULL
* may be used to specify the default.
* @param displayLocale Specifies the locale to be used to display the name.
* In other words, if the locale's script code is "Latn", passing "fr" for
* displayLocale would result in "", while passing "de" for displayLocale
* would result in "". NULL may be used to specify the default.
* @param script the displayable script name for locale
* @param scriptCapacity the size of the script buffer to store the
* displayable script name with
* @param status error information if retrieving the displayable script
* name failed
* @return the actual buffer size needed for the displayable script name.
* If it's greater than scriptCapacity, the returned displayable script name
* will be truncated.
* @draft ICU 2.8
*/
U_CAPI int32_t U_EXPORT2
uloc_getDisplayScript(const char* locale,
const char* displayLocale,
UChar* script,
int32_t scriptCapacity,
UErrorCode* status);
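The display-name lookup could work roughly as follows. This C sketch is not ICU code: `sketch_display_script` and the `kNames` table are hypothetical, and the table stands in for the locale data a real implementation would consult. It takes an already-extracted script code (as `uloc_getScript` would produce), assumes "en" when `displayLang` is NULL, falls back to the code itself when no name is known, and writes UTF-16 code units (the width of ICU's `UChar`) under the same length and truncation contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical display-name data; a real implementation would load this
 * from locale resources rather than hard-code it. */
typedef struct {
    const char *displayLang;   /* language of the display locale */
    const char *code;          /* ISO-15924 code, title case */
    const char *name;          /* display name (ASCII in this sketch) */
} ScriptName;

static const ScriptName kNames[] = {
    { "en", "Latn", "Latin" },
    { "en", "Cyrl", "Cyrillic" },
};

/* Writes the display name of scriptCode as UTF-16 units into dest and
 * returns its length excluding the terminator; the result is truncated
 * (and left unterminated) when destCapacity is too small. */
int32_t sketch_display_script(const char *scriptCode, const char *displayLang,
                              uint16_t *dest, int32_t destCapacity) {
    const char *name = scriptCode;  /* assumed fallback: the code itself */
    size_t k;
    int32_t len, i;

    assert(scriptCode != NULL);
    assert(destCapacity == 0 || dest != NULL);
    if (displayLang == NULL)
        displayLang = "en";         /* assumed default display locale */

    /* Exact, case-sensitive match on both display language and code. */
    for (k = 0; k < sizeof kNames / sizeof kNames[0]; k++) {
        if (strcmp(kNames[k].displayLang, displayLang) == 0 &&
            strcmp(kNames[k].code, scriptCode) == 0) {
            name = kNames[k].name;
            break;
        }
    }

    len = (int32_t)strlen(name);
    /* Widen each ASCII byte to one UTF-16 code unit. */
    for (i = 0; i < len && i < destCapacity; i++)
        dest[i] = (uint16_t)(unsigned char)name[i];
    if (len < destCapacity)
        dest[len] = 0;
    return len;
}
```

As with the script code itself, a NULL destination with zero capacity serves as a preflight call that only reports the length needed.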
Deadline for comments: 10/10/2003 (October 10th, 2003)
George Rhoten
IBM Globalization Center of Competency/ICU San José, CA, USA