From: Alan L. <ala...@jt...> - 2002-10-30 04:04:13
Expires Nov. 1 2002
Jitterbug: 2015

This bug covers the addition of new API and constants to support
property and value aliases as defined in PropertyAliases.txt and
PropertyValueAliases.txt. There are new API in uchar.h (see below)
and two new API in UnicodeSet (C++ only to start).

Note: The term "IntPropertyValue", as seen in existing uchar API and
in the new proposed API here, actually refers to both binary
properties and enumerated properties. A better name might be
"non-non-enumerated" properties (kidding).

This bug also covers corresponding Java changes. These will follow
the standard API idioms for Java but will otherwise closely follow
the API proposed here. When the Java API is determined I will
propose it separately to this list.

Alan Liu
IBM

in uchar.h:

typedef enum UProperty {

    ... binary and enumerated properties unchanged ...

    /** First constant for double Unicode properties.
        @draft ICU 2.4 */
    UCHAR_DOUBLE_START=0x2000,
    /** Double property Numeric_Value.
        Corresponds to u_getNumericValue. @draft ICU 2.4 */
    UCHAR_NUMERIC_VALUE=UCHAR_DOUBLE_START,
    /** One more than the last constant for double Unicode properties.
        @draft ICU 2.4 */
    UCHAR_DOUBLE_LIMIT,

    /** First constant for string Unicode properties.
        @draft ICU 2.4 */
    UCHAR_STRING_START=0x3000,
    /** String property Age.
        Corresponds to u_charAge. @draft ICU 2.4 */
    UCHAR_AGE=UCHAR_STRING_START,
    /** String property Bidi_Mirroring_Glyph.
        Corresponds to u_charMirror. @draft ICU 2.4 */
    UCHAR_BIDI_MIRRORING_GLYPH,
    /** String property Case_Folding.
        Corresponds to u_strFoldCase in ustring.h. @draft ICU 2.4 */
    UCHAR_CASE_FOLDING,
    /** String property ISO_Comment.
        Corresponds to u_getISOComment. @draft ICU 2.4 */
    UCHAR_ISO_COMMENT,
    /** String property Lowercase_Mapping.
        Corresponds to u_strToLower in ustring.h. @draft ICU 2.4 */
    UCHAR_LOWERCASE_MAPPING,
    /** String property Name.
        Corresponds to u_charName. @draft ICU 2.4 */
    UCHAR_NAME,
    /** String property Simple_Case_Folding.
        Corresponds to u_foldCase.
        @draft ICU 2.4 */
    UCHAR_SIMPLE_CASE_FOLDING,
    /** String property Simple_Lowercase_Mapping.
        Corresponds to u_tolower. @draft ICU 2.4 */
    UCHAR_SIMPLE_LOWERCASE_MAPPING,
    /** String property Simple_Titlecase_Mapping.
        Corresponds to u_totitle. @draft ICU 2.4 */
    UCHAR_SIMPLE_TITLECASE_MAPPING,
    /** String property Simple_Uppercase_Mapping.
        Corresponds to u_toupper. @draft ICU 2.4 */
    UCHAR_SIMPLE_UPPERCASE_MAPPING,
    /** String property Titlecase_Mapping.
        Corresponds to u_strToTitle in ustring.h. @draft ICU 2.4 */
    UCHAR_TITLECASE_MAPPING,
    /** String property Unicode_1_Name.
        Corresponds to u_charName. @draft ICU 2.4 */
    UCHAR_UNICODE_1_NAME,
    /** String property Uppercase_Mapping.
        Corresponds to u_strToUpper in ustring.h. @draft ICU 2.4 */
    UCHAR_UPPERCASE_MAPPING,
    /** One more than the last constant for string Unicode properties.
        @draft ICU 2.4 */
    UCHAR_STRING_LIMIT,

    /** Represents a nonexistent or invalid property or property value.
        @draft ICU 2.4 */
    UCHAR_INVALID_CODE = -1
} UProperty;

New mask for Cased_Letter category:

#define U_GC_LC_MASK \
            (U_GC_LU_MASK|U_GC_LL_MASK|U_GC_LT_MASK)

/**
 * Return the Unicode name for a given property, as given in the
 * Unicode database file PropertyAliases.txt.
 *
 * @param property UProperty selector other than UCHAR_INVALID_CODE.
 *         If out of range, NULL is returned.
 *
 * @param nameChoice selector for which name to get. If out of range,
 *         NULL is returned. All properties have a long name. Most
 *         have a short name, but some do not. Unicode allows for
 *         additional names; if present these will be returned by
 *         U_LONG_PROPERTY_NAME + i, where i=1, 2,...
 *
 * @return a pointer to the name, or NULL if either the
 *         property or the nameChoice is out of range. If a given
 *         nameChoice returns NULL, then all larger values of
 *         nameChoice will return NULL, with one exception: if NULL is
 *         returned for U_SHORT_PROPERTY_NAME, then
 *         U_LONG_PROPERTY_NAME (and higher) may still return a
 *         non-NULL value.
 *         The returned pointer is valid until
 *         u_cleanup() is called.
 *
 * @see UProperty
 * @see UPropertyNameChoice
 * @draft ICU 2.4
 */
U_CAPI const char* U_EXPORT2
u_getPropertyName(UProperty property,
                  UPropertyNameChoice nameChoice);

/**
 * Return the UProperty enum for a given property name, as specified
 * in the Unicode database file PropertyAliases.txt. Short, long, and
 * any other variants are recognized.
 *
 * @param alias the property name to be matched. The name is compared
 *         using "loose matching" as described in PropertyAliases.txt.
 *
 * @return a UProperty enum, or UCHAR_INVALID_CODE if the given name
 *         does not match any property.
 *
 * @see UProperty
 * @draft ICU 2.4
 */
U_CAPI UProperty U_EXPORT2
u_getPropertyEnum(const char* alias);

/**
 * Return the Unicode name for a given property value, as given in the
 * Unicode database file PropertyValueAliases.txt.
 *
 * @param property UProperty selector in the range UCHAR_INT_START <=
 *         x < UCHAR_INT_LIMIT or UCHAR_BINARY_START <= x <
 *         UCHAR_BINARY_LIMIT. If out of range, NULL is returned.
 *
 * @param value selector for a value for the given property. If out
 *         of range, NULL is returned. In general, valid values range
 *         from 0 up to some maximum. There are a few exceptions:
 *         (1.) UCHAR_BLOCK values begin at the non-zero value
 *         UBLOCK_BASIC_LATIN. (2.) UCHAR_CANONICAL_COMBINING_CLASS
 *         values are not contiguous and range from 0..240. (3.)
 *         UCHAR_GENERAL_CATEGORY values are not values of
 *         UCharCategory, but rather mask values produced by
 *         U_GET_GC_MASK(). This allows grouped categories such as
 *         [:L:] to be represented. Mask values range
 *         non-contiguously from 1..U_GC_P_MASK.
 *
 * @param nameChoice selector for which name to get. If out of range,
 *         NULL is returned. All values have a long name. Most have
 *         a short name, but some do not. Unicode allows for
 *         additional names; if present these will be returned by
 *         U_LONG_PROPERTY_NAME + i, where i=1, 2,...
 *
 * @return a pointer to the name, or NULL if either the
 *         property or the nameChoice is out of range. If a given
 *         nameChoice returns NULL, then all larger values of
 *         nameChoice will return NULL, with one exception: if NULL is
 *         returned for U_SHORT_PROPERTY_NAME, then
 *         U_LONG_PROPERTY_NAME (and higher) may still return a
 *         non-NULL value. The returned pointer is valid until
 *         u_cleanup() is called.
 *
 * @see UProperty
 * @see UPropertyNameChoice
 * @draft ICU 2.4
 */
U_CAPI const char* U_EXPORT2
u_getPropertyValueName(UProperty property,
                       int32_t value,
                       UPropertyNameChoice nameChoice);

/**
 * Return the property value integer for a given value name, as
 * specified in the Unicode database file PropertyValueAliases.txt.
 * Short, long, and any other variants are recognized.
 *
 * @param property the UProperty selector for the property to which
 *         the given value alias belongs. It should be in the range
 *         UCHAR_INT_START <= x < UCHAR_INT_LIMIT or
 *         UCHAR_BINARY_START <= x < UCHAR_BINARY_LIMIT; only these
 *         properties define value names and enums. If out of range,
 *         UCHAR_INVALID_CODE is returned.
 *
 * @param alias the value name to be matched. The name is compared
 *         using "loose matching" as described in
 *         PropertyValueAliases.txt.
 *
 * @return a value integer or UCHAR_INVALID_CODE if the given name
 *         does not match any value of the given property, or if the
 *         property is invalid. Note: UCHAR_GENERAL_CATEGORY values
 *         are not values of UCharCategory, but rather mask values
 *         produced by U_GET_GC_MASK(). This allows grouped
 *         categories such as [:L:] to be represented.
 *
 * @see UProperty
 * @draft ICU 2.4
 */
U_CAPI int32_t U_EXPORT2
u_getPropertyValueEnum(UProperty property,
                       const char* alias);

in UnicodeSet:

    /**
     * Modifies this set to contain those code points which have the given value
     * for the given binary or enumerated property, as returned by
     * u_getIntPropertyValue. Prior contents of this set are lost.
     *
     * @param prop a property in the range
     * UCHAR_BINARY_START..UCHAR_BINARY_LIMIT-1
     * or UCHAR_INT_START..UCHAR_INT_LIMIT-1.
     *
     * @param value a value in the range u_getIntPropertyMinValue(prop)..
     * u_getIntPropertyMaxValue(prop), with one exception. If prop is
     * UCHAR_GENERAL_CATEGORY, then value should not be a UCharCategory, but
     * rather a mask value produced by U_GET_GC_MASK(). This allows grouped
     * categories such as [:L:] to be represented. Mask values range
     * non-contiguously from 1..U_GC_P_MASK.
     *
     * @param ec error code input/output parameter
     *
     * @return a reference to this set
     *
     * @draft ICU 2.2
     */
    UnicodeSet& applyIntPropertyValue(UProperty prop,
                                      int32_t value,
                                      UErrorCode& ec);

    /**
     * Modifies this set to contain those code points which have the
     * given value for the given property. Prior contents of this
     * set are lost.
     *
     * @param prop a property alias, either short or long. The name is matched
     * loosely. See PropertyAliases.txt for names and a description of loose
     * matching. If the value string is empty, then this string is interpreted
     * as either a General_Category value alias, a Script value alias, a binary
     * property alias, or a special ID. Special IDs are matched loosely and
     * correspond to the following sets:
     *
     * "ANY" = [\u0000-\U0010FFFF],
     * "ASCII" = [\u0000-\u007F].
     *
     * @param value a value alias, either short or long. The name is matched
     * loosely. See PropertyValueAliases.txt for names and a description of
     * loose matching. In addition to aliases listed, numeric values and
     * canonical combining classes may be expressed numerically, e.g., ("nv",
     * "0.5") or ("ccc", "220"). The value string may also be empty.
     *
     * @param ec error code input/output parameter
     *
     * @return a reference to this set
     *
     * @draft ICU 2.2
     */
    UnicodeSet& applyPropertyAlias(const UnicodeString& prop,
                                   const UnicodeString& value,
                                   UErrorCode& ec);
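
Several of the functions above compare names using "loose matching". As a
rough illustration only, here is a minimal sketch of such a comparison in
plain C. It assumes loose matching means ignoring ASCII case, whitespace,
underscores, and hyphens; PropertyAliases.txt remains the authoritative
description. The names loose_skip and loose_alias_equal are hypothetical
helpers made up for this sketch, not part of the proposed API or of ICU.

```c
#include <ctype.h>

/* Hypothetical helper (not ICU API): nonzero for characters that this
 * sketch treats as insignificant, i.e. ASCII whitespace, '-' and '_'.
 * This is an assumption about what loose matching ignores. */
static int loose_skip(unsigned char c) {
    return c == '-' || c == '_' || isspace(c);
}

/* Hypothetical helper (not ICU API): returns 1 if the two aliases are
 * equal once skipped characters and ASCII case are ignored, else 0. */
int loose_alias_equal(const char *a, const char *b) {
    for (;;) {
        /* Advance both strings past any ignorable characters. */
        while (*a != '\0' && loose_skip((unsigned char)*a)) ++a;
        while (*b != '\0' && loose_skip((unsigned char)*b)) ++b;

        /* If either string is exhausted, they match only if both are. */
        if (*a == '\0' || *b == '\0') {
            return *a == '\0' && *b == '\0';
        }

        /* Compare the next significant characters without regard to case. */
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        ++a;
        ++b;
    }
}
```

Under these assumptions, "Line_Break" and "line break" compare equal, while
"Lu" and "Ll" do not. A real lookup such as u_getPropertyEnum would apply a
comparison like this against every known alias of every property.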