From: Richard G. <ric...@ap...> - 2023-06-30 01:31:21
Frank, this looks fine, but also kind of heavyweight. I wonder if it’d make sense to just have a struct with all the fields and utility methods to convert between a locale ID string and that struct? I guess the weak point in that approach would be the key-value pairs at the end of the locale ID. An alternative approach might be to use a UHashTable (or whatever it's called) as the bearer of fields (or to provide both the struct and the hash table, depending on whether the caller cares about the key-value fields).

The beauty of an approach like this is that you save a few heap allocations and don’t have to add as many new API functions. On the other hand, what you have is a thin wrapper around the existing C++ LocaleBuilder class, so you’re not having to implement the locale-mangling code all over again.

I don’t know. What does everybody else think?

--Rich

> On Jun 29, 2023, at 5:09 PM, Frank Tang (譚永鋒) via icu-design <icu...@li...> wrote:
>
> Dear ICU team & users,
>
> I would like to propose the following for: ICU 74
>
> Please provide feedback by: Next Wednesday, July 5, or any time sufficiently in advance of the feature freeze
>
> Designated API review: Gary Wade
>
> Issue: https://unicode-org.atlassian.net/browse/ICU-22365
> A draft PR and implementation (not including tests yet) can be found at
> https://github.com/unicode-org/icu/pull/2520
>
> This is to follow up the C++ API in ICU 64 and provide a C API requested in
>
> Issues to be discussed
>
> 1. Should the prefix be ulb_ or something else?
> 2. What should ulb_build build? The value as returned by
> Locale::getName() or Locale::toLanguageTag()?
> 3. Should the files be ulocbuilder.{h,cpp}, or something else?
>
> Add C API for ULocaleBuilder
>
> https://unicode-org.atlassian.net/browse/ICU-22365
>
> File : icu4c/source/common/ulocbuilder.h
>
> // © 2023 and later: Unicode, Inc. and others.
> // License & terms of use: http://www.unicode.org/copyright.html
> #ifndef __ULOCBUILDER_H__
> #define __ULOCBUILDER_H__
>
> #include "unicode/utypes.h"
>
> /**
>  * \file
>  * \brief C API: Builder API for Locale
>  */
>
> #ifndef U_HIDE_DRAFT_API
>
> /**
>  * Opaque type for a Locale builder.
>  * @draft ICU 74
>  */
> typedef void *ULocaleBuilder;
>
> /**
>  * <code>ULocaleBuilder</code> is used to build a valid <code>locale</code> string
>  * from values configured by the setters.
>  * The <code>ULocaleBuilder</code> checks if a value configured by a
>  * setter satisfies the syntax requirements defined by the <code>Locale</code>
>  * class. A locale string created by a <code>ULocaleBuilder</code> is
>  * well-formed and can be transformed to a well-formed IETF BCP 47 language tag
>  * without losing information.
>  *
>  * <p>The following example shows how to create a <code>locale</code> string
>  * with the <code>ULocaleBuilder</code>.
>  * <blockquote>
>  * <pre>
>  *   UErrorCode err = U_ZERO_ERROR;
>  *   char buffer[ULOC_FULLNAME_CAPACITY];
>  *   ULocaleBuilder builder = ulb_open();
>  *   ulb_setLanguage(builder, "sr");
>  *   ulb_setScript(builder, "Latn");
>  *   ulb_setRegion(builder, "RS");
>  *   int32_t length = ulb_build(
>  *       builder, buffer, ULOC_FULLNAME_CAPACITY, &err);
>  *   ulb_close(builder);
>  * </pre>
>  * </blockquote>
>  *
>  * <p>ULocaleBuilders can be reused; <code>ulb_clear()</code> resets all
>  * fields to their default values.
>  *
>  * <p>ULocaleBuilder tracks errors in an internal UErrorCode. All setters,
>  * except ulb_setLanguageTag and ulb_setLocale, return immediately
>  * if the internal UErrorCode is in an error state.
>  * To reset the internal state and error code, call ulb_clear().
>  * The ulb_setLanguageTag and ulb_setLocale methods first clear the internal
>  * UErrorCode, then track the error of the validation of the input parameter
>  * in the internal UErrorCode.
>  *
>  * @draft ICU 74
>  */
>
> /**
>  * Constructs an empty ULocaleBuilder. The default value of all
>  * fields, extensions, and private use information is the
>  * empty string. The created builder should be destroyed by calling
>  * ulb_close().
>  *
>  * @draft ICU 74
>  */
> U_CAPI ULocaleBuilder U_EXPORT2
> ulb_open(void);
>
> /**
>  * Closes the builder and destroys its internal state.
>  * @param builder the builder
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_close(ULocaleBuilder builder);
>
> /**
>  * Resets the <code>ULocaleBuilder</code> to match the provided
>  * <code>locale</code>. Existing state is discarded.
>  *
>  * <p>All fields of the locale must be well-formed.
>  * <p>This method clears the internal UErrorCode.
>  *
>  * @param builder the builder
>  * @param locale the locale
>  *
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setLocale(ULocaleBuilder builder, const char* locale);
>
> /**
>  * Resets the ULocaleBuilder to match the provided IETF BCP 47 language tag.
>  * Discards the existing state.
>  * The empty string causes the builder to be reset, like ulb_clear().
>  * Legacy language tags (marked as “Type: grandfathered” in BCP 47)
>  * are converted to their canonical form before being processed.
>  * Otherwise, the <code>language tag</code> must be well-formed,
>  * or else the ulb_build() method will later report a U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p>This method clears the internal UErrorCode.
>  *
>  * @param builder the builder
>  * @param tag the language tag, defined as IETF BCP 47 language tag.
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setLanguageTag(ULocaleBuilder builder, const char* tag);
>
> /**
>  * Sets the language. If <code>language</code> is the empty string, the
>  * language in this <code>ULocaleBuilder</code> is removed. Otherwise, the
>  * <code>language</code> must be well-formed, or else the ulb_build() method will
>  * later report a U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p>The syntax of language value is defined as
>  * [unicode_language_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_language_subtag).
>  *
>  * @param builder the builder
>  * @param language the language
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setLanguage(ULocaleBuilder builder, const char* language);
>
> /**
>  * Sets the script. If <code>script</code> is the empty string, the script in
>  * this <code>ULocaleBuilder</code> is removed.
>  * Otherwise, the <code>script</code> must be well-formed, or else the ulb_build()
>  * method will later report a U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p>The script value is a four-letter script code as
>  * [unicode_script_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_script_subtag)
>  * defined by ISO 15924.
>  *
>  * @param builder the builder
>  * @param script the script
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setScript(ULocaleBuilder builder, const char* script);
>
> /**
>  * Sets the region. If region is the empty string, the region in this
>  * <code>ULocaleBuilder</code> is removed. Otherwise, the <code>region</code>
>  * must be well-formed, or else the ulb_build() method will later report a
>  * U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p>The region value is defined by
>  * [unicode_region_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_region_subtag)
>  * as a two-letter ISO 3166 code or a three-digit UN M.49 area code.
>  *
>  * <p>The region value in the <code>Locale</code> created by the
>  * <code>ULocaleBuilder</code> is always normalized to upper case.
>  *
>  * @param builder the builder
>  * @param region the region
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setRegion(ULocaleBuilder builder, const char* region);
>
> /**
>  * Sets the variant. If variant is the empty string, the variant in this
>  * <code>ULocaleBuilder</code> is removed. Otherwise, the <code>variant</code>
>  * must be well-formed, or else the ulb_build() method will later report a
>  * U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p><b>Note:</b> This method checks if <code>variant</code>
>  * satisfies the
>  * [unicode_variant_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_variant_subtag)
>  * syntax requirements, and normalizes the value to lowercase letters. However,
>  * the <code>Locale</code> class does not impose any syntactic
>  * restriction on variant. To set an ill-formed variant, use a Locale constructor.
>  * If there are multiple unicode_variant_subtag values, the caller must concatenate
>  * them with '-' as separator (ex: "foobar-fibar").
>  *
>  * @param builder the builder
>  * @param variant the variant
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setVariant(ULocaleBuilder builder, const char* variant);
>
> /**
>  * Sets the extension for the given key. If the value is the empty string,
>  * the extension is removed. Otherwise, the <code>key</code> and
>  * <code>value</code> must be well-formed, or else the ulb_build() method will
>  * later report a U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p><b>Note:</b> The key ('u') is used for the Unicode locale extension.
>  * Setting a value for this key replaces any existing Unicode locale key/type
>  * pairs with those defined in the extension.
>  *
>  * <p><b>Note:</b> The key ('x') is used for the private use code. To be
>  * well-formed, the value for this key needs only to have subtags of one to
>  * eight alphanumeric characters, not two to eight as in the general case.
>  *
>  * @param builder the builder
>  * @param key the extension key
>  * @param value the extension value
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setExtension(ULocaleBuilder builder, char key, const char* value);
>
> /**
>  * Sets the Unicode locale keyword type for the given key. If the type
>  * is NULL, the keyword is removed.
>  * If the type is the empty string, the keyword is set without type subtags.
>  * Otherwise, the key and type must be well-formed, or else the ulb_build()
>  * method will later report a U_ILLEGAL_ARGUMENT_ERROR.
>  *
>  * <p>Keys and types are converted to lower case.
>  *
>  * <p><b>Note:</b> Setting the 'u' extension via ulb_setExtension()
>  * replaces all Unicode locale keywords with those defined in the
>  * extension.
>  *
>  * @param builder the builder
>  * @param key the Unicode locale key
>  * @param type the Unicode locale type
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_setUnicodeLocaleKeyword(ULocaleBuilder builder,
>                             const char* key, const char* type);
>
> /**
>  * Adds a Unicode locale attribute, if not already present, otherwise
>  * has no effect. The attribute must not be the empty string and must be
>  * well-formed, or U_ILLEGAL_ARGUMENT_ERROR will be set to status
>  * during the ulb_build() call.
>  *
>  * @param builder the builder
>  * @param attribute the attribute
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_addUnicodeLocaleAttribute(ULocaleBuilder builder, const char* attribute);
>
> /**
>  * Removes a Unicode locale attribute, if present, otherwise has no
>  * effect. The attribute must not be the empty string and must be well-formed,
>  * or U_ILLEGAL_ARGUMENT_ERROR will be set to status during the ulb_build() call.
>  *
>  * <p>Attribute comparison for removal is case-insensitive.
>  *
>  * @param builder the builder
>  * @param attribute the attribute
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_removeUnicodeLocaleAttribute(ULocaleBuilder builder, const char* attribute);
>
> /**
>  * Resets the builder to its initial, empty state.
>  * <p>This method clears the internal UErrorCode.
>  *
>  * @param builder the builder
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_clear(ULocaleBuilder builder);
>
> /**
>  * Resets the extensions to their initial, empty state.
>  * Language, script, region and variant are unchanged.
>  *
>  * @param builder the builder
>  * @draft ICU 74
>  */
> U_CAPI void U_EXPORT2
> ulb_clearExtensions(ULocaleBuilder builder);
>
> /**
>  * Builds the locale string from the fields set
>  * on this builder.
>  * If any setter or the ulb_build() call requires a memory allocation
>  * that fails, the status will be set to U_MEMORY_ALLOCATION_ERROR.
>  * If any of the fields set by the setters are not well-formed, the status
>  * will be set to U_ILLEGAL_ARGUMENT_ERROR. The state of the builder will
>  * not change after the ulb_build() call and the caller is free to keep using
>  * the same builder to build more locales.
>  *
>  * @param builder the builder
>  * @param buffer the output buffer that receives the locale id
>  * @param bufferCapacity the capacity of buffer
>  * @param err the error code
>  * @return the length of the locale id in buffer
>  * @draft ICU 74
>  */
> U_CAPI int32_t U_EXPORT2
> ulb_build(ULocaleBuilder builder, char* buffer, int32_t bufferCapacity, UErrorCode* err);
>
> /**
>  * Sets the UErrorCode if an error occurred while recording sets.
>  * Preserves older error codes in the outErrorCode.
>  *
>  * @param builder the builder
>  * @param outErrorCode Set to an error code that occurred while setting subtags.
>  *                     Unchanged if there is no such error or if outErrorCode
>  *                     already contained an error.
>  * @return true if U_FAILURE(*outErrorCode)
>  * @draft ICU 74
>  */
> U_CAPI UBool U_EXPORT2
> ulb_copyErrorTo(ULocaleBuilder builder, UErrorCode *outErrorCode);
>
> #endif /* U_HIDE_DRAFT_API */
>
> #endif // __ULOCBUILDER_H__
> --
> Frank Yung-Fong Tang
> 譚永鋒 / 🌭🍊
> Sr. Software Engineer
> _______________________________________________
> icu-design mailing list
> icu...@li...
> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design
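
For comparison, the plain-struct alternative raised in the reply at the top of this message might look roughly like the sketch below. Everything in it is hypothetical: LocaleFields, locale_fields_parse and locale_fields_format are illustrative names, not ICU API, and the field sizes are arbitrary. It only handles language, script and region; variants are rejected and anything after '@' is ignored, which is exactly the key-value weak point mentioned in the reply. It also does no validation or case normalization, which the builder proposal gets for free from LocaleBuilder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct; field sizes are illustrative, not ICU limits. */
typedef struct {
    char language[9];
    char script[5];
    char region[4];
} LocaleFields;

/* Copy n bytes of s into dst as a NUL-terminated string; 0 if it does not fit. */
static int copy_token(char *dst, size_t cap, const char *s, size_t n) {
    if (n >= cap) return 0;
    memcpy(dst, s, n);
    dst[n] = '\0';
    return 1;
}

/* Parse "lang[_Script][_REGION]". Keywords after '@' are ignored.
 * Returns 1 on success, 0 for variants or subtags this sketch cannot place. */
int locale_fields_parse(const char *id, LocaleFields *out) {
    assert(id != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    const char *p = id;
    int first = 1;
    for (;;) {
        size_t n = strcspn(p, "_-@"); /* length of the next subtag */
        if (first) {
            if (!copy_token(out->language, sizeof out->language, p, n)) return 0;
            first = 0;
        } else if (n == 4 && out->script[0] == '\0' && out->region[0] == '\0') {
            copy_token(out->script, sizeof out->script, p, n);
        } else if ((n == 2 || n == 3) && out->region[0] == '\0') {
            copy_token(out->region, sizeof out->region, p, n);
        } else {
            return 0; /* variants and malformed subtags are out of scope */
        }
        p += n;
        if (*p == '\0' || *p == '@') return 1;
        p++; /* skip the separator */
    }
}

/* Write "lang[_Script][_REGION]" into buf.
 * Returns the length written (excluding NUL), or -1 if buf is too small. */
int locale_fields_format(const LocaleFields *f, char *buf, size_t cap) {
    assert(f != NULL && buf != NULL);
    int n = snprintf(buf, cap, "%s%s%s%s%s",
                     f->language,
                     f->script[0] ? "_" : "", f->script,
                     f->region[0] ? "_" : "", f->region);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

A caller would fill a LocaleFields on the stack, either directly or via locale_fields_parse(), and call locale_fields_format() with its own buffer, so no heap allocation is involved. The cost is that the parsing and formatting rules have to be reimplemented here rather than reused from the existing C++ code.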