Re: [icu-design] C API for LocaleBuilder

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Frank—

> I think those getter already exist in https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a490a02bfc1dd2c007911680a9cd87b73
> if (uloc_getKeywordValue(localeID, uloc_toLegacyKey ("rg"),  
>          value, ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>, &status) > 0 && U_SUCCESS(status) ) {

Right.  My point is that those APIs are a pain in the butt to use because you have to allocate a static buffer to receive the characters, even if you’re just checking that the field is empty.  A call to one of those functions tends to take about three lines of code.  If we’re already going to have an object there that has internally allocated the storage for the characters and can keep us from writing a bunch of that boilerplate (like the Locale object in C++), it’d be nice to leverage that to make the code snippet you show below even simpler.

—Rich

> On Jun 30, 2023, at 6:47 PM, Frank Tang (譚永鋒) <ft...@go...> wrote:
> 
> Dear Rich
> 
> Thanks for your reply
> Just also want to provide some additional historical context that I forget to mention
> 
> The ICU C++ LocaleBuilder API is try to feature/signature match what was in Java for years 
> See https://docs.oracle.com/javase/8/docs/api/java/util/Locale.Builder.html
> in C++
> What I propose here also tries to match that so we do not have yet another model only for C to build locales differently.
> 
> 
> As about your comment of "I think you might want an API that has getters, or an equivalent of the “Locale” class in C++"
> 
> I think those getter already exist in https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a490a02bfc1dd2c007911680a9cd87b73
> 
> 
> “If this locale has the rg subtag, I want to get rid of it and move its value into the region field” 
> The current uloc.h API could be used with locale builder to achieve that
> 
> char localeID[ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>];
> // .... copy locale to localeID
> UErrorCode status= U_ZERO_ERROR;
> ULocaleBuilder builder = ulb_open();
> ulb_setLocale(builder, localeID);
> // If this locale has the rg subtag
> char value[ ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>];
> if (uloc_getKeywordValue(localeID, uloc_toLegacyKey ("rg"),  
>          value, ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>, &status) > 0 && U_SUCCESS(status) ) {
>     // get rid of it and 
>   ulb_setUnicodeLocaleKeyword(builder, "rg", nullptr); // to remove "rg"
>   // move its value into the region field
>   ulb_setRegion(builder, value); 
> }
> // “If this locale ID doesn’t have a script code, 
> if (uloc_getScript(localeID,  value, ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>, &status) == 0 && 
>     uloc_getScript(otherLocaleID,  value, ULOC_FULLNAME_CAPACITY <https://source.chromium.org/chromium/chromium/src/+/main:third_party/icu/source/common/unicode/uloc.h;bpv=1;bpt=1;l=264?q=uloc.h&ss=chromium&gsn=ULOC_FULLNAME_CAPACITY&gs=KYTHE%3A%2F%2Fkythe%3A%2F%2Fchromium.googlesource.com%2Fcodesearch%2Fchromium%2Fsrc%2F%2Fmain%3Flang%3Dc%252B%252B%3Fpath%3Dthird_party%2Ficu%2Fsource%2Fcommon%2Funicode%2Fuloc.h%23ULOC_FULLNAME_CAPACITY%2523m%254010700>, &status) > 0 && U_SUCCESS(status) ) {
>     // copy in the script code from this other locale ID."  
>   ulb_setScript(builder, value); 
>   
> }
> int32_t length = ulb_build(builder, localeID, ULOC_FULLNAME_CAPACITY, &status);
> ulb_close(builder);
> 
> The red code is already in the ICU for a long time
> The blue code is what I am proposing here.
> 
> 
> 
> On Fri, 30 Jun 2023 at 17:08, Rich Gillam <ric...@ap... <mailto:ric...@ap...>> wrote:
>> Frank—
>> 
>> I think the basic issue here is that our APIs for manipulating locale IDs in C can be really painful to use if you’re doing anything interesting, and that we want get rid of as much of that pain as we can.  I think that’s more important than whether we match the C++ interface or conform to some particular definition of “builder.”  That said, I’m starting to come around to the idea that what you’re proposing really is about as good as we’re going to get.
>> 
>> 
>> I’m going to withdraw my earlier suggestion.  I don’t think you’re quite getting what I’m saying, but my suggestion (using a struct) really only works well if you have language/script/region/variant— as soon you try to do anything with key-value pairs, you’d need to use a hash table (or something) instead of a struct, and not only does that mean you still have the heap allocation I was trying to get rid of, but it actually becomes harder to use than what you’re proposing.
>> 
>> I still think there might be value in having a struct-based API that just handled language/script/region/variant, but maybe that’s just me, and we certainly couldn’t go with JUST that, so it’s probably not worth the trouble.
>> 
>> I still tend to like Steven’s suggestion to have one setter with a selector code of some kind.  Your point that it would require a switch statement is valid, but I don’t think it’s actually that much of a performance hit in practice.  But if you’re trying to be as parallel as possible to the C++ interface, this works against that, I think.  I’m not going to argue this one either way.
>> 
>> I do want to raise one other issue, though: I think you might want an API that has getters, or an equivalent of the “Locale” class in C++.  A lot of the use cases for a locale builder are “I want this locale ID, but I want to change this one field value,” or “I want to change this one field value, but only if this other field or attribute has such-and-such a value”.  For example: “If this locale has the rg subtag, I want to get rid of it and move its value into the region field” or “If this locale ID doesn’t have a script code, copy in the script code from this other locale ID."  If we still use things like uloc_getName() to do “if this field has this value” part of the logic, we still have a problem.
>> 
>> That could be a separate ticket and API proposal, of course, but I think we want to think about it too.
>> 
>> —Rich
>> 
>>> On Jun 30, 2023, at 3:53 PM, Frank Tang (譚永鋒) via icu-design <icu...@li... <mailto:icu...@li...>> wrote:
>>> 
>>> 
>>> 
>>> ---------- Forwarded message ---------
>>> From: Frank Tang (譚永鋒) <ft...@go... <mailto:ft...@go...>>
>>> Date: Fri, 30 Jun 2023 at 15:53
>>> Subject: Re: [icu-design] C API for LocaleBuilder
>>> To: Rich Gillam <ric...@ap... <mailto:ric...@ap...>>
>>> 
>>> 
>>> 
>>> 
>>> On Fri, 30 Jun 2023 at 11:00, Rich Gillam <ric...@ap... <mailto:ric...@ap...>> wrote:
>>>> Frank--
>>>> 
>>>>> That would be a different approach to construct a locale. However, I would like to point out several things here.
>>>>> 1. That approach won't be a "builder"
>>>> 
>>>> But it would solve the problem we’re being asked to solve.
>>> 
>>> no, it does not. What the bug asked for is a C builder API and whatever the API does not follow  a "builder" design pattern does not solve the problem it asked for. A builder needs to be step by step so the decision of each field could be spread in different times and different places in the code. A single function with a struct does not allow that, and therefore not fulfilling the request and requirement. 
>>> 
>>> I am open to a concrete counter proposal. It is simple to think the issue could be solved by a simple set() w/ a constant if you only consider the 1) language, 2) script, 3) region, 4) variant, 5) languageTag, and 6) localeId as the only set of input. But once you start to consider 7) attributes, 8) unicode locale keyword/value, and 9) other extensions than "-u-" (for example "-t-" or anything in other_extension) things become complicated. The validation and override/replacement logic for each of them will be different
>>> 
>>>  
>>>> 
>>>>> but what you describe would require the caller to resolve that before putting it into the struct.
>>>> 
>>>> 
>>>> No, it wouldn’t.  For building locales, there’s nothing special going on while you’re setting up the builder— it’s just a bag of fields.
>>> 
>>> no, that is not true, it is NOT a bag of fields, it is an order list of fields, not a bag => the order matter.
>>> in bag sematic , the order does not matter
>>> 
>>> Consider the following
>>> 
>>> languageTag "en-Latn-GB-u-ca-roc"
>>> language "ja"
>>> script "Thai"
>>> region "CN"
>>> variant "hepburn-heploc"
>>> unicode extension "ca"= "japanese"
>>> language "sl"
>>> variant "bohoric"
>>> unicode extension "ca"= "coptic"
>>> 
>>> which will produce sl-Thai-CN-bohoric-u-ca-coptic"
>>> 
>>> but if you treat them as a bug and build that the order in 
>>> language "ja"
>>> script "Thai"
>>> region "CN"
>>> variant "hepburn-heploc"
>>> unicode extension "ca"= "japanese"
>>> language "sl"
>>> languageTag "en-Latn-GB-u-ca-roc"
>>> variant "bohoric"
>>> unicode extension "ca"= "coptic"
>>> 
>>> you will get
>>> which will produce en-Latn-GB-bohoric-u-ca-coptic"
>>> 
>>> which result in different value
>>> 
>>> 
>>>  
>>>>  All the magic happens when you run the build() method.
>>> 
>>> That should be implementation depend and should not be mandated by the API, there are nothing wrong to progressively build when each setting method got called.
>>>  
>>>>  I’m just saying that maybe instead of adding a new opaque class with a bunch of individual getters and setters for all those fields, we just use a simple struct (or a hash table, or something).
>>>  
>>> As I mentioned, what you said only make sense when the underline problem is a bag of fields, but the problem is this problem is not just a bag of fields. The information pass into the builder have overlapping with each other, in particular the information inside localeID and languageTag is not multurelly exclusive from language/script/region/variant/attribute/unicode locale keyword+value and other extension. They over lap and part of them could be override by another. If all the input are muturally exclusive from each other, then the problem is much simpler and possible could be solved by "just use a simple struct" . But that is NOT the case.(as the example I show above) now.
>>> 
>>> 
>>>> 
>>>>> The approach you mentioned seems to require the caller to do a lot of work to fill that struct. and the caller also needs to learn how to program UHashTable while using that API, is that better?
>>>> 
>>>> Good point.  UHashTable is a pain in the butt.  Maybe that’s enough right there to swing me back around to what you’ve already proposed…
>>>> 
>>>> That said, I do kind of like Steven’s suggestion to just have one set() method and just have constants for the individual fields...
>>> 
>>> If we look at the pre-existing locale C API in uloc.h we have different function instead of a get w/ a constant
>>> uloc_getLanguage(....
>>> uloc_getScript(....
>>> uloc_getCountry(....
>>> uloc_getVariant(....
>>> uloc_getName(....
>>> uloc_getBaseName(...
>>> 
>>> notice there is no API to get attributes.
>>> 
>>> we could also use a uloc_get() with a constant, right? but we didn't years ago. so... should we follow the same design pattern or to a set w/ constant 
>>> 
>>> 
>>> 
>>> We have the following different kind of operation to the builder
>>> 
>>> 1. language (w/o script and region), region, script, variant, language tag (w/ script or region) => set
>>> 2. attribute => add and remove (there could be multiple attributes in a locale
>>> 3. unicode extension set on a key ("ca" , "japanese")
>>> 4. other extension - set on a extension ('a', "abc-efg")
>>> 5. clear extension (say you start from "en-u-ca-japanese-nu-Thai" and you want to clear the calendar and change nu to "Arab"   to "en-u-nu-Arab") 
>>> 
>>> a simple set w/ constant method could simplify #1 above, but not powerful enough to deal with #2 #3 #4 and #5
>>> Also, I personally dislike the set + enum => internally swtich(enum) approach because almost all the time the caller know what operation need to perform in the compile time, but this approach mandate the function internally switch based on a value which is fixed in the program but cannot be easily optimized by the compiler. 
>>> 
>>> Consider the following case
>>> 
>>> 
>>> setA(context, value) {
>>>    context.a = value;
>>> }
>>> 
>>> setB(context, value) {
>>>    context.b = value;
>>> }
>>> and the caller
>>> 
>>> setA(context, 3);
>>> setB(context, 5);
>>> 
>>> vs
>>> set(context, field, value) {
>>>   switch(field) {
>>>      case A: context.a = value; break;
>>>      case B: context.b = value; break;
>>>   }
>>> }
>>> and the caller
>>> set(context, A, 3);
>>> set(context, B, 5);
>>> 
>>> The swtich / case inside the set() simply cannot be optimized by the compiler
>>> 
>>>  surely the first approach expose more functions but such exposure allow the compiler to avoid unnecessary run time hit by the switch/case which could be resovled in the compile time, not during the run time.
>>> 
>>> 
>>>> 
>>>> —Rich
>>>> 
>>>>> On Jun 30, 2023, at 12:44 AM, Frank Tang (譚永鋒) <ft...@go... <mailto:ft...@go...>> wrote:
>>>>> 
>>>>> Dear Richard
>>>>> 
>>>>> On Thu, 29 Jun 2023 at 18:31, Richard Gillam <ric...@ap... <mailto:ric...@ap...>> wrote:
>>>>>> Frank—
>>>>>> 
>>>>>> This looks fine, but also kind of heavy-weight.  I wonder if it’d make sense to just have a struct with all the fields and utility methods to convert between a locale ID string and that struct? 
>>>>> 
>>>>> That would be a different approach to construct a locale. However, I would like to point out several things here.
>>>>> 1. That approach won't be a "builder"
>>>>> The definition of a Builder design pattern is
>>>>> "Builder is a creational design pattern that lets you construct complex objects step by step. " 
>>>>> "Problem
>>>>> Imagine a complex object that requires laborious, step-by-step initialization of many fields and nested objects. Such initialization code is usually buried inside a monstrous constructor with lots of parameters. Or even worse: scattered all over the client code."
>>>>> 
>>>>> In other words, the key concept of "builder" itself is "step by step" and that allows the builder to get passing to different parts of the system to allow different part of the software to add information into the builder, without a centralized place to gather all the information together. 
>>>>> see https://refactoring.guru/design-patterns/builder
>>>>> 
>>>>> 2.  The step by step processing also means the timing of information putting in matter and the later information overrides the previous information. 
>>>>> for example, I could have a 
>>>>> ULocaleBuilder builder = ulb_open();
>>>>> ulb_setLanguageTag(builder, "en-Latn-GB-u-ca-islamic");
>>>>> ulb_setRegion(builder, "FR");
>>>>> ulb_setUnicodeExtension(builder, 'x', "abcd");
>>>>> .....
>>>>> ulb_setUnicodeLocaleKeyword(builder, "ca", "japanese");
>>>>> ulb_setRegion(builder, "DE");
>>>>> ulb_setUnicodeLocaleKeyword(builder, "ca", "roc");
>>>>> ulb_build(builder...)
>>>>> ulb_close(builder)
>>>>> 
>>>>> to build a locale "en-Latn-DE-u-ca-roc-x-abcd"
>>>>> 
>>>>> but what you describe would require the caller to resolve that before putting it into the struct.
>>>>> 
>>>>>>  I guess the weak point in that approach would be the key-value pairs at the end of the locale ID— an alternative approach might be to use a UHashTable (or whatever it's called) as the bearer of fields (or to provide both the struct and the hash table, depending on whether the caller cares about the key-value fields).  The beauty of an approach like this is that you save a few heap allocations and don’t have to add as many new API functions.  
>>>>> 
>>>>> The approach you mentioned seems to require the caller to do a lot of work to fill that struct. and the caller also needs to learn how to program UHashTable while using that API, is that better?
>>>>>  
>>>>>> On the other hand, what you have is a thin wrapper around the existing C++ LocaleBuilder class, so you’re not having to implement the locale-mangling code all over again.
>>>>>> 
>>>>>> I don’t know— what does everybody else think?
>>>>>> 
>>>>>> —Rich
>>>>>> 
>>>>>>> On Jun 29, 2023, at 5:09 PM, Frank Tang (譚永鋒) via icu-design <icu...@li... <mailto:icu...@li...>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Dear ICU team & users,
>>>>>>> 
>>>>>>> 
>>>>>>> I would like to propose the following for: ICU 74
>>>>>>> 
>>>>>>> Please provide feedback by: Next Wednesday, July 5, or any time sufficiently in advance of the feature freeze
>>>>>>> 
>>>>>>> Designated API review: Gary Wade
>>>>>>> 
>>>>>>> Issue: https://unicode-org.atlassian.net/browse/ICU-22365
>>>>>>> A draft PR and implementation (not including test yet) could be found at
>>>>>>> https://github.com/unicode-org/icu/pull/2520 )
>>>>>>> 
>>>>>>> This is to follow up the C++ API in ICU 64 and provide a C API requested in
>>>>>>>   
>>>>>>> Issue to be discussed
>>>>>>> 
>>>>>>> 1. Should the prefix be ulb_ or something else?
>>>>>>> 2. What should ulb_build build? the value as returned by 
>>>>>>> Locale::getName() or Locale::toLanguageTag()?
>>>>>>> 3. is it to use file ulocbuilder.{h,cpp}? or should it be something else?
>>>>>>> 
>>>>>>> Add C API for ULocaleBuilder 
>>>>>>> 
>>>>>>> https://unicode-org.atlassian.net/browse/ICU-22365
>>>>>>> 
>>>>>>> File : icu4c/source/common/ulocbuilder.h
>>>>>>> 
>>>>>>> // © 2023 and later: Unicode, Inc. and others.
>>>>>>> // License & terms of use: http://www.unicode.org/copyright.html
>>>>>>> #ifndef __ULOCBUILDER_H__
>>>>>>> #define __ULOCBUILDER_H__
>>>>>>> 
>>>>>>> #include "unicode/utypes.h"
>>>>>>> 
>>>>>>> 
>>>>>>> /**
>>>>>>>  * \file
>>>>>>>  * \brief C API: Builder API for Locale
>>>>>>>  */
>>>>>>> 
>>>>>>> #ifndef U_HIDE_DRAFT_API
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Opaque type for a Locale builder.
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> typedef void *ULocaleBuilder;
>>>>>>> 
>>>>>>> /**
>>>>>>>  * <code>ULocaleBuilder</code> is used to build stirng of valid <code>locale</code>
>>>>>>>  * from values configured by the setters.
>>>>>>>  * The <code>ULocaleBuilder</code> checks if a value configured by a
>>>>>>>  * setter satisfies the syntax requirements defined by the <code>Locale</code>
>>>>>>>  * class.  A string of Locale created by a <code>ULocaleBuilder</code> is
>>>>>>>  * well-formed and can be transformed to a well-formed IETF BCP 47 language tag
>>>>>>>  * without losing information.
>>>>>>>  *
>>>>>>>  * <p>The following example shows how to create a <code>locale</code> string
>>>>>>>  * with the <code>ULocaleBuilder</code>.
>>>>>>>  * <blockquote>
>>>>>>>  * <pre>
>>>>>>>  *     UErrorCode err = U_ZERO_ERROR;
>>>>>>>  *     char buffer[ULOC_FULLNAME_CAPACITY];
>>>>>>>  *     ULocaleBuilder builder = ulb_open();
>>>>>>>  *     ulb_setLanguage(builder, "sr");
>>>>>>>  *     ulb_setScript(builder, "Latn");
>>>>>>>  *     ulb_setRegion(builder, "RS");
>>>>>>>  *     int32_t length = ulb_build(
>>>>>>>  *         builder, buffer, ULOC_FULLNAME_CAPACITY, &error);
>>>>>>>  *     ulb_close(builder);
>>>>>>>  * </pre>
>>>>>>>  * </blockquote>
>>>>>>>  *
>>>>>>>  * <p>ULocaleBuilders can be reused; <code>ulb_clear()</code> resets all
>>>>>>>  * fields to their default values.
>>>>>>>  *
>>>>>>>  * <p>ULocaleBuilder tracks errors in an internal UErrorCode. For all setters,
>>>>>>>  * except ulb_setLanguageTag and ulb_setLocale, ULocaleBuilder will return immediately
>>>>>>>  * if the internal UErrorCode is in error state.
>>>>>>>  * To reset internal state and error code, call clear method.
>>>>>>>  * The ulb_setLanguageTag and setLocale method will first clear the internal
>>>>>>>  * UErrorCode, then track the error of the validation of the input parameter
>>>>>>>  * into the internal UErrorCode.
>>>>>>>  *
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Constructs an empty ULocaleBuilder. The default value of all
>>>>>>>  * fields, extensions, and private use information is the
>>>>>>>  * empty string. The created builder should be destoried by calling
>>>>>>>  * ulb_close();
>>>>>>>  *
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI ULocaleBuilder U_EXPORT2
>>>>>>> ulb_open();
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Close the builder and destroy it's internal states.
>>>>>>>  * @param builder the builder
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_close(ULocaleBuilder builder);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Resets the <code>ULocaleBuilder</code> to match the provided
>>>>>>>  * <code>locale</code>.  Existing state is discarded.
>>>>>>>  *
>>>>>>>  * <p>All fields of the locale must be well-formed.
>>>>>>>  * <p>This method clears the internal UErrorCode.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param locale the locale
>>>>>>>  *
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setLocale(ULocaleBuilder builder, const char* locale);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Resets the ULocaleBuilder to match the provided IETF BCP 47 language tag.
>>>>>>>  * Discards the existing state.
>>>>>>>  * The empty string causes the builder to be reset, like {@link #clear}.
>>>>>>>  * Legacy language tags (marked as “Type: grandfathered” in BCP 47)
>>>>>>>  * are converted to their canonical form before being processed.
>>>>>>>  * Otherwise, the <code>language tag</code> must be well-formed,
>>>>>>>  * or else the ulb_build() method will later report an U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p>This method clears the internal UErrorCode.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param tag the language tag, defined as IETF BCP 47 language tag.
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setLanguageTag(ULocaleBuilder builder, const char* tag);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the language.  If <code>language</code> is the empty string, the
>>>>>>>  * language in this <code>ULocaleBuilder</code> is removed. Otherwise, the
>>>>>>>  * <code>language</code> must be well-formed, or else the ulb_build() method will
>>>>>>>  * later report an U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p>The syntax of language value is defined as
>>>>>>>  * [unicode_language_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_language_subtag).
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param language the language
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setLanguage(ULocaleBuilder builder, const char* language);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the script. If <code>script</code> is the empty string, the script in
>>>>>>>  * this <code>ULocaleBuilder</code> is removed.
>>>>>>>  * Otherwise, the <code>script</code> must be well-formed, or else the ulb_build()
>>>>>>>  * method will later report an U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p>The script value is a four-letter script code as
>>>>>>>  * [unicode_script_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_script_subtag)
>>>>>>>  * defined by ISO 15924
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param script the script
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setScript(ULocaleBuilder builder, const char* script);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the region.  If region is the empty string, the region in this
>>>>>>>  * <code>ULocaleBuilder</code> is removed. Otherwise, the <code>region</code>
>>>>>>>  * must be well-formed, or else the ulb_build() method will later report an
>>>>>>>  * U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p>The region value is defined by
>>>>>>>  *  [unicode_region_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_region_subtag)
>>>>>>>  * as a two-letter ISO 3166 code or a three-digit UN M.49 area code.
>>>>>>>  *
>>>>>>>  * <p>The region value in the <code>Locale</code> created by the
>>>>>>>  * <code>ULocaleBuilder</code> is always normalized to upper case.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param region the region
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setRegion(ULocaleBuilder builder, const char* region);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the variant.  If variant is the empty string, the variant in this
>>>>>>>  * <code>LocaleBuilder</code> is removed.  Otherwise, the <code>variant</code>
>>>>>>>  * must be well-formed, or else the ulb_build() method will later report an
>>>>>>>  * U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p><b>Note:</b> This method checks if <code>variant</code>
>>>>>>>  * satisfies the
>>>>>>>  * [unicode_variant_subtag](http://www.unicode.org/reports/tr35/tr35.html#unicode_variant_subtag)
>>>>>>>  * syntax requirements, and normalizes the value to lowercase letters. However,
>>>>>>>  * the <code>Locale</code> class does not impose any syntactic
>>>>>>>  * restriction on variant. To set an ill-formed variant, use a Locale constructor.
>>>>>>>  * If there are multiple unicode_variant_subtag, the caller must concatenate
>>>>>>>  * them with '-' as separator (ex: "foobar-fibar").
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param variant the variant
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setVariant(ULocaleBuilder builder, const char* variant);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the extension for the given key. If the value is the empty string,
>>>>>>>  * the extension is removed.  Otherwise, the <code>key</code> and
>>>>>>>  * <code>value</code> must be well-formed, or else the ulb_build() method will
>>>>>>>  * later report an U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p><b>Note:</b> The key ('u') is used for the Unicode locale extension.
>>>>>>>  * Setting a value for this key replaces any existing Unicode locale key/type
>>>>>>>  * pairs with those defined in the extension.
>>>>>>>  *
>>>>>>>  * <p><b>Note:</b> The key ('x') is used for the private use code. To be
>>>>>>>  * well-formed, the value for this key needs only to have subtags of one to
>>>>>>>  * eight alphanumeric characters, not two to eight as in the general case.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param key the extension key
>>>>>>>  * @param value the extension value
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setExtension(ULocaleBuilder builder, char key, const char* value);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the Unicode locale keyword type for the given key. If the type
>>>>>>>  * StringPiece is constructed with a nullptr, the keyword is removed.
>>>>>>>  * If the type is the empty string, the keyword is set without type subtags.
>>>>>>>  * Otherwise, the key and type must be well-formed, or else the ulb_build()
>>>>>>>  * method will later report an U_ILLEGAL_ARGUMENT_ERROR.
>>>>>>>  *
>>>>>>>  * <p>Keys and types are converted to lower case.
>>>>>>>  *
>>>>>>>  * <p><b>Note</b>:Setting the 'u' extension via {@link #setExtension}
>>>>>>>  * replaces all Unicode locale keywords with those defined in the
>>>>>>>  * extension.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param key the Unicode locale key
>>>>>>>  * @param type the Unicode locale type
>>>>>>>  * @return This builder.
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_setUnicodeLocaleKeyword(ULocaleBuilder builder,
>>>>>>>         const char* key, const char* type);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Adds a unicode locale attribute, if not already present, otherwise
>>>>>>>  * has no effect.  The attribute must not be empty string and must be
>>>>>>>  * well-formed or U_ILLEGAL_ARGUMENT_ERROR will be set to status
>>>>>>>  * during the ulb_build() call.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param attribute the attribute
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_addUnicodeLocaleAttribute(ULocaleBuilder builder, const char* attribute);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Removes a unicode locale attribute, if present, otherwise has no
>>>>>>>  * effect.  The attribute must not be empty string and must be well-formed
>>>>>>>  * or U_ILLEGAL_ARGUMENT_ERROR will be set to status during the ulb_build() call.
>>>>>>>  *
>>>>>>>  * <p>Attribute comparison for removal is case-insensitive.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param attribute the attribute
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_removeUnicodeLocaleAttribute(ULocaleBuilder builder, const char* attribute);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Resets the builder to its initial, empty state.
>>>>>>>  * <p>This method clears the internal UErrorCode.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_clear(ULocaleBuilder builder);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Resets the extensions to their initial, empty state.
>>>>>>>  * Language, script, region and variant are unchanged.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI void U_EXPORT2
>>>>>>> ulb_clearExtensions(ULocaleBuilder builder);
>>>>>>> 
>>>>>>> /*
>>>>>>>  * Build the Locale< stirng from the fields set
>>>>>>>  * on this builder.
>>>>>>>  * If any set methods or during the ulb_build() call require memory allocation
>>>>>>>  * but fail U_MEMORY_ALLOCATION_ERROR will be set to status.
>>>>>>>  * If any of the fields set by the setters are not well-formed, the status
>>>>>>>  * will be set to U_ILLEGAL_ARGUMENT_ERROR. The state of the builder will
>>>>>>>  * not change after the ulb_build() call and the caller is free to keep using
>>>>>>>  * the same builder to build more locales.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param err the error code
>>>>>>>  * @return the length of the locale id in buffer
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI int32_t U_EXPORT2
>>>>>>> ulb_build(ULocaleBuilder builder, char* buffer, int32_t bufferCapacity, UErrorCode* err);
>>>>>>> 
>>>>>>> /**
>>>>>>>  * Sets the UErrorCode if an error occurred while recording sets.
>>>>>>>  * Preserves older error codes in the outErrorCode.
>>>>>>>  *
>>>>>>>  * @param builder the builder
>>>>>>>  * @param outErrorCode Set to an error code that occurred while setting subtags.
>>>>>>>  *                  Unchanged if there is no such error or if outErrorCode
>>>>>>>  *                  already contained an error.
>>>>>>>  * @return true if U_FAILURE(*outErrorCode)
>>>>>>>  * @draft ICU 74
>>>>>>>  */
>>>>>>> U_CAPI UBool U_EXPORT2
>>>>>>> ulb_copyErrorTo(ULocaleBuilder builder, UErrorCode *outErrorCode);
>>>>>>> 
>>>>>>> #endif  /* U_HIDE_DRAFT_API */
>>>>>>> 
>>>>>>> #endif  // __ULOCBUILDER_H__
>>>>>>> -- 
>>>>>>> Frank Yung-Fong Tang
>>>>>>> 譚永鋒 / 🌭🍊 
>>>>>>> Sr. Software Engineer 
>>>>>>> _______________________________________________
>>>>>>> icu-design mailing list
>>>>>>> icu...@li... <mailto:icu...@li...>
>>>>>>> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Frank Yung-Fong Tang
>>>>> 譚永鋒 / 🌭🍊 
>>>>> Sr. Software Engineer 
>>>> 
>>> 
>>> 
>>> -- 
>>> Frank Yung-Fong Tang
>>> 譚永鋒 / 🌭🍊 
>>> Sr. Software Engineer 
>>> 
>>> 
>>> -- 
>>> Frank Yung-Fong Tang
>>> 譚永鋒 / 🌭🍊 
>>> Sr. Software Engineer 
>>> 
>>> _______________________________________________
>>> icu-design mailing list
>>> icu...@li... <mailto:icu...@li...>
>>> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design
>>> 
>> 
> 
> 
> -- 
> Frank Yung-Fong Tang
> 譚永鋒 / 🌭🍊 
> Sr. Software Engineer 

Re: [icu-design] C API for LocaleBuilder

Open Source C/C++/Java libraries from Unicode

Re: [icu-design] C API for LocaleBuilder