This list is closed, nobody may subscribe to it.
|
From: Tim C. <tj...@ig...> - 2025-06-16 21:48:33
|
Hello,
I'm re-sending a request for feedback on function composition in MessageFormat. This is work that is necessary for completing the MessageFormat APIs and achieving full spec compliance. If you have an interest in MessageFormat, please read and comment. There's a lot to digest there, but it's important for finalizing the MessageFormat APIs.
https://docs.google.com/document/d/1nIYDyaTqB6nChhvoSVxBkRfBAiPchlN4anvaAma9WRc/edit?tab=t.0
Thanks,
Tim
On 10/7/24 15:01, Tim Chevalier wrote:
> Dear ICU team & users,
>
> I would like to propose the following changes to the MF2 API in the MessageFormat 2.0 tech preview API in ICU 77.
>
> Some people may remember the discussion of function composition in MF2 and the lack of definition of the concept of a "resolved value" in the spec. Recently, the spec has changed to define this concept much more precisely and provide clearer guidance for how to implement function composition.
>
> I've created a design doc at
> https://docs.google.com/document/d/1nIYDyaTqB6nChhvoSVxBkRfBAiPchlN4anvaAma9WRc/edit
>
> There is a fair amount of context needed from both the MF2 spec and implementation in order to follow this proposal, so please don't hesitate to comment on the doc with questions and I'll try to clarify things as much as I can.
>
> I would appreciate feedback before Thursday, October 17.
>
> Thanks,
>
> Tim
|
|
From: Markus S. <mar...@gm...> - 2025-06-06 20:40:48
|
+1 -- You received this message because you are subscribed to the Google Groups "icu-design" group. To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un.... To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-design/CAN49p6p9GhJF81tgSkJnEhMdLHsBGHc_k63fj3p6omoGc-Nuuw%40mail.gmail.com. For more options, visit https://groups.google.com/a/unicode.org/d/optout. |
|
From: Elango C. <el...@un...> - 2025-06-06 00:09:32
|
Hi everyone,
This isn't a technical design proposal, but maybe people have opinions anyways. The point is that ICU needs some long overdue attention, IMO.
Here's my brief proposal for what I would like to do:
https://docs.google.com/document/d/1bUX93DTqihh97U7RYpmt1jlrM9d1d6Es3o0-aNCmZqY/edit?tab=t.0
Let me know if you have thoughts.
-- Elango
|
|
From: Markus S. <mar...@gm...> - 2025-06-03 18:25:32
|
Dear ICU team & users,
This is new API for: ICU 78
Ticket: https://unicode-org.atlassian.net/browse/ICU-23038
Almost every Unicode release defines some new property values. ICU adds corresponding API constants which are "born @stable" (they are not @draft first for a while) so that they are immediately usable without problems.
See https://blog.unicode.org/2025/05/unicode-170-beta-review-open.html and https://www.unicode.org/review/pri526/ for Unicode 17 beta.
We are adding the following property value constants:
C/C++
unicode/uchar.h
enum UBlockCode {
    // New blocks in Unicode 17.0.0
    /** @stable ICU 78 */
    UBLOCK_BERIA_ERFE = 339, /*[16EA0]*/
    /** @stable ICU 78 */
    UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_J = 340, /*[323B0]*/
    /** @stable ICU 78 */
    UBLOCK_CHISOI = 341, /*[16D80]*/
    /** @stable ICU 78 */
    UBLOCK_MISCELLANEOUS_SYMBOLS_SUPPLEMENT = 342, /*[1CEC0]*/
    /** @stable ICU 78 */
    UBLOCK_SHARADA_SUPPLEMENT = 343, /*[11B60]*/
    /** @stable ICU 78 */
    UBLOCK_SIDETIC = 344, /*[10940]*/
    /** @stable ICU 78 */
    UBLOCK_TAI_YO = 345, /*[1E6C0]*/
    /** @stable ICU 78 */
    UBLOCK_TANGUT_COMPONENTS_SUPPLEMENT = 346, /*[18D80]*/
    /** @stable ICU 78 */
    UBLOCK_TOLONG_SIKI = 347, /*[11DB0]*/
typedef enum UJoiningGroup {
    U_JG_THIN_NOON, /**< @stable ICU 78 */
typedef enum ULineBreak {
    /** @stable ICU 78 */
    U_LB_UNAMBIGUOUS_HYPHEN = 48,/*[HH]*/
unicode/uscript.h
typedef enum UScriptCode {
    /** @stable ICU 78 */
    USCRIPT_BERIA_ERFE = 208, /* Berf */
    /** @stable ICU 78 */
    USCRIPT_CHISOI = 209, /* Chis */
    /** @stable ICU 78 */
    USCRIPT_SIDETIC = 210, /* Sidt */
    /** @stable ICU 78 */
    USCRIPT_TAI_YO = 211, /* Tayo */
    /** @stable ICU 78 */
    USCRIPT_TOLONG_SIKI = 212, /* Tols */
Java
public final class UCharacter {
    public static final class UnicodeBlock extends Character.Subset {
        // New blocks in Unicode 17.0.0
        /** @stable ICU 78 */
        public static final int BERIA_ERFE_ID = 339; /*[16EA0]*/
        /** @stable ICU 78 */
        public static final int CJK_UNIFIED_IDEOGRAPHS_EXTENSION_J_ID = 340; /*[323B0]*/
        /** @stable ICU 78 */
        public static final int CHISOI_ID = 341; /*[16D80]*/
        /** @stable ICU 78 */
        public static final int MISCELLANEOUS_SYMBOLS_SUPPLEMENT_ID = 342; /*[1CEC0]*/
        /** @stable ICU 78 */
        public static final int SHARADA_SUPPLEMENT_ID = 343; /*[11B60]*/
        /** @stable ICU 78 */
        public static final int SIDETIC_ID = 344; /*[10940]*/
        /** @stable ICU 78 */
        public static final int TAI_YO_ID = 345; /*[1E6C0]*/
        /** @stable ICU 78 */
        public static final int TANGUT_COMPONENTS_SUPPLEMENT_ID = 346; /*[18D80]*/
        /** @stable ICU 78 */
        public static final int TOLONG_SIKI_ID = 347; /*[11DB0]*/
        ...
        // New blocks in Unicode 17.0.0
        /** @stable ICU 78 */
        public static final UnicodeBlock BERIA_ERFE =
                new UnicodeBlock("BERIA_ERFE", BERIA_ERFE_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock CJK_UNIFIED_IDEOGRAPHS_EXTENSION_J =
                new UnicodeBlock("CJK_UNIFIED_IDEOGRAPHS_EXTENSION_J", CJK_UNIFIED_IDEOGRAPHS_EXTENSION_J_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock CHISOI =
                new UnicodeBlock("CHISOI", CHISOI_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock MISCELLANEOUS_SYMBOLS_SUPPLEMENT =
                new UnicodeBlock("MISCELLANEOUS_SYMBOLS_SUPPLEMENT", MISCELLANEOUS_SYMBOLS_SUPPLEMENT_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock SHARADA_SUPPLEMENT =
                new UnicodeBlock("SHARADA_SUPPLEMENT", SHARADA_SUPPLEMENT_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock SIDETIC =
                new UnicodeBlock("SIDETIC", SIDETIC_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock TAI_YO =
                new UnicodeBlock("TAI_YO", TAI_YO_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock TANGUT_COMPONENTS_SUPPLEMENT =
                new UnicodeBlock("TANGUT_COMPONENTS_SUPPLEMENT", TANGUT_COMPONENTS_SUPPLEMENT_ID);
        /** @stable ICU 78 */
        public static final UnicodeBlock TOLONG_SIKI =
                new UnicodeBlock("TOLONG_SIKI", TOLONG_SIKI_ID);
    public static interface JoiningGroup
        /** @stable ICU 78 */
        public static final int THIN_NOON = 105;
    public static interface LineBreak
        /** @stable ICU 78 */
        public static final int UNAMBIGUOUS_HYPHEN = 48; /*[HH]*/
public final class UScript {
    /** @stable ICU 78 */
    public static final int BERIA_ERFE = 208; /* Berf */
    /** @stable ICU 78 */
    public static final int CHISOI = 209; /* Chis */
    /** @stable ICU 78 */
    public static final int SIDETIC = 210; /* Sidt */
    /** @stable ICU 78 */
    public static final int TAI_YO = 211; /* Tayo */
    /** @stable ICU 78 */
    public static final int TOLONG_SIKI = 212; /* Tols */
public final class VersionInfo {
    /**
     * Unicode 17.0 version
     * @stable ICU 78
     */
    public static final VersionInfo UNICODE_17_0;
Sincerely,
markus
|
|
From: Markus S. <mar...@gm...> - 2025-05-22 18:41:36
|
+1 tnx markus |
|
From: Elango C. <el...@un...> - 2025-05-22 18:25:18
|
Better yet, if you have comments, here's the ticket for discussion:
https://unicode-org.atlassian.net/browse/ICU-23124
On Thu, May 22, 2025 at 11:19 AM Elango Cheran <el...@un...> wrote:
> Hi everyone,
> In working on the proposed Segmenter API, I've come across an inconsistency in the type of exceptions that BreakIterator throws.
>
> In particular, when you call some APIs like `isBoundary(int offset)`, they will internally first check the provided offset value using a helper method `checkOffset(int offset, ...) <https://github.com/unicode-org/icu/blob/b30c63d1b930610850489a67433b9c3ba55d6f43/icu4j/main/core/src/main/java/com/ibm/icu/text/RuleBasedBreakIterator.java#L530>`. In turn, checkOffset will throw an exception if the offset is out of range for the input string's indices. The inconsistency is that the exception it throws is IllegalArgumentException, when it should be IndexOutOfBoundsException.
>
> I am planning to change the exception type from IllegalArgumentException to IndexOutOfBoundsException accordingly. Let me know if any of you take exception to that (pun sort of intended, sorry).
>
> -- Elango
|
|
From: Elango C. <el...@un...> - 2025-05-22 18:19:34
|
Hi everyone,
In working on the proposed Segmenter API, I've come across an inconsistency in the type of exceptions that BreakIterator throws.
In particular, when you call some APIs like `isBoundary(int offset)`, they will internally first check the provided offset value using a helper method `checkOffset(int offset, ...) <https://github.com/unicode-org/icu/blob/b30c63d1b930610850489a67433b9c3ba55d6f43/icu4j/main/core/src/main/java/com/ibm/icu/text/RuleBasedBreakIterator.java#L530>`. In turn, checkOffset will throw an exception if the offset is out of range for the input string's indices. The inconsistency is that the exception it throws is IllegalArgumentException, when it should be IndexOutOfBoundsException.
I am planning to change the exception type from IllegalArgumentException to IndexOutOfBoundsException accordingly. Let me know if any of you take exception to that (pun sort of intended, sorry).
-- Elango
|
|
From: Markus S. <mar...@gm...> - 2025-05-21 16:46:49
|
On Wed, May 21, 2025 at 9:17 AM Robin Leroy <egg...@un...> wrote:
> Le mar. 11 mars 2025 à 01:23, Markus Scherer <mar...@gm...> a
> écrit :
>
>> /**
>>  * Result of decoding a minimal Unicode code unit sequence.
>>  * Returned from non-validating Unicode string code point iterators.
>>  * Base class for class CodeUnits which is returned from validating iterators.
>>  *
>>  * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t;
>>  *              should be signed if UTF_BEHAVIOR_NEGATIVE
>>  * @tparam UnitIter An iterator (often a pointer) that returns a code unit type:
>>  *     UTF-8: char or char8_t or uint8_t;
>>  *     UTF-16: char16_t or uint16_t or (on Windows) wchar_t
>>  * @see UnsafeUTFIterator
>>  * @see UnsafeUTFStringCodePoints
>>  * @draft ICU 78
>>  */
>> template<typename CP32, typename UnitIter, typename = void>
>> class UnsafeCodeUnits {
>> public:
>>     // […]
>>     /**
>>      * @return the Unicode code point decoded from the code unit sequence.
>>      * If the sequence is ill-formed and the iterator validates,
>>      * then this is a replacement value according to the iterator's
>>      * UTFIllFormedBehavior template parameter.
>>      * @draft ICU 78
>>      */
>>     UChar32 codePoint() const { return c; }
>>
>
> This should be *CP32*.
>
Oh my :-(
Yes! Good catch, the whole point of the *typename CP32* template parameter
is to store *and return* a type chosen by the call site, to fit with their
code.
We need to fix that.
tnx
markus
>
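The fix agreed on above can be sketched as follows. This is only a minimal illustration under stated assumptions: `SketchCodeUnits` and `CodePointResult` are hypothetical names for a simplified stand-in, not the ICU class. The point is that `codePoint()` returns the `CP32` template parameter, so the result type follows whatever the call site chose.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for the proposed class (not the ICU source).
template<typename CP32, typename UnitIter>
class SketchCodeUnits {
public:
    SketchCodeUnits(CP32 codePoint, UnitIter start) : c(codePoint), p(start) {}

    // Returns CP32, the type chosen at the call site, not a fixed UChar32.
    CP32 codePoint() const { return c; }

    // Start of the code unit sequence that was decoded.
    UnitIter begin() const { return p; }

private:
    CP32 c;
    UnitIter p;
};

// Hypothetical helper alias: the type that codePoint() returns
// for a given choice of CP32.
template<typename CP32>
using CodePointResult =
    decltype(std::declval<SketchCodeUnits<CP32, const char16_t *>>().codePoint());
```

With this shape, a caller that instantiates the class with `char32_t` gets a `char32_t` back, and a caller that uses `int32_t` gets `int32_t`, without a conversion at the call site.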
|
|
From: Robin L. <egg...@un...> - 2025-05-21 16:18:07
|
Le mar. 11 mars 2025 à 01:23, Markus Scherer <mar...@gm...> a
écrit :
> /**
>  * Result of decoding a minimal Unicode code unit sequence.
>  * Returned from non-validating Unicode string code point iterators.
>  * Base class for class CodeUnits which is returned from validating iterators.
>  *
>  * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t;
>  *              should be signed if UTF_BEHAVIOR_NEGATIVE
>  * @tparam UnitIter An iterator (often a pointer) that returns a code unit type:
>  *     UTF-8: char or char8_t or uint8_t;
>  *     UTF-16: char16_t or uint16_t or (on Windows) wchar_t
>  * @see UnsafeUTFIterator
>  * @see UnsafeUTFStringCodePoints
>  * @draft ICU 78
>  */
> template<typename CP32, typename UnitIter, typename = void>
> class UnsafeCodeUnits {
> public:
>     // […]
>     /**
>      * @return the Unicode code point decoded from the code unit sequence.
>      * If the sequence is ill-formed and the iterator validates,
>      * then this is a replacement value according to the iterator's
>      * UTFIllFormedBehavior template parameter.
>      * @draft ICU 78
>      */
>     UChar32 codePoint() const { return c; }
>
This should be *CP32*.
|
|
From: Markus S. <mar...@gm...> - 2025-04-16 23:42:49
|
On Fri, Apr 11, 2025 at 4:39 PM Elango Cheran <el...@un...> wrote:
> This is a little retroactive of an announcement since I just realized that I omitted including the icu-design@ mailing list when sending out mailings to solicit reviews and discussions on the Segmenter API proposal.
>
Thanks for sending this, and for working on it!
Looking forward to using the new API!
Now I need to review the PR...
markus
|
|
From: Elango C. <el...@un...> - 2025-04-11 23:40:13
|
Hi everyone,
This is a little retroactive of an announcement since I just realized that I omitted including the icu-design@ mailing list when sending out mailings to solicit reviews and discussions on the Segmenter API proposal.
The Segmenter API is designed to be a higher level API that provides a more modern API for segmentation by ensuring immutability of instances and isolation of iteration state across instances. This avoids a big source of complexity and bugs from BreakIterator. It also uses the Stream API from Java 8+ to represent an element sequence abstraction, that in turn contains APIs for functional programming constructs.
The Segmenter API is not meant to be a 1-to-1 replacement of BreakIterator, so it does not attempt to replicate all BreakIterator APIs. Another non-goal of the Segmenter API is to maintain 100% performance parity with BreakIterator at the same time that it is wrapping BreakIterator.
The proposal went through rounds of discussion in the TC over the past few months and the ICU4J portion received an Approved as Amended status yesterday.
- design doc <https://docs.google.com/document/d/17C02RNdwD41e-sKTHcwyM8uPJhYNvo0oNI7LBs8un58/edit?pli=1&tab=t.0#heading=h.uvxziwtzsmo4>
- PR #3237 <https://github.com/unicode-org/icu/pull/3237>
   - I sent this out for review just today
There are a couple of implicit issues of discussion that didn't get discussed in the TC but are present in the implementation PR branch:
1. Should the `Segment` class be a top-level class, or an inner class of `Segments`?
   - The implementation PR has moved the `Segment` class as well as all of the other inner classes used for implementation from being inner classes within `Segments` to being top level classes.
   - Having top level classes allows a clearer delineation of which classes are public or not. An inner class of an interface might be default public, but only the `Segment` class needs to be public (and of course, given that it is a part of the return type in the signature of the public APIs), whereas all of the iteration implementation-specific classes should be kept private.
2. Is it okay to create a new package segment in the Java package hierarchy for Segmenter API classes, and where to put it?
   - The implementation PR has created a package `com.ibm.icu.segmenter`
   - Even though `BreakIterator` exists in `com.ibm.icu.text`, that package is a hodgepodge of all types of classes that were created a long time ago.
   - Newer APIs have started to create packages with a specific focus, like `com.ibm.icu.message2` for the new MessageFormatter that implements MF2. The PR follows this pattern.
The following are the public API signatures:
public interface Segmenter {
  Segments segment(CharSequence s);
}
public interface Segments {
  /**
   * Returns a {@code Stream} of the {@code CharSequence}s for all of the segments in the source
   * sequence. Start from the beginning of the sequence and iterate forwards until the end.
   * @return a {@code Stream} of all {@code Segments} in the source sequence.
   */
  Stream<CharSequence> subSequences();
  /**
   * Returns the segment that contains index {@code i}. Containment is inclusive of the start index
   * and exclusive of the limit index.
   *
   * <p>Specifically, the containing segment is defined as the segment with start {@code s} and
   * limit {@code l} such that {@code s ≤ i < l}.</p>
   * @param i index in the input {@code CharSequence} to the {@code Segmenter}
   * @throws IllegalArgumentException if {@code i} is less than 0 or greater than the length of the
   *     input {@code CharSequence} to the {@code Segmenter}
   * @return A segment that either starts at or contains index {@code i}
   */
  Segment segmentAt(int i);
  /**
   * Returns a {@code Stream} of all {@code Segment}s in the source sequence. Start with the first
   * and iterate forwards until the end of the sequence.
   *
   * <p>This is equivalent to {@code segmentsFrom(0)}.</p>
   * @return a {@code Stream} of all {@code Segments} in the source sequence.
   */
  Stream<Segment> segments();
  /**
   * Returns a {@code Stream} of all {@code Segment}s in the source sequence where all segment limits
   * {@code l} satisfy {@code i < l}. Iteration moves forwards.
   *
   * <p>This means that the first segment in the stream is the same
   * as what is returned by {@code segmentAt(i)}.</p>
   *
   * <p>The word "from" is used here to mean "at or after", with the semantics of "at" for a
   * {@code Segment} defined by {@link #segmentAt(int)}. We cannot describe the segments all as
   * being "after" since the first segment might contain {@code i} in the middle, meaning that
   * in the forward direction, its start position precedes {@code i}.</p>
   *
   * <p>{@code segmentsFrom} and {@link #segmentsBefore(int)} create a partitioning of the space of
   * all {@code Segment}s.</p>
   * @param i index in the input {@code CharSequence} to the {@code Segmenter}
   * @return a {@code Stream} of all {@code Segment}s at or after {@code i}
   */
  Stream<Segment> segmentsFrom(int i);
  /**
   * Returns whether offset {@code i} is a segmentation boundary. Throws an exception when
   * {@code i} is not a valid index position for the source sequence.
   * @param i index in the input {@code CharSequence} to the {@code Segmenter}
   * @throws IllegalArgumentException if {@code i} is less than 0 or greater than the length of the
   *     input {@code CharSequence} to the {@code Segmenter}
   * @return Returns whether offset {@code i} is a segmentation boundary.
   */
  boolean isBoundary(int i);
  /**
   * Returns all segmentation boundaries, starting from the beginning and moving forwards.
   *
   * <p><b>Note:</b> {@code boundaries() != boundariesAfter(0)}.
   * This difference naturally results from the strict inequality condition in boundariesAfter,
   * and the fact that 0 is the first boundary returned from the start of an input sequence.</p>
   * @return An {@code IntStream} of all segmentation boundaries, starting at the first
   *     boundary with index 0, and moving forwards in the input sequence.
   */
  IntStream boundaries();
  /**
   * Returns all segmentation boundaries after the provided index. Iteration moves forwards.
   * @param i index in the input {@code CharSequence} to the {@code Segmenter}
   * @return An {@code IntStream} of all boundaries {@code b} such that {@code b > i}
   */
  IntStream boundariesAfter(int i);
  /**
   * Returns all segmentation boundaries on or before the provided index. Iteration moves backwards.
   *
   * <p>The phrase "back from" is used to indicate both that: 1) boundaries are "on or before" the
   * input index; 2) the direction of iteration is backwards (towards the beginning).
   * "on or before" indicates that the result set is {@code b} where {@code b ≤ i}, which is a weak
   * inequality, while "before" might suggest the strict inequality {@code b < i}.</p>
   *
   * <p>{@code boundariesBackFrom} and {@link #boundariesAfter(int)} create a partitioning of the
   * space of all boundaries.</p>
   * @param i index in the input {@code CharSequence} to the {@code Segmenter}
   * @return An {@code IntStream} of all boundaries {@code b} such that {@code b ≤ i}
   */
  IntStream boundariesBackFrom(int i);
  class Segment {
    public final int start;
    public final int limit;
    public final int ruleStatus = 0;
    /**
     * Return the subsequence represented by this {@code Segment}
     * @return a new {@code CharSequence} object that is the subsequence represented by this
     *     {@code Segment}.
     */
    public CharSequence getSubSequence() { ... }
  }
}
public class LocalizedSegmenter implements Segmenter {
  @Override
  public Segments segment(CharSequence s) { ... }
  public static Builder builder() { return new Builder(); }
  public enum SegmentationType {
    GRAPHEME_CLUSTER,
    WORD,
    LINE,
    SENTENCE,
  }
  public static class Builder {
    public Builder setLocale(ULocale locale) { ... }
    public Builder setLocale(Locale locale) { ... }
    public Builder setSegmentationType(SegmentationType segmentationType) { ... }
    public Segmenter build() { ... }
  }
}
public class RuleBasedSegmenter implements Segmenter {
  @Override
  public Segments segment(CharSequence s) { ... }
  public static Builder builder() { return new Builder(); }
  public static class Builder {
    public Builder setRules(String rules) { ... }
    public Segmenter build() { ... }
  }
}
-- Elango
|
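The inequalities in the Javadoc above (`s ≤ i < l` for `segmentAt`, `b > i` for `boundariesAfter`, `b ≤ i` for `boundariesBackFrom`) can be checked on a plain sorted list of boundary offsets. The following is a rough model only, written in C++ for consistency with the other sketches on this page; it is not the Java API, and the `sketch...` function names are hypothetical. It assumes the boundaries are sorted ascending, start at 0, and end at the text length.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Model of boundariesAfter(i): all boundaries b with b > i, ascending.
std::vector<int> sketchBoundariesAfter(const std::vector<int> &boundaries, int i) {
    auto first = std::upper_bound(boundaries.begin(), boundaries.end(), i);
    return std::vector<int>(first, boundaries.end());
}

// Model of boundariesBackFrom(i): all boundaries b with b <= i, descending.
std::vector<int> sketchBoundariesBackFrom(const std::vector<int> &boundaries, int i) {
    auto limit = std::upper_bound(boundaries.begin(), boundaries.end(), i);
    return std::vector<int>(std::vector<int>::const_reverse_iterator(limit),
                            boundaries.rend());
}

// Model of segmentAt(i): the (start, limit) pair with start <= i < limit.
// Assumed precondition for this sketch: 0 <= i < boundaries.back().
std::pair<int, int> sketchSegmentAt(const std::vector<int> &boundaries, int i) {
    auto limit = std::upper_bound(boundaries.begin(), boundaries.end(), i);
    return {*(limit - 1), *limit};
}
```

For the word boundaries of "hello world" (0, 5, 6, 11), index 5 is itself a boundary: it belongs to the "back from" set and not to the "after" set, and the segment at 5 is the space between the two words. Every boundary lands in exactly one of the two sets, which is the partitioning the Javadoc describes.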
|
From: Markus S. <mar...@gm...> - 2025-04-05 00:35:28
|
Update:
Between sending this proposal and the ICU-TC review, Robin and I had made
the following changes.
With these changes, the TC approved the proposal.
(And requested some non-API-signature changes which I made.)
a) Changed [Unsafe]Code*Units* function data() to begin() and end().
This fits, because they return iterators, not generally pointers.
And it makes the CodeUnits more like an actual C++ "range" of code units.
class UnsafeCodeUnits {
// Post-proposal change:
// Renamed data() to begin() (because it returns a UnitIter which need not be a pointer)
// and add the corresponding end().
/**
* @return the start of the minimal Unicode code unit sequence.
* Only enabled if UnitIter is a (multi-pass) forward_iterator or better.
* @draft ICU 78
*/
UnitIter begin() const { return p; }
/**
* @return the limit (exclusive end) of the minimal Unicode code unit sequence.
* Only enabled if UnitIter is a (multi-pass) forward_iterator or better.
* @draft ICU 78
*/
UnitIter end() const { return limit_; }
b) We changed the return types of the [Unsafe]UTFCode*Points* begin() and
end() functions to opaque "auto", making the exact types implementation
details, in case we need to change them later.
class UTFStringCodePoints {
// Post-proposal change: (twice here, twice in UnsafeUTFStringCodePoints)
// Make the begin()/end() return types opaque.
// Returns a UTFIterator<CP32, behavior, UnitIter> where the UnitIter may vary;
// it may be a const Unit * or a basic_string_view<Unit>::iterator.
/** @draft ICU 78 */
auto begin() const {
/** @draft ICU 78 */
auto end() const {
class UnsafeUTFStringCodePoints {
/** @draft ICU 78 */
auto begin() const {
/** @draft ICU 78 */
auto end() const {
markus
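To illustrate why begin()/end() make the code units object behave like a C++ range, here is a minimal sketch. It assumes a simplified stand-in class; `SketchCodeUnits` and `copyUnits` are hypothetical names, not ICU API. Any type with begin() and end() members works directly in a range-based for loop.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one decoded code unit sequence (not the ICU class).
template<typename UnitIter>
class SketchCodeUnits {
public:
    SketchCodeUnits(UnitIter start, UnitIter limit) : p(start), limit_(limit) {}

    // Start of the code unit sequence; a UnitIter, not necessarily a pointer.
    UnitIter begin() const { return p; }
    // Exclusive end of the code unit sequence.
    UnitIter end() const { return limit_; }

private:
    UnitIter p;
    UnitIter limit_;
};

// Copies the code units of one sequence using a range-based for loop,
// which only needs begin() and end().
template<typename UnitIter>
std::u16string copyUnits(const SketchCodeUnits<UnitIter> &units) {
    std::u16string result;
    for (char16_t unit : units) {
        result.push_back(unit);
    }
    return result;
}
```

With a data() accessor alone, the caller would need a pointer and a separate length; with begin()/end() the same code works whether UnitIter is a pointer or a string_view iterator.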
|
|
From: Markus S. <mar...@gm...> - 2025-04-05 00:11:40
|
Addendum:
(The TC has not seen this yet.)
While writing more test code, I found two problems that require API
additions and changes.
Please let me know if you disagree.
First:
A C++ forward_iterator is required to be default-constructible (have a
default constructor).
Therefore, I have added:
class UTFIterator {
/**
* Default constructor. Makes a non-functional iterator.
*
* @draft ICU 78
*/
U_FORCE_INLINE UTFIterator()
class UnsafeUTFIterator {
/**
* Default constructor. Makes a non-functional iterator.
*
* @draft ICU 78
*/
U_FORCE_INLINE UnsafeUTFIterator()
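As a rough illustration of the default-constructibility requirement, the sketch below defines a minimal forward iterator over char16_t units. `SketchUnitIterator` is a hypothetical name and a much simpler type than the ICU iterators; it only shows where the default constructor fits.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical minimal forward iterator over char16_t units (not the ICU class).
class SketchUnitIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char16_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char16_t *;
    using reference = const char16_t &;

    // Default constructor: makes a non-functional iterator that must not be
    // dereferenced or incremented, but satisfies the forward iterator requirement.
    SketchUnitIterator() : p(nullptr) {}
    explicit SketchUnitIterator(const char16_t *start) : p(start) {}

    reference operator*() const { return *p; }
    SketchUnitIterator &operator++() { ++p; return *this; }
    SketchUnitIterator operator++(int) {
        SketchUnitIterator old(*this);
        ++p;
        return old;
    }
    bool operator==(const SketchUnitIterator &other) const { return p == other.p; }
    bool operator!=(const SketchUnitIterator &other) const { return p != other.p; }

private:
    const char16_t *p;
};
```

Generic code may declare an iterator variable first and assign to it later, which is one reason the requirement exists; without the default constructor such code would not compile with this type.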
Second:
I found that
template<typename CP32, UTFIllFormedBehavior behavior, typename StringView>
auto utfStringCodePoints(StringView s)
and
template<typename CP32, typename StringView>
auto unsafeUTFStringCodePoints(StringView s)
did not work when the StringView was not literally a std::basic_string_view<CharType>.
(I should have known better...)
In order for these convenience functions to work with other inputs – std::string variants, string literals, UnicodeString – I had to replace each of these two functions with five overloads, without the StringView template parameter:
template<typename CP32, UTFIllFormedBehavior behavior>
auto utfStringCodePoints(std::string_view s) {
template<typename CP32, UTFIllFormedBehavior behavior>
auto utfStringCodePoints(std::u16string_view s) {
template<typename CP32, UTFIllFormedBehavior behavior>
auto utfStringCodePoints(std::u32string_view s) {
template<typename CP32, UTFIllFormedBehavior behavior>
auto utfStringCodePoints(std::u8string_view s) {
template<typename CP32, UTFIllFormedBehavior behavior>
auto utfStringCodePoints(std::wstring_view s) {
and
template<typename CP32>
auto unsafeUTFStringCodePoints(std::string_view s) {
template<typename CP32>
auto unsafeUTFStringCodePoints(std::u16string_view s) {
template<typename CP32>
auto unsafeUTFStringCodePoints(std::u32string_view s) {
template<typename CP32>
auto unsafeUTFStringCodePoints(std::u8string_view s) {
template<typename CP32>
auto unsafeUTFStringCodePoints(std::wstring_view s) {
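As a rough illustration of the difference (a sketch only, not the ICU implementation): with a deduced StringView parameter the compiler takes the argument type as-is, so a std::string or a string literal is never converted to a string view. With fixed string-view parameter types, the usual implicit conversions apply. IsStringView, deducedAsView and unitCount are hypothetical names invented for this example, and the std::u8string_view overload is left out so that the sketch also builds as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical trait: true only for specializations of std::basic_string_view.
template <typename T>
struct IsStringView : std::false_type {};
template <typename Ch, typename Tr>
struct IsStringView<std::basic_string_view<Ch, Tr>> : std::true_type {};

// Generic form: the parameter type is deduced exactly from the argument,
// so no conversion to a string view takes place.
template <typename StringView>
bool deducedAsView(StringView) {
    return IsStringView<StringView>::value;
}

// Overloaded form: fixed parameter types let implicit conversions apply.
inline std::size_t unitCount(std::string_view s) { return s.size(); }
inline std::size_t unitCount(std::u16string_view s) { return s.size(); }
inline std::size_t unitCount(std::u32string_view s) { return s.size(); }
inline std::size_t unitCount(std::wstring_view s) { return s.size(); }

inline void demoOverloads() {
    // Only a literal string view is deduced as a string view.
    assert(deducedAsView(std::string_view("abc")));
    assert(!deducedAsView(std::string("abc")));  // deduced as std::string
    assert(!deducedAsView("abc"));               // deduced as const char *

    // The overload set accepts views, strings and literals alike.
    assert(unitCount("abc") == 3);
    assert(unitCount(std::string("abcd")) == 4);
    assert(unitCount(std::u16string(u"ab")) == 2);
    assert(unitCount(U"a") == 1);
    assert(unitCount(L"abcde") == 5);
}
```

The cost of this approach is one overload per code unit type, which matches the five overloads listed above.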
Sincerely,
markus
|
|
From: 'Frank T. (譚永鋒)' v. icu-d. <icu...@un...> - 2025-03-27 18:46:31
|
After this morning's separate discussion about avoiding multiple-inheritance constraints in the API, I retract this proposal. I will do more research and propose a better solution later. Sorry about that.

On Wed, Mar 26, 2025 at 10:33 PM Shane Carr <sf...@go...> wrote:
> Now that we (finally) have decent interop with java.time, which works with
> some non-Gregorian calendars already, putting more work into Calendar seems
> like it might not be fruitful. Maybe in C++ we can interop more nicely with
> absl time or similar.
>
> On Wed, Mar 26, 2025 at 4:50 PM Rich Gillam <ric...@ap...>
> wrote:
>
>> Frank—
>>
>> Why is this important? Is this a performance issue, or a thread safety
>> issue, or a developer-friendliness issue, or something else?
>>
>> And if we’re going to undertake a major redesign of the Calendar
>> interface, might it make more sense to introduce a whole new Calendar class
>> (Calendar2 or something) and leave the old one alone? That’d give you a
>> lot more freedom to design a better API.
>>
>> What it seems like a lot of other APIs do is to separate a bag-of-fields
>> class from the thing that does the actual calculations. Then the
>> calculating class could be thread safe and the bag of fields could be
>> pretty lightweight. But that’d be hard (or impossible) to do while
>> maintaining backward compatibility with the Calendar class we have now.
>>
>> —Rich
>>
>> On Mar 25, 2025, at 5:59 PM, 'Frank Tang (譚永鋒)' via icu-design <
>> icu...@un...> wrote:
>>
>> Dear ICU teams
>>
>> I would like to propose the following API for: ICU 78
>> Please provide feedback by: next Wednesday, 2025-04-02
>> Designated API reviewer: Shane
>>
>> I would like to propose the following changes to Calendar API
>> The purpose is to add builder class and immutable interface for Calendar.
>> Currently, the calendar object is mutable and could not be shared across
>> thread, w/ immutable interface, the object can be passed across thread.
>> Also, there are too many features in the Calendar API so I break down four >> different interfaces to cover specific usage. >> >> The prototype is in >> https://github.com/unicode-org/icu/pull/3452 >> >> Ticket >> https://unicode-org.atlassian.net/browse/ICU-22993 >> >> The added builder class >> >> >> >> Here is the proposed change to the public API >> >> >> diff --git a/icu4c/source/i18n/unicode/calendar.h b/icu4c/source/i18n/unicode/calendar.h >> index 4499e281f9c5..20363fcc2ae5 100644 >> --- a/icu4c/source/i18n/unicode/calendar.h >> +++ b/icu4c/source/i18n/unicode/calendar.h >> @@ -56,6 +56,271 @@ typedef int32_t UFieldResolutionTable[12][8]; >> >> class BasicTimeZone; >> class CharString; >> + >> +/** >> + * The WeekRules interface in ICU defines the logic for week-related >> + * calculations in different calendar systems. It manages parameters like the >> + * first day of the week and the minimum days in the first week, supporting >> + * various regional and international week numbering conventions, including the >> + * ISO 8601 standard. This class works with the Calendar class, enabling >> + * customization and adherence to specific week-related rules. >> + */ >> +class U_I18N_API WeekRules { >> + public: >> + /** >> + * Gets what the first day of the week is; e.g., Sunday in US, Monday in France. >> + * >> + * @param status error code >> + * @return The first day of the week. >> + */ >> + virtual UCalendarDaysOfWeek getFirstDayOfWeek(UErrorCode &status) const = 0; >> + >> + /** >> + * Gets what the minimal days required in the first week of the year are; e.g., if >> + * the first week is defined as one that contains the first day of the first month >> + * of a year, getMinimalDaysInFirstWeek returns 1. If the minimal days required must >> + * be a full week, getMinimalDaysInFirstWeek returns 7. >> + * >> + * @return The minimal days required in the first week of the year. 
>> + */ >> + virtual uint8_t getMinimalDaysInFirstWeek() const = 0; >> + >> + /** >> + * Returns whether the given day of the week is a weekday, a weekend day, >> + * or a day that transitions from one to the other, for the locale and >> + * calendar system associated with this Calendar (the locale's region is >> + * often the most determinant factor). If a transition occurs at midnight, >> + * then the days before and after the transition will have the >> + * type UCAL_WEEKDAY or UCAL_WEEKEND. If a transition occurs at a time >> + * other than midnight, then the day of the transition will have >> + * the type UCAL_WEEKEND_ONSET or UCAL_WEEKEND_CEASE. In this case, the >> + * method getWeekendTransition() will return the point of >> + * transition. >> + * @param dayOfWeek The day of the week whose type is desired (UCAL_SUNDAY..UCAL_SATURDAY). >> + * @param status The error code for the operation. >> + * @return The UCalendarWeekdayType for the day of the week. >> + */ >> + virtual UCalendarWeekdayType getDayOfWeekType(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0; >> + >> + /** >> + * Returns the time during the day at which the weekend begins or ends in >> + * this calendar system. If getDayOfWeekType() returns UCAL_WEEKEND_ONSET >> + * for the specified dayOfWeek, return the time at which the weekend begins. >> + * If getDayOfWeekType() returns UCAL_WEEKEND_CEASE for the specified dayOfWeek, >> + * return the time at which the weekend ends. If getDayOfWeekType() returns >> + * some other UCalendarWeekdayType for the specified dayOfWeek, is it an error condition >> + * (U_ILLEGAL_ARGUMENT_ERROR). >> + * @param dayOfWeek The day of the week for which the weekend transition time is >> + * desired (UCAL_SUNDAY..UCAL_SATURDAY). >> + * @param status The error code for the operation. >> + * @return The milliseconds after midnight at which the weekend begins or ends. 
>> + */ >> + virtual int32_t getWeekendTransition(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0; >> + >> + /** >> + * Returns true if the given UDate is in the weekend in >> + * this calendar system. >> + * @param date The UDate in question. >> + * @param status The error code for the operation. >> + * @return true if the given UDate is in the weekend in >> + * this calendar system, false otherwise. >> + */ >> + virtual UBool isWeekend(UDate date, UErrorCode &status) const = 0; >> +}; >> + >> +/** >> + * DateFieldRange interface defines permissible boundaries for date/time >> + * components (e.g., month: 1-12). This ensures data integrity within the ICU >> + * library by preventing invalid dates/times during formatting/parsing. It's >> + * also useful for developers when iterating through date/time ranges (e.g., >> + * generating schedules). Associated with constants like DAY_OF_MONTH, it >> + * provides a structured way to manage date/time component constraints. >> + */ >> +class U_I18N_API DateFieldRange { >> + public: >> + /** >> + * Gets the minimum value for the given time field. e.g., for Gregorian >> + * DAY_OF_MONTH, 1. >> + * >> + * @param field The given time field. >> + * @return The minimum value for the given time field. >> + */ >> + virtual int32_t getMinimum(UCalendarDateFields field) const = 0; >> + >> + /** >> + * Gets the maximum value for the given time field. e.g. for Gregorian DAY_OF_MONTH, >> + * 31. >> + * >> + * @param field The given time field. >> + * @return The maximum value for the given time field. >> + */ >> + virtual int32_t getMaximum(UCalendarDateFields field) const = 0; >> + >> + /** >> + * Gets the highest minimum value for the given field if varies. Otherwise same as >> + * getMinimum(). For Gregorian, no difference. >> + * >> + * @param field The given time field. >> + * @return The highest minimum value for the given time field. 
>> + */ >> + virtual int32_t getGreatestMinimum(UCalendarDateFields field) const = 0; >> + >> + /** >> + * Gets the lowest maximum value for the given field if varies. Otherwise same as >> + * getMaximum(). e.g., for Gregorian DAY_OF_MONTH, 28. >> + * >> + * @param field The given time field. >> + * @return The lowest maximum value for the given time field. >> + */ >> + virtual int32_t getLeastMaximum(UCalendarDateFields field) const = 0; >> + >> + /** >> + * Return the minimum value that this field could have, given the current date. >> + * For the Gregorian calendar, this is the same as getMinimum() and getGreatestMinimum(). >> + * >> + * The version of this function on Calendar uses an iterative algorithm to determine the >> + * actual minimum value for the field. There is almost always a more efficient way to >> + * accomplish this (in most cases, you can simply return getMinimum()). GregorianCalendar >> + * overrides this function with a more efficient implementation. >> + * >> + * @param field the field to determine the minimum of >> + * @param status Fill-in parameter which receives the status of this operation. >> + * @return the minimum of the given field for the current date of this Calendar >> + */ >> + virtual int32_t getActualMinimum(UCalendarDateFields field, UErrorCode& status) const = 0; >> + >> + /** >> + * Return the maximum value that this field could have, given the current date. >> + * For example, with the date "Feb 3, 1997" and the DAY_OF_MONTH field, the actual >> + * maximum would be 28; for "Feb 3, 1996" it s 29. Similarly for a Hebrew calendar, >> + * for some years the actual maximum for MONTH is 12, and for others 13. >> + * >> + * The version of this function on Calendar uses an iterative algorithm to determine the >> + * actual maximum value for the field. There is almost always a more efficient way to >> + * accomplish this (in most cases, you can simply return getMaximum()). 
GregorianCalendar >> + * overrides this function with a more efficient implementation. >> + * >> + * @param field the field to determine the maximum of >> + * @param status Fill-in parameter which receives the status of this operation. >> + * @return the maximum of the given field for the current date of this Calendar >> + */ >> + virtual int32_t getActualMaximum(UCalendarDateFields field, UErrorCode& status) const = 0; >> + >> +}; >> + >> +/** >> + * The CalendarFieldAccessor class provides an interface to get individual >> + * components (year, month, day, etc.) of a Calendar object. This improves code >> + * maintainability and flexibility. >> + */ >> +class U_I18N_API CalendarFieldAccessor { >> + public: >> + /** >> + * Gets the value for a given time field. >> + * >> + * @param field The given time field. >> + * @param status Fill-in parameter which receives the status of the operation. >> + * @return The value for the given time field, or zero if the field is unset, >> + * and set() has been called for any other field. >> + */ >> + virtual int32_t get(UCalendarDateFields field, UErrorCode& status) const = 0; >> + >> + /** >> + * Returns true if this Calendar's current date-time is in the weekend in >> + * this calendar system. >> + * @return true if this Calendar's current date-time is in the weekend in >> + * this calendar system, false otherwise. >> + */ >> + virtual UBool isWeekend() const = 0; >> + >> + /** >> + * Returns true if the date is in a leap year. Recalculate the current time >> + * field values if the time value has been changed by a call to * setTime(). >> + * This method is semantically const, but may alter the object in memory. >> + * A "leap year" is a year that contains more days than other years (for >> + * solar or lunar calendars) or more months than other years (for lunisolar >> + * calendars like Hebrew or Chinese), as defined in the ECMAScript Temporal >> + * proposal. 
>> + * >> + * @param status ICU Error Code >> + * @return True if the date in the fields is in a Temporal proposal >> + * defined leap year. False otherwise. >> + */ >> + virtual bool inTemporalLeapYear(UErrorCode& status) const = 0; >> + >> + /** >> + * Gets The Temporal monthCode value corresponding to the month for the date. >> + * The value is a string identifier that starts with the literal grapheme >> + * "M" followed by two graphemes representing the zero-padded month number >> + * of the current month in a normal (non-leap) year and suffixed by an >> + * optional literal grapheme "L" if this is a leap month in a lunisolar >> + * calendar. The 25 possible values are "M01" .. "M13" and "M01L" .. "M12L". >> + * For the Hebrew calendar, the values are "M01" .. "M12" for non-leap year, and >> + * "M01" .. "M05", "M05L", "M06" .. "M12" for leap year. >> + * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and >> + * in leap year with another monthCode in "M01L" .. "M12L". >> + * For Coptic and Ethiopian calendar, the Temporal monthCode values for any >> + * years are "M01" to "M13". >> + * >> + * @param status ICU Error Code >> + * @return One of 25 possible strings in {"M01".."M13", "M01L".."M12L"}. >> + */ >> + virtual const char* getTemporalMonthCode(UErrorCode& status) const = 0; >> + >> + /** >> + * Queries if the current date for this Calendar is in Daylight Savings Time. >> + * >> + * @param status Fill-in parameter which receives the status of this operation. >> + * @return True if the current date for this Calendar is in Daylight Savings Time, >> + * false, otherwise. >> + */ >> + virtual UBool inDaylightTime(UErrorCode& status) const = 0; >> + >> + /** >> + * Gets this Calendar's time as milliseconds. May involve recalculation of time due >> + * to previous calls to set time field values. The time specified is non-local UTC >> + * (GMT) time. 
Although this method is const, this object may actually be changed >> + * (semantically const). >> + * >> + * @param status Output param set to success/failure code on exit. If any value >> + * previously set in the time field is invalid or restricted by >> + * leniency, this will be set to an error status. >> + * @return The current time in UTC (GMT) time, or zero if the operation >> + * failed. >> + * @stable ICU 2.0 >> + */ >> + virtual UDate getTime(UErrorCode& status) const = 0; >> +}; >> + >> +/** >> + * The CenturyContext class provides a framework for interpreting year values >> + * that are not fully specified with a century, such as a two-digit year. This >> + * class addresses the ambiguity of two-digit years by providing context, such >> + * as a default century or a range of years for interpretation. It is utilized >> + * during date parsing and formatting to ensure accurate conversion between >> + * textual representations of dates and the internal Calendar representation, >> + * particularly when dealing with formats where the century might be omitted. >> + */ >> +class U_I18N_API CenturyContext { >> + public: >> + /** >> + * @return true if this calendar has a default century (i.e. 
03 -> 2003) >> + * @internal >> + */ >> + virtual UBool haveDefaultCentury() const = 0; >> + >> + /** >> + * @return the start of the default century, as a UDate >> + * @internal >> + */ >> + virtual UDate defaultCenturyStart() const = 0; >> + /** >> + * @return the beginning year of the default century, as a year >> + * @internal >> + */ >> + virtual int32_t defaultCenturyStartYear() const = 0; >> +}; >> + >> /** >> * `Calendar` is an abstract base class for converting between >> * a `UDate` object and a set of integer fields such as >> @@ -187,7 +452,11 @@ class CharString; >> * >> * @stable ICU 2.0 >> */ >> -class U_I18N_API Calendar : public UObject { >> +class U_I18N_API Calendar : public UObject, >> + public CenturyContext, >> + public WeekRules, >> + public DateFieldRange, >> + public CalendarFieldAccessor { >> public: >> #ifndef U_FORCE_HIDE_DEPRECATED_API >> /** >> @@ -2413,23 +2682,6 @@ class U_I18N_API Calendar : public UObject { >> friend class DefaultCalendarFactory; >> #endif /* !UCONFIG_NO_SERVICE */ >> >> - /** >> - * @return true if this calendar has a default century (i.e. 03 -> 2003) >> - * @internal >> - */ >> - virtual UBool haveDefaultCentury() const = 0; >> - >> - /** >> - * @return the start of the default century, as a UDate >> - * @internal >> - */ >> - virtual UDate defaultCenturyStart() const = 0; >> - /** >> - * @return the beginning year of the default century, as a year >> - * @internal >> - */ >> - virtual int32_t defaultCenturyStartYear() const = 0; >> - >> /** Get the locale for this calendar object. You can choose between valid and actual locale. 
>> * @param type type of the locale we're looking for (valid or actual) >> * @param status error code for the operation >> @@ -2509,6 +2761,214 @@ class U_I18N_API Calendar : public UObject { >> #endif /* U_HIDE_INTERNAL_API */ >> }; >> >> +/** >> + * Provides a builder pattern for constructing instances of >> + * CalendarFieldAccessor,simplifying the creation and configuration of field >> + * accessors for Calendar objects. >> + */ >> +class U_I18N_API FieldAccessorBuilder : public UObject { >> + public: >> + FieldAccessorBuilder(const Locale& locale, UErrorCode &status); >> + virtual ~FieldAccessorBuilder(); >> + >> + FieldAccessorBuilder& adoptCalendar (Calendar *value, UErrorCode &status); >> + FieldAccessorBuilder& setTimeZone(const TimeZone& value, UErrorCode &status); >> + FieldAccessorBuilder& adoptTimeZone (TimeZone *value, UErrorCode &status); >> + >> + /** >> + * Sets this Calendar's current time with the given UDate. The time specified should >> + * be in non-local UTC (GMT) time. >> + * >> + * @param date The given UDate in UTC (GMT) time. >> + * @param status Output param set to success/failure code on exit. If any value >> + * set in the time field is invalid or restricted by >> + * leniency, this will be set to an error status. >> + */ >> + FieldAccessorBuilder& setTime(UDate value, UErrorCode &status); >> + >> + /** >> + * UDate Arithmetic function. Adds the specified (signed) amount of time to the given >> + * time field, based on the calendar's rules. For example, to subtract 5 days from >> + * the current time of the calendar, call add(Calendar::DATE, -5). When adding on >> + * the month or Calendar::MONTH field, other fields like date might conflict and >> + * need to be changed. For instance, adding 1 month on the date 01/31/96 will result >> + * in 02/29/96. 
>> + * Adding a positive value always means moving forward in time, so for the Gregorian calendar, >> + * starting with 100 BC and adding +1 to year results in 99 BC (even though this actually reduces >> + * the numeric value of the field itself). >> + * >> + * @param field Specifies which date field to modify. >> + * @param amount The amount of time to be added to the field, in the natural unit >> + * for that field (e.g., days for the day fields, hours for the hour >> + * field.) >> + * @param status Output param set to success/failure code on exit. If any value >> + * previously set in the time field is invalid or restricted by >> + * leniency, this will be set to an error status. >> + */ >> + FieldAccessorBuilder& add(UCalendarDateFields field, int32_t amount, UErrorCode& status); >> + >> + /** >> + * Time Field Rolling function. Rolls by the given amount on the given >> + * time field. For example, to roll the current date up by one day, call >> + * roll(Calendar::DATE, +1, status). When rolling on the month or >> + * Calendar::MONTH field, other fields like date might conflict and, need to be >> + * changed. For instance, rolling the month up on the date 01/31/96 will result in >> + * 02/29/96. Rolling by a positive value always means rolling forward in time (unless >> + * the limit of the field is reached, in which case it may pin or wrap), so for >> + * Gregorian calendar, starting with 100 BC and rolling the year by + 1 results in 99 BC. >> + * When eras have a definite beginning and end (as in the Chinese calendar, or as in >> + * most eras in the Japanese calendar) then rolling the year past either limit of the >> + * era will cause the year to wrap around. When eras only have a limit at one end, >> + * then attempting to roll the year past that limit will result in pinning the year >> + * at that limit. 
Note that for most calendars in which era 0 years move forward in >> + * time (such as Buddhist, Hebrew, or Islamic), it is possible for add or roll to >> + * result in negative years for era 0 (that is the only way to represent years before >> + * the calendar epoch). >> + * When rolling on the hour-in-day or Calendar::HOUR_OF_DAY field, it will roll the >> + * hour value in the range between 0 and 23, which is zero-based. >> + * <P> >> + * The only difference between roll() and add() is that roll() does not change >> + * the value of more significant fields when it reaches the minimum or maximum >> + * of its range, whereas add() does. >> + * >> + * @param field The time field. >> + * @param amount Indicates amount to roll. >> + * @param status Output param set to success/failure code on exit. If any value >> + * previously set in the time field is invalid, this will be set to >> + * an error status. >> + */ >> + FieldAccessorBuilder& roll(UCalendarDateFields field, int32_t amount, UErrorCode& status); >> + >> + /** >> + * Specifies whether or not date/time interpretation is to be lenient. With lenient >> + * interpretation, a date such as "February 942, 1996" will be treated as being >> + * equivalent to the 941st day after February 1, 1996. With strict interpretation, >> + * such dates will cause an error when computing time from the time field values >> + * representing the dates. >> + * >> + * @param lenient True specifies date/time interpretation to be lenient. >> + */ >> + FieldAccessorBuilder& setLenient(UBool lenient, UErrorCode& status); >> + >> + /** >> + * Sets the behavior for handling wall time repeating multiple times >> + * at negative time zone offset transitions. For example, 1:30 AM on >> + * November 6, 2011 in US Eastern time (America/New_York) occurs twice; >> + * 1:30 AM EDT, then 1:30 AM EST one hour later. 
When <code>UCAL_WALLTIME_FIRST</code> >> + * is used, the wall time 1:30AM in this example will be interpreted as 1:30 AM EDT >> + * (first occurrence). When <code>UCAL_WALLTIME_LAST</code> is used, it will be >> + * interpreted as 1:30 AM EST (last occurrence). The default value is >> + * <code>UCAL_WALLTIME_LAST</code>. >> + * <p> >> + * <b>Note:</b>When <code>UCAL_WALLTIME_NEXT_VALID</code> is not a valid >> + * option for this. When the argument is neither <code>UCAL_WALLTIME_FIRST</code> >> + * nor <code>UCAL_WALLTIME_LAST</code>, this method has no effect and will keep >> + * the current setting. >> + * >> + * @param option the behavior for handling repeating wall time, either >> + * <code>UCAL_WALLTIME_FIRST</code> or <code>UCAL_WALLTIME_LAST</code>. >> + * @see #getRepeatedWallTimeOption >> + */ >> + FieldAccessorBuilder& setRepeatedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); >> + >> + /** >> + * Sets the behavior for handling skipped wall time at positive time zone offset >> + * transitions. For example, 2:30 AM on March 13, 2011 in US Eastern time (America/New_York) >> + * does not exist because the wall time jump from 1:59 AM EST to 3:00 AM EDT. When >> + * <code>UCAL_WALLTIME_FIRST</code> is used, 2:30 AM is interpreted as 30 minutes before 3:00 AM >> + * EDT, therefore, it will be resolved as 1:30 AM EST. When <code>UCAL_WALLTIME_LAST</code> >> + * is used, 2:30 AM is interpreted as 31 minutes after 1:59 AM EST, therefore, it will be >> + * resolved as 3:30 AM EDT. When <code>UCAL_WALLTIME_NEXT_VALID</code> is used, 2:30 AM will >> + * be resolved as next valid wall time, that is 3:00 AM EDT. The default value is >> + * <code>UCAL_WALLTIME_LAST</code>. >> + * <p> >> + * <b>Note:</b>This option is effective only when this calendar is lenient. >> + * When the calendar is strict, such non-existing wall time will cause an error. 
>> + * >> + * @param option the behavior for handling skipped wall time at positive time zone >> + * offset transitions, one of <code>UCAL_WALLTIME_FIRST</code>, <code>UCAL_WALLTIME_LAST</code> and >> + * <code>UCAL_WALLTIME_NEXT_VALID</code>. >> + * @see #getSkippedWallTimeOption >> + */ >> + FieldAccessorBuilder& setSkippedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); >> + >> + /** >> + * Sets what the first day of the week is; e.g., Sunday in US, Monday in France. >> + * >> + * @param value The given first day of the week. >> + */ >> + FieldAccessorBuilder& setFirstDayOfWeek(UCalendarDaysOfWeek value, UErrorCode& status); >> + >> + /** >> + * Sets what the minimal days required in the first week of the year are; For >> + * example, if the first week is defined as one that contains the first day of the >> + * first month of a year, call the method with value 1. If it must be a full week, >> + * use value 7. >> + * >> + * @param value The given minimal days required in the first week of the year. >> + */ >> + FieldAccessorBuilder& setMinimalDaysInFirstWeek(uint8_t value, UErrorCode& status); >> + >> + /** >> + * Sets the given time field with the given value. >> + * >> + * @param field The given time field. >> + * @param value The value to be set for the given time field. >> + */ >> + FieldAccessorBuilder& set(UCalendarDateFields field, int32_t value, UErrorCode& status); >> + >> + /** >> + * Clears the values of all the time fields, making them both unset and assigning >> + * them a value of zero. The field values will be determined during the next >> + * resolving of time into time fields. >> + */ >> + FieldAccessorBuilder& clear(UErrorCode& status); >> + >> + /** >> + * Clears the value in the given time field, both making it unset and assigning it a >> + * value of zero. This field value will be determined during the next resolving of >> + * time into time fields. Clearing UCAL_ORDINAL_MONTH or UCAL_MONTH will >> + * clear both fields. 
>> + * >> + * @param field The time field to be cleared. >> + */ >> + FieldAccessorBuilder& clear(UCalendarDateFields field, UErrorCode& status); >> + >> + /** >> + * Sets The Temporal monthCode which is a string identifier that starts >> + * with the literal grapheme "M" followed by two graphemes representing >> + * the zero-padded month number of the current month in a normal >> + * (non-leap) year and suffixed by an optional literal grapheme "L" if this >> + * is a leap month in a lunisolar calendar. The 25 possible values are >> + * "M01" .. "M13" and "M01L" .. "M12L". For Hebrew calendar, the values are >> + * "M01" .. "M12" for non-leap years, and "M01" .. "M05", "M05L", "M06" >> + * .. "M12" for leap year. >> + * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and >> + * in leap year with another monthCode in "M01L" .. "M12L". >> + * For Coptic and Ethiopian calendar, the Temporal monthCode values for any >> + * years are "M01" to "M13". >> + * >> + * @param temporalMonth The value to be set for temporal monthCode. >> + * @param status ICU Error Code >> + */ >> + FieldAccessorBuilder& setTemporalMonthCode(const char* temporalMonth, UErrorCode& status); >> + >> + /** >> + * Sets the GregorianCalendar change date. This is the point when the switch from >> + * Julian dates to Gregorian dates occurred. Default is 00:00:00 local time, October >> + * 15, 1582. Previous to this time and date will be Julian dates. >> + * >> + * @param date The given Gregorian cutover date. >> + * @param status Output param set to success/failure code on exit. >> + */ >> + FieldAccessorBuilder& setGregorianChange(UDate date, UErrorCode& status); >> + >> + CalendarFieldAccessor* buildFieldAccessor(UErrorCode& status) const; >> + >> + private: >> + LocalPointer<Calendar> fCalendar; >> +}; >> + >> // ------------------------------------- >> >> inline Calendar* >> >> >> -- >> Frank Yung-Fong Tang >> 譚永鋒 / 🌭🍊 >> Sr. 
Software Engineer

--
Frank Yung-Fong Tang
譚永鋒 / 🌭🍊
Sr. Software Engineer
|
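For readers following the thread, the general idea behind the (now retracted) proposal, a mutable builder that hands out read-only objects, can be sketched as follows. This is a simplified stand-in and not the proposed ICU API: Field, FieldAccessor and AccessorBuilder are hypothetical names, the fields are plain integers, and no calendar arithmetic is performed. The real Calendar recomputes fields lazily inside const methods, which this sketch deliberately avoids.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// All names below are hypothetical; none of this is ICU API.
enum class Field : std::size_t { Year = 0, Month = 1, Day = 2 };

// Read-only interface: only const getters, no setters.
class FieldAccessor {
public:
    virtual ~FieldAccessor() = default;
    virtual int get(Field field) const = 0;
};

// Mutable builder: collects field values, then hands out immutable snapshots.
class AccessorBuilder {
private:
    class Snapshot final : public FieldAccessor {
    public:
        explicit Snapshot(const std::array<int, 3> &fields) : fields_(fields) {}
        int get(Field field) const override {
            return fields_[static_cast<std::size_t>(field)];
        }
    private:
        const std::array<int, 3> fields_;  // never modified after construction
    };

    std::array<int, 3> fields_{{1970, 1, 1}};

public:
    AccessorBuilder &set(Field field, int value) {
        fields_[static_cast<std::size_t>(field)] = value;
        return *this;
    }

    // Copies the current values, so later set() calls do not affect the result.
    std::shared_ptr<const FieldAccessor> build() const {
        return std::make_shared<Snapshot>(fields_);
    }
};

inline void demoSnapshot() {
    AccessorBuilder builder;
    builder.set(Field::Year, 2025).set(Field::Month, 3).set(Field::Day, 25);
    std::shared_ptr<const FieldAccessor> date = builder.build();
    builder.set(Field::Day, 26);                     // mutate the builder afterwards
    assert(date->get(Field::Day) == 25);             // the snapshot is unaffected
    assert(builder.build()->get(Field::Day) == 26);  // a new snapshot sees the change
}
```

Because a snapshot has no mutable state, several threads could read the same object concurrently without locking. That is the property the proposal was after, and it is also close to the split Rich describes between a lightweight bag of fields and the code that does the calculations.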
|
From: 'Shane C. v. icu-d. <icu...@un...> - 2025-03-27 05:34:11
|
Now that we (finally) have decent interop with java.time, which works with some non-Gregorian calendars already, putting more work into Calendar seems like it might not be fruitful. Maybe in C++ we can interop more nicely with absl time or similar.

On Wed, Mar 26, 2025 at 4:50 PM Rich Gillam <ric...@ap...> wrote:
> Frank—
>
> Why is this important? Is this a performance issue, or a thread safety
> issue, or a developer-friendliness issue, or something else?
>
> And if we’re going to undertake a major redesign of the Calendar
> interface, might it make more sense to introduce a whole new Calendar class
> (Calendar2 or something) and leave the old one alone? That’d give you a
> lot more freedom to design a better API.
>
> What it seems like a lot of other APIs do is to separate a bag-of-fields
> class from the thing that does the actual calculations. Then the
> calculating class could be thread safe and the bag of fields could be
> pretty lightweight. But that’d be hard (or impossible) to do while
> maintaining backward compatibility with the Calendar class we have now.
>
> —Rich
>
> On Mar 25, 2025, at 5:59 PM, 'Frank Tang (譚永鋒)' via icu-design <
> icu...@un...> wrote:
>
> Dear ICU teams
>
> I would like to propose the following API for: ICU 78
> Please provide feedback by: next Wednesday, 2025-04-02
> Designated API reviewer: Shane
>
> I would like to propose the following changes to Calendar API
> The purpose is to add builder class and immutable interface for Calendar.
> Currently, the calendar object is mutable and could not be shared across
> thread, w/ immutable interface, the object can be passed across thread.
> Also, there are too many features in the Calendar API so I break down four
> different interfaces to cover specific usage.
> > The prototype is in > https://github.com/unicode-org/icu/pull/3452 > > Ticket > https://unicode-org.atlassian.net/browse/ICU-22993 > > The added builder class > > > > Here is the proposed change to the public API > > > diff --git a/icu4c/source/i18n/unicode/calendar.h b/icu4c/source/i18n/unicode/calendar.h > index 4499e281f9c5..20363fcc2ae5 100644 > --- a/icu4c/source/i18n/unicode/calendar.h > +++ b/icu4c/source/i18n/unicode/calendar.h > @@ -56,6 +56,271 @@ typedef int32_t UFieldResolutionTable[12][8]; > > class BasicTimeZone; > class CharString; > + > +/** > + * The WeekRules interface in ICU defines the logic for week-related > + * calculations in different calendar systems. It manages parameters like the > + * first day of the week and the minimum days in the first week, supporting > + * various regional and international week numbering conventions, including the > + * ISO 8601 standard. This class works with the Calendar class, enabling > + * customization and adherence to specific week-related rules. > + */ > +class U_I18N_API WeekRules { > + public: > + /** > + * Gets what the first day of the week is; e.g., Sunday in US, Monday in France. > + * > + * @param status error code > + * @return The first day of the week. > + */ > + virtual UCalendarDaysOfWeek getFirstDayOfWeek(UErrorCode &status) const = 0; > + > + /** > + * Gets what the minimal days required in the first week of the year are; e.g., if > + * the first week is defined as one that contains the first day of the first month > + * of a year, getMinimalDaysInFirstWeek returns 1. If the minimal days required must > + * be a full week, getMinimalDaysInFirstWeek returns 7. > + * > + * @return The minimal days required in the first week of the year. 
> +     */
> +    virtual uint8_t getMinimalDaysInFirstWeek() const = 0;
> +
> +    /**
> +     * Returns whether the given day of the week is a weekday, a weekend day,
> +     * or a day that transitions from one to the other, for the locale and
> +     * calendar system associated with this Calendar (the locale's region is
> +     * often the most determinant factor). If a transition occurs at midnight,
> +     * then the days before and after the transition will have the
> +     * type UCAL_WEEKDAY or UCAL_WEEKEND. If a transition occurs at a time
> +     * other than midnight, then the day of the transition will have
> +     * the type UCAL_WEEKEND_ONSET or UCAL_WEEKEND_CEASE. In this case, the
> +     * method getWeekendTransition() will return the point of
> +     * transition.
> +     * @param dayOfWeek The day of the week whose type is desired (UCAL_SUNDAY..UCAL_SATURDAY).
> +     * @param status The error code for the operation.
> +     * @return The UCalendarWeekdayType for the day of the week.
> +     */
> +    virtual UCalendarWeekdayType getDayOfWeekType(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
> +
> +    /**
> +     * Returns the time during the day at which the weekend begins or ends in
> +     * this calendar system. If getDayOfWeekType() returns UCAL_WEEKEND_ONSET
> +     * for the specified dayOfWeek, return the time at which the weekend begins.
> +     * If getDayOfWeekType() returns UCAL_WEEKEND_CEASE for the specified dayOfWeek,
> +     * return the time at which the weekend ends. If getDayOfWeekType() returns
> +     * some other UCalendarWeekdayType for the specified dayOfWeek, is it an error condition
> +     * (U_ILLEGAL_ARGUMENT_ERROR).
> +     * @param dayOfWeek The day of the week for which the weekend transition time is
> +     * desired (UCAL_SUNDAY..UCAL_SATURDAY).
> +     * @param status The error code for the operation.
> +     * @return The milliseconds after midnight at which the weekend begins or ends.
> +     */
> +    virtual int32_t getWeekendTransition(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
> +
> +    /**
> +     * Returns true if the given UDate is in the weekend in
> +     * this calendar system.
> +     * @param date The UDate in question.
> +     * @param status The error code for the operation.
> +     * @return true if the given UDate is in the weekend in
> +     * this calendar system, false otherwise.
> +     */
> +    virtual UBool isWeekend(UDate date, UErrorCode &status) const = 0;
> +};
> +
> +/**
> + * DateFieldRange interface defines permissible boundaries for date/time
> + * components (e.g., month: 1-12). This ensures data integrity within the ICU
> + * library by preventing invalid dates/times during formatting/parsing. It's
> + * also useful for developers when iterating through date/time ranges (e.g.,
> + * generating schedules). Associated with constants like DAY_OF_MONTH, it
> + * provides a structured way to manage date/time component constraints.
> + */
> +class U_I18N_API DateFieldRange {
> + public:
> +    /**
> +     * Gets the minimum value for the given time field. e.g., for Gregorian
> +     * DAY_OF_MONTH, 1.
> +     *
> +     * @param field The given time field.
> +     * @return The minimum value for the given time field.
> +     */
> +    virtual int32_t getMinimum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the maximum value for the given time field. e.g. for Gregorian DAY_OF_MONTH,
> +     * 31.
> +     *
> +     * @param field The given time field.
> +     * @return The maximum value for the given time field.
> +     */
> +    virtual int32_t getMaximum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the highest minimum value for the given field if varies. Otherwise same as
> +     * getMinimum(). For Gregorian, no difference.
> +     *
> +     * @param field The given time field.
> +     * @return The highest minimum value for the given time field.
> +     */
> +    virtual int32_t getGreatestMinimum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the lowest maximum value for the given field if varies. Otherwise same as
> +     * getMaximum(). e.g., for Gregorian DAY_OF_MONTH, 28.
> +     *
> +     * @param field The given time field.
> +     * @return The lowest maximum value for the given time field.
> +     */
> +    virtual int32_t getLeastMaximum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Return the minimum value that this field could have, given the current date.
> +     * For the Gregorian calendar, this is the same as getMinimum() and getGreatestMinimum().
> +     *
> +     * The version of this function on Calendar uses an iterative algorithm to determine the
> +     * actual minimum value for the field. There is almost always a more efficient way to
> +     * accomplish this (in most cases, you can simply return getMinimum()). GregorianCalendar
> +     * overrides this function with a more efficient implementation.
> +     *
> +     * @param field the field to determine the minimum of
> +     * @param status Fill-in parameter which receives the status of this operation.
> +     * @return the minimum of the given field for the current date of this Calendar
> +     */
> +    virtual int32_t getActualMinimum(UCalendarDateFields field, UErrorCode& status) const = 0;
> +
> +    /**
> +     * Return the maximum value that this field could have, given the current date.
> +     * For example, with the date "Feb 3, 1997" and the DAY_OF_MONTH field, the actual
> +     * maximum would be 28; for "Feb 3, 1996" it s 29. Similarly for a Hebrew calendar,
> +     * for some years the actual maximum for MONTH is 12, and for others 13.
> +     *
> +     * The version of this function on Calendar uses an iterative algorithm to determine the
> +     * actual maximum value for the field. There is almost always a more efficient way to
> +     * accomplish this (in most cases, you can simply return getMaximum()). GregorianCalendar
> +     * overrides this function with a more efficient implementation.
> +     *
> +     * @param field the field to determine the maximum of
> +     * @param status Fill-in parameter which receives the status of this operation.
> +     * @return the maximum of the given field for the current date of this Calendar
> +     */
> +    virtual int32_t getActualMaximum(UCalendarDateFields field, UErrorCode& status) const = 0;
> +
> +};
> +
> +/**
> + * The CalendarFieldAccessor class provides an interface to get individual
> + * components (year, month, day, etc.) of a Calendar object. This improves code
> + * maintainability and flexibility.
> + */
> +class U_I18N_API CalendarFieldAccessor {
> + public:
> +    /**
> +     * Gets the value for a given time field.
> +     *
> +     * @param field The given time field.
> +     * @param status Fill-in parameter which receives the status of the operation.
> +     * @return The value for the given time field, or zero if the field is unset,
> +     * and set() has been called for any other field.
> +     */
> +    virtual int32_t get(UCalendarDateFields field, UErrorCode& status) const = 0;
> +
> +    /**
> +     * Returns true if this Calendar's current date-time is in the weekend in
> +     * this calendar system.
> +     * @return true if this Calendar's current date-time is in the weekend in
> +     * this calendar system, false otherwise.
> +     */
> +    virtual UBool isWeekend() const = 0;
> +
> +    /**
> +     * Returns true if the date is in a leap year. Recalculate the current time
> +     * field values if the time value has been changed by a call to * setTime().
> +     * This method is semantically const, but may alter the object in memory.
> +     * A "leap year" is a year that contains more days than other years (for
> +     * solar or lunar calendars) or more months than other years (for lunisolar
> +     * calendars like Hebrew or Chinese), as defined in the ECMAScript Temporal
> +     * proposal.
> +     *
> +     * @param status ICU Error Code
> +     * @return True if the date in the fields is in a Temporal proposal
> +     * defined leap year. False otherwise.
> +     */
> +    virtual bool inTemporalLeapYear(UErrorCode& status) const = 0;
> +
> +    /**
> +     * Gets The Temporal monthCode value corresponding to the month for the date.
> +     * The value is a string identifier that starts with the literal grapheme
> +     * "M" followed by two graphemes representing the zero-padded month number
> +     * of the current month in a normal (non-leap) year and suffixed by an
> +     * optional literal grapheme "L" if this is a leap month in a lunisolar
> +     * calendar. The 25 possible values are "M01" .. "M13" and "M01L" .. "M12L".
> +     * For the Hebrew calendar, the values are "M01" .. "M12" for non-leap year, and
> +     * "M01" .. "M05", "M05L", "M06" .. "M12" for leap year.
> +     * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and
> +     * in leap year with another monthCode in "M01L" .. "M12L".
> +     * For Coptic and Ethiopian calendar, the Temporal monthCode values for any
> +     * years are "M01" to "M13".
> +     *
> +     * @param status ICU Error Code
> +     * @return One of 25 possible strings in {"M01".."M13", "M01L".."M12L"}.
> +     */
> +    virtual const char* getTemporalMonthCode(UErrorCode& status) const = 0;
> +
> +    /**
> +     * Queries if the current date for this Calendar is in Daylight Savings Time.
> +     *
> +     * @param status Fill-in parameter which receives the status of this operation.
> +     * @return True if the current date for this Calendar is in Daylight Savings Time,
> +     * false, otherwise.
> +     */
> +    virtual UBool inDaylightTime(UErrorCode& status) const = 0;
> +
> +    /**
> +     * Gets this Calendar's time as milliseconds. May involve recalculation of time due
> +     * to previous calls to set time field values. The time specified is non-local UTC
> +     * (GMT) time. Although this method is const, this object may actually be changed
> +     * (semantically const).
> +     *
> +     * @param status Output param set to success/failure code on exit. If any value
> +     * previously set in the time field is invalid or restricted by
> +     * leniency, this will be set to an error status.
> +     * @return The current time in UTC (GMT) time, or zero if the operation
> +     * failed.
> +     * @stable ICU 2.0
> +     */
> +    virtual UDate getTime(UErrorCode& status) const = 0;
> +};
> +
> +/**
> + * The CenturyContext class provides a framework for interpreting year values
> + * that are not fully specified with a century, such as a two-digit year. This
> + * class addresses the ambiguity of two-digit years by providing context, such
> + * as a default century or a range of years for interpretation. It is utilized
> + * during date parsing and formatting to ensure accurate conversion between
> + * textual representations of dates and the internal Calendar representation,
> + * particularly when dealing with formats where the century might be omitted.
> + */
> +class U_I18N_API CenturyContext {
> + public:
> +    /**
> +     * @return true if this calendar has a default century (i.e. 03 -> 2003)
> +     * @internal
> +     */
> +    virtual UBool haveDefaultCentury() const = 0;
> +
> +    /**
> +     * @return the start of the default century, as a UDate
> +     * @internal
> +     */
> +    virtual UDate defaultCenturyStart() const = 0;
> +    /**
> +     * @return the beginning year of the default century, as a year
> +     * @internal
> +     */
> +    virtual int32_t defaultCenturyStartYear() const = 0;
> +};
> +
>  /**
>   * `Calendar` is an abstract base class for converting between
>   * a `UDate` object and a set of integer fields such as
> @@ -187,7 +452,11 @@ class CharString;
>   *
>   * @stable ICU 2.0
>   */
> -class U_I18N_API Calendar : public UObject {
> +class U_I18N_API Calendar : public UObject,
> +                            public CenturyContext,
> +                            public WeekRules,
> +                            public DateFieldRange,
> +                            public CalendarFieldAccessor {
>  public:
>  #ifndef U_FORCE_HIDE_DEPRECATED_API
>      /**
> @@ -2413,23 +2682,6 @@ class U_I18N_API Calendar : public UObject {
>      friend class DefaultCalendarFactory;
>  #endif /* !UCONFIG_NO_SERVICE */
>
> -    /**
> -     * @return true if this calendar has a default century (i.e. 03 -> 2003)
> -     * @internal
> -     */
> -    virtual UBool haveDefaultCentury() const = 0;
> -
> -    /**
> -     * @return the start of the default century, as a UDate
> -     * @internal
> -     */
> -    virtual UDate defaultCenturyStart() const = 0;
> -    /**
> -     * @return the beginning year of the default century, as a year
> -     * @internal
> -     */
> -    virtual int32_t defaultCenturyStartYear() const = 0;
> -
>      /** Get the locale for this calendar object. You can choose between valid and actual locale.
>       * @param type type of the locale we're looking for (valid or actual)
>       * @param status error code for the operation
> @@ -2509,6 +2761,214 @@ class U_I18N_API Calendar : public UObject {
>  #endif /* U_HIDE_INTERNAL_API */
>  };
>
> +/**
> + * Provides a builder pattern for constructing instances of
> + * CalendarFieldAccessor,simplifying the creation and configuration of field
> + * accessors for Calendar objects.
> + */
> +class U_I18N_API FieldAccessorBuilder : public UObject {
> + public:
> +    FieldAccessorBuilder(const Locale& locale, UErrorCode &status);
> +    virtual ~FieldAccessorBuilder();
> +
> +    FieldAccessorBuilder& adoptCalendar (Calendar *value, UErrorCode &status);
> +    FieldAccessorBuilder& setTimeZone(const TimeZone& value, UErrorCode &status);
> +    FieldAccessorBuilder& adoptTimeZone (TimeZone *value, UErrorCode &status);
> +
> +    /**
> +     * Sets this Calendar's current time with the given UDate. The time specified should
> +     * be in non-local UTC (GMT) time.
> +     *
> +     * @param date The given UDate in UTC (GMT) time.
> +     * @param status Output param set to success/failure code on exit. If any value
> +     * set in the time field is invalid or restricted by
> +     * leniency, this will be set to an error status.
> +     */
> +    FieldAccessorBuilder& setTime(UDate value, UErrorCode &status);
> +
> +    /**
> +     * UDate Arithmetic function. Adds the specified (signed) amount of time to the given
> +     * time field, based on the calendar's rules. For example, to subtract 5 days from
> +     * the current time of the calendar, call add(Calendar::DATE, -5). When adding on
> +     * the month or Calendar::MONTH field, other fields like date might conflict and
> +     * need to be changed. For instance, adding 1 month on the date 01/31/96 will result
> +     * in 02/29/96.
> +     * Adding a positive value always means moving forward in time, so for the Gregorian calendar,
> +     * starting with 100 BC and adding +1 to year results in 99 BC (even though this actually reduces
> +     * the numeric value of the field itself).
> +     *
> +     * @param field Specifies which date field to modify.
> +     * @param amount The amount of time to be added to the field, in the natural unit
> +     * for that field (e.g., days for the day fields, hours for the hour
> +     * field.)
> +     * @param status Output param set to success/failure code on exit. If any value
> +     * previously set in the time field is invalid or restricted by
> +     * leniency, this will be set to an error status.
> +     */
> +    FieldAccessorBuilder& add(UCalendarDateFields field, int32_t amount, UErrorCode& status);
> +
> +    /**
> +     * Time Field Rolling function. Rolls by the given amount on the given
> +     * time field. For example, to roll the current date up by one day, call
> +     * roll(Calendar::DATE, +1, status). When rolling on the month or
> +     * Calendar::MONTH field, other fields like date might conflict and, need to be
> +     * changed. For instance, rolling the month up on the date 01/31/96 will result in
> +     * 02/29/96. Rolling by a positive value always means rolling forward in time (unless
> +     * the limit of the field is reached, in which case it may pin or wrap), so for
> +     * Gregorian calendar, starting with 100 BC and rolling the year by + 1 results in 99 BC.
> +     * When eras have a definite beginning and end (as in the Chinese calendar, or as in
> +     * most eras in the Japanese calendar) then rolling the year past either limit of the
> +     * era will cause the year to wrap around. When eras only have a limit at one end,
> +     * then attempting to roll the year past that limit will result in pinning the year
> +     * at that limit. Note that for most calendars in which era 0 years move forward in
> +     * time (such as Buddhist, Hebrew, or Islamic), it is possible for add or roll to
> +     * result in negative years for era 0 (that is the only way to represent years before
> +     * the calendar epoch).
> +     * When rolling on the hour-in-day or Calendar::HOUR_OF_DAY field, it will roll the
> +     * hour value in the range between 0 and 23, which is zero-based.
> +     * <P>
> +     * The only difference between roll() and add() is that roll() does not change
> +     * the value of more significant fields when it reaches the minimum or maximum
> +     * of its range, whereas add() does.
> +     *
> +     * @param field The time field.
> +     * @param amount Indicates amount to roll.
> +     * @param status Output param set to success/failure code on exit. If any value
> +     * previously set in the time field is invalid, this will be set to
> +     * an error status.
> +     */
> +    FieldAccessorBuilder& roll(UCalendarDateFields field, int32_t amount, UErrorCode& status);
> +
> +    /**
> +     * Specifies whether or not date/time interpretation is to be lenient. With lenient
> +     * interpretation, a date such as "February 942, 1996" will be treated as being
> +     * equivalent to the 941st day after February 1, 1996. With strict interpretation,
> +     * such dates will cause an error when computing time from the time field values
> +     * representing the dates.
> +     *
> +     * @param lenient True specifies date/time interpretation to be lenient.
> +     */
> +    FieldAccessorBuilder& setLenient(UBool lenient, UErrorCode& status);
> +
> +    /**
> +     * Sets the behavior for handling wall time repeating multiple times
> +     * at negative time zone offset transitions. For example, 1:30 AM on
> +     * November 6, 2011 in US Eastern time (America/New_York) occurs twice;
> +     * 1:30 AM EDT, then 1:30 AM EST one hour later. When <code>UCAL_WALLTIME_FIRST</code>
> +     * is used, the wall time 1:30AM in this example will be interpreted as 1:30 AM EDT
> +     * (first occurrence). When <code>UCAL_WALLTIME_LAST</code> is used, it will be
> +     * interpreted as 1:30 AM EST (last occurrence). The default value is
> +     * <code>UCAL_WALLTIME_LAST</code>.
> +     * <p>
> +     * <b>Note:</b>When <code>UCAL_WALLTIME_NEXT_VALID</code> is not a valid
> +     * option for this. When the argument is neither <code>UCAL_WALLTIME_FIRST</code>
> +     * nor <code>UCAL_WALLTIME_LAST</code>, this method has no effect and will keep
> +     * the current setting.
> +     *
> +     * @param option the behavior for handling repeating wall time, either
> +     * <code>UCAL_WALLTIME_FIRST</code> or <code>UCAL_WALLTIME_LAST</code>.
> +     * @see #getRepeatedWallTimeOption
> +     */
> +    FieldAccessorBuilder& setRepeatedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status);
> +
> +    /**
> +     * Sets the behavior for handling skipped wall time at positive time zone offset
> +     * transitions. For example, 2:30 AM on March 13, 2011 in US Eastern time (America/New_York)
> +     * does not exist because the wall time jump from 1:59 AM EST to 3:00 AM EDT. When
> +     * <code>UCAL_WALLTIME_FIRST</code> is used, 2:30 AM is interpreted as 30 minutes before 3:00 AM
> +     * EDT, therefore, it will be resolved as 1:30 AM EST. When <code>UCAL_WALLTIME_LAST</code>
> +     * is used, 2:30 AM is interpreted as 31 minutes after 1:59 AM EST, therefore, it will be
> +     * resolved as 3:30 AM EDT. When <code>UCAL_WALLTIME_NEXT_VALID</code> is used, 2:30 AM will
> +     * be resolved as next valid wall time, that is 3:00 AM EDT. The default value is
> +     * <code>UCAL_WALLTIME_LAST</code>.
> +     * <p>
> +     * <b>Note:</b>This option is effective only when this calendar is lenient.
> +     * When the calendar is strict, such non-existing wall time will cause an error.
> +     *
> +     * @param option the behavior for handling skipped wall time at positive time zone
> +     * offset transitions, one of <code>UCAL_WALLTIME_FIRST</code>, <code>UCAL_WALLTIME_LAST</code> and
> +     * <code>UCAL_WALLTIME_NEXT_VALID</code>.
> +     * @see #getSkippedWallTimeOption
> +     */
> +    FieldAccessorBuilder& setSkippedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status);
> +
> +    /**
> +     * Sets what the first day of the week is; e.g., Sunday in US, Monday in France.
> +     *
> +     * @param value The given first day of the week.
> +     */
> +    FieldAccessorBuilder& setFirstDayOfWeek(UCalendarDaysOfWeek value, UErrorCode& status);
> +
> +    /**
> +     * Sets what the minimal days required in the first week of the year are; For
> +     * example, if the first week is defined as one that contains the first day of the
> +     * first month of a year, call the method with value 1. If it must be a full week,
> +     * use value 7.
> +     *
> +     * @param value The given minimal days required in the first week of the year.
> +     */
> +    FieldAccessorBuilder& setMinimalDaysInFirstWeek(uint8_t value, UErrorCode& status);
> +
> +    /**
> +     * Sets the given time field with the given value.
> +     *
> +     * @param field The given time field.
> +     * @param value The value to be set for the given time field.
> +     */
> +    FieldAccessorBuilder& set(UCalendarDateFields field, int32_t value, UErrorCode& status);
> +
> +    /**
> +     * Clears the values of all the time fields, making them both unset and assigning
> +     * them a value of zero. The field values will be determined during the next
> +     * resolving of time into time fields.
> +     */
> +    FieldAccessorBuilder& clear(UErrorCode& status);
> +
> +    /**
> +     * Clears the value in the given time field, both making it unset and assigning it a
> +     * value of zero. This field value will be determined during the next resolving of
> +     * time into time fields. Clearing UCAL_ORDINAL_MONTH or UCAL_MONTH will
> +     * clear both fields.
> +     *
> +     * @param field The time field to be cleared.
> +     */
> +    FieldAccessorBuilder& clear(UCalendarDateFields field, UErrorCode& status);
> +
> +    /**
> +     * Sets The Temporal monthCode which is a string identifier that starts
> +     * with the literal grapheme "M" followed by two graphemes representing
> +     * the zero-padded month number of the current month in a normal
> +     * (non-leap) year and suffixed by an optional literal grapheme "L" if this
> +     * is a leap month in a lunisolar calendar. The 25 possible values are
> +     * "M01" .. "M13" and "M01L" .. "M12L". For Hebrew calendar, the values are
> +     * "M01" .. "M12" for non-leap years, and "M01" .. "M05", "M05L", "M06"
> +     * .. "M12" for leap year.
> +     * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and
> +     * in leap year with another monthCode in "M01L" .. "M12L".
> +     * For Coptic and Ethiopian calendar, the Temporal monthCode values for any
> +     * years are "M01" to "M13".
> +     *
> +     * @param temporalMonth The value to be set for temporal monthCode.
> +     * @param status ICU Error Code
> +     */
> +    FieldAccessorBuilder& setTemporalMonthCode(const char* temporalMonth, UErrorCode& status);
> +
> +    /**
> +     * Sets the GregorianCalendar change date. This is the point when the switch from
> +     * Julian dates to Gregorian dates occurred. Default is 00:00:00 local time, October
> +     * 15, 1582. Previous to this time and date will be Julian dates.
> +     *
> +     * @param date The given Gregorian cutover date.
> +     * @param status Output param set to success/failure code on exit.
> +     */
> +    FieldAccessorBuilder& setGregorianChange(UDate date, UErrorCode& status);
> +
> +    CalendarFieldAccessor* buildFieldAccessor(UErrorCode& status) const;
> +
> + private:
> +    LocalPointer<Calendar> fCalendar;
> +};
> +
>  // -------------------------------------
>
>  inline Calendar*
>
>
> --
> Frank Yung-Fong Tang
> 譚永鋒 / 🌭🍊
> Sr. Software Engineer
>
> --
> You received this message because you are subscribed to the Google Groups
> "icu-design" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to icu...@un....
> To view this discussion visit
> https://groups.google.com/a/unicode.org/d/msgid/icu-design/CA%2B7fzPGQXv4UpVBsoqqROk7wsws6MW%2B%3DCR4t37C-OTJoDAezoQ%40mail.gmail.com
> <https://groups.google.com/a/unicode.org/d/msgid/icu-design/CA%2B7fzPGQXv4UpVBsoqqROk7wsws6MW%2B%3DCR4t37C-OTJoDAezoQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/a/unicode.org/d/optout.
>
>

--
You received this message because you are subscribed to the Google Groups "icu-design" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un....
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-design/CABxsp%3D%3D5U%2Be7va1vYNk%2BmCNzwGog0gOTgnoAEOjD2XX6VvgbGQ%40mail.gmail.com.
For more options, visit https://groups.google.com/a/unicode.org/d/optout.
|
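The proposal quoted above pairs a mutable builder (FieldAccessorBuilder) with a read-only interface (CalendarFieldAccessor) so that the built object can be passed to other threads. Below is a minimal, ICU-free sketch of that split. Every name in it (Field, ReadOnlyFields, FieldSnapshot, FieldsBuilder) is hypothetical and not taken from the PR; it only illustrates that a snapshot built this way does not change when the builder is modified afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical field identifiers; a stand-in for UCalendarDateFields.
enum class Field : std::size_t { kYear = 0, kMonth = 1, kDay = 2 };
constexpr std::size_t kFieldCount = 3;

// Read-only view, playing the role of the proposed CalendarFieldAccessor.
class ReadOnlyFields {
public:
    virtual ~ReadOnlyFields() = default;
    virtual std::int32_t get(Field field) const = 0;
};

// Immutable snapshot: all state is const and fixed at construction.
class FieldSnapshot final : public ReadOnlyFields {
public:
    explicit FieldSnapshot(const std::array<std::int32_t, kFieldCount>& values)
        : values_(values) {}

    std::int32_t get(Field field) const override {
        const std::size_t index = static_cast<std::size_t>(field);
        assert(index < kFieldCount);
        return values_[index];
    }

private:
    const std::array<std::int32_t, kFieldCount> values_;
};

// Mutable builder (hypothetical, not the PR's FieldAccessorBuilder).
class FieldsBuilder {
public:
    // Returns *this so calls can be chained.
    FieldsBuilder& set(Field field, std::int32_t value) {
        const std::size_t index = static_cast<std::size_t>(field);
        assert(index < kFieldCount);
        values_[index] = value;
        return *this;
    }

    // Copies the current values into a new snapshot; later set() calls
    // on the builder do not affect snapshots that were already built.
    std::shared_ptr<const ReadOnlyFields> build() const {
        return std::make_shared<FieldSnapshot>(values_);
    }

private:
    std::array<std::int32_t, kFieldCount> values_{};  // all fields start at zero
};
```

Because FieldSnapshot holds only const state, concurrent reads need no locking in this sketch. Whether the actual prototype gives the same guarantee depends on its implementation, since the quoted doc comments note that some const methods may alter the object in memory.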
|
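The reply quoted earlier in this thread suggests separating a lightweight bag-of-fields class from a thread-safe class that does the calculations. Here is a rough sketch of that shape, under heavy assumptions: DateFields and GregorianRules are hypothetical names, not ICU classes, and only proleptic Gregorian month arithmetic is implemented. The day clamping mirrors the example in the quoted add() documentation, where adding 1 month to 01/31/96 gives 02/29/96.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical "bag of fields": a plain value type with no hidden state.
struct DateFields {
    std::int32_t year;
    std::int32_t month;  // 1..12
    std::int32_t day;    // 1..31
};

// Hypothetical calculator: no data members, every method is const,
// so one instance can be shared by any number of threads.
class GregorianRules {
public:
    bool isLeapYear(std::int32_t year) const {
        return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    }

    std::int32_t daysInMonth(std::int32_t year, std::int32_t month) const {
        assert(month >= 1 && month <= 12);
        static constexpr std::int32_t kDays[12] = {31, 28, 31, 30, 31, 30,
                                                   31, 31, 30, 31, 30, 31};
        return (month == 2 && isLeapYear(year)) ? 29 : kDays[month - 1];
    }

    // Returns a new value instead of mutating the input. The day is
    // clamped to the length of the target month.
    DateFields addMonths(DateFields in, std::int32_t amount) const {
        const std::int32_t total = in.year * 12 + (in.month - 1) + amount;
        std::int32_t year = total / 12;
        std::int32_t monthIndex = total % 12;
        if (monthIndex < 0) {  // C++ division truncates toward zero
            monthIndex += 12;
            year -= 1;
        }
        const std::int32_t month = monthIndex + 1;
        const std::int32_t day = std::min(in.day, daysInMonth(year, month));
        return DateFields{year, month, day};
    }
};
```

In this shape the calculator never stores a "current date", so the thread-safety question raised in the thread does not arise for it; the cost is that callers pass the fields in on every call.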
From: 'Rich G. v. icu-d. <icu...@un...> - 2025-03-26 23:50:16
|
Frank,

Why is this important? Is this a performance issue, or a thread safety issue, or a developer-friendliness issue, or something else?

And if we’re going to undertake a major redesign of the Calendar interface, might it make more sense to introduce a whole new Calendar class (Calendar2 or something) and leave the old one alone? That’d give you a lot more freedom to design a better API.

What it seems like a lot of other APIs do is to separate a bag-of-fields class from the thing that does the actual calculations. Then the calculating class could be thread safe and the bag of fields could be pretty lightweight. But that’d be hard (or impossible) to do while maintaining backward compatibility with the Calendar class we have now.

Rich

> On Mar 25, 2025, at 5:59 PM, 'Frank Tang (譚永鋒)' via icu-design <icu...@un...> wrote:
>
> Dear ICU teams
>
> I would like to propose the following API for: ICU 78
> Please provide feedback by: next Wednesday, 2025-04-02
> Designated API reviewer: Shane
>
> I would like to propose the following changes to Calendar API
> The purpose is to add builder class and immutable interface for Calendar. Currently, the calendar object is mutable and could not be shared across thread, w/ immutable interface, the object can be passed across thread. Also, there are too many features in the Calendar API so I break down four different interfaces to cover specific usage.
>
> The prototype is in
> https://github.com/unicode-org/icu/pull/3452
>
> Ticket
> https://unicode-org.atlassian.net/browse/ICU-22993
>
> The added builder class
>
> Here is the proposed change to the public API
>
>
> diff --git a/icu4c/source/i18n/unicode/calendar.h b/icu4c/source/i18n/unicode/calendar.h
> index 4499e281f9c5..20363fcc2ae5 100644
> --- a/icu4c/source/i18n/unicode/calendar.h
> +++ b/icu4c/source/i18n/unicode/calendar.h
> @@ -56,6 +56,271 @@ typedef int32_t UFieldResolutionTable[12][8];
>
>  class BasicTimeZone;
>  class CharString;
> +
> +/**
> + * The WeekRules interface in ICU defines the logic for week-related
> + * calculations in different calendar systems. It manages parameters like the
> + * first day of the week and the minimum days in the first week, supporting
> + * various regional and international week numbering conventions, including the
> + * ISO 8601 standard. This class works with the Calendar class, enabling
> + * customization and adherence to specific week-related rules.
> + */
> +class U_I18N_API WeekRules {
> + public:
> +    /**
> +     * Gets what the first day of the week is; e.g., Sunday in US, Monday in France.
> +     *
> +     * @param status error code
> +     * @return The first day of the week.
> +     */
> +    virtual UCalendarDaysOfWeek getFirstDayOfWeek(UErrorCode &status) const = 0;
> +
> +    /**
> +     * Gets what the minimal days required in the first week of the year are; e.g., if
> +     * the first week is defined as one that contains the first day of the first month
> +     * of a year, getMinimalDaysInFirstWeek returns 1. If the minimal days required must
> +     * be a full week, getMinimalDaysInFirstWeek returns 7.
> +     *
> +     * @return The minimal days required in the first week of the year.
> +     */
> +    virtual uint8_t getMinimalDaysInFirstWeek() const = 0;
> +
> +    /**
> +     * Returns whether the given day of the week is a weekday, a weekend day,
> +     * or a day that transitions from one to the other, for the locale and
> +     * calendar system associated with this Calendar (the locale's region is
> +     * often the most determinant factor). If a transition occurs at midnight,
> +     * then the days before and after the transition will have the
> +     * type UCAL_WEEKDAY or UCAL_WEEKEND. If a transition occurs at a time
> +     * other than midnight, then the day of the transition will have
> +     * the type UCAL_WEEKEND_ONSET or UCAL_WEEKEND_CEASE. In this case, the
> +     * method getWeekendTransition() will return the point of
> +     * transition.
> +     * @param dayOfWeek The day of the week whose type is desired (UCAL_SUNDAY..UCAL_SATURDAY).
> +     * @param status The error code for the operation.
> +     * @return The UCalendarWeekdayType for the day of the week.
> +     */
> +    virtual UCalendarWeekdayType getDayOfWeekType(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
> +
> +    /**
> +     * Returns the time during the day at which the weekend begins or ends in
> +     * this calendar system. If getDayOfWeekType() returns UCAL_WEEKEND_ONSET
> +     * for the specified dayOfWeek, return the time at which the weekend begins.
> +     * If getDayOfWeekType() returns UCAL_WEEKEND_CEASE for the specified dayOfWeek,
> +     * return the time at which the weekend ends. If getDayOfWeekType() returns
> +     * some other UCalendarWeekdayType for the specified dayOfWeek, is it an error condition
> +     * (U_ILLEGAL_ARGUMENT_ERROR).
> +     * @param dayOfWeek The day of the week for which the weekend transition time is
> +     * desired (UCAL_SUNDAY..UCAL_SATURDAY).
> +     * @param status The error code for the operation.
> +     * @return The milliseconds after midnight at which the weekend begins or ends.
> +     */
> +    virtual int32_t getWeekendTransition(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
> +
> +    /**
> +     * Returns true if the given UDate is in the weekend in
> +     * this calendar system.
> +     * @param date The UDate in question.
> +     * @param status The error code for the operation.
> +     * @return true if the given UDate is in the weekend in
> +     * this calendar system, false otherwise.
> +     */
> +    virtual UBool isWeekend(UDate date, UErrorCode &status) const = 0;
> +};
> +
> +/**
> + * DateFieldRange interface defines permissible boundaries for date/time
> + * components (e.g., month: 1-12). This ensures data integrity within the ICU
> + * library by preventing invalid dates/times during formatting/parsing. It's
> + * also useful for developers when iterating through date/time ranges (e.g.,
> + * generating schedules). Associated with constants like DAY_OF_MONTH, it
> + * provides a structured way to manage date/time component constraints.
> + */
> +class U_I18N_API DateFieldRange {
> + public:
> +    /**
> +     * Gets the minimum value for the given time field. e.g., for Gregorian
> +     * DAY_OF_MONTH, 1.
> +     *
> +     * @param field The given time field.
> +     * @return The minimum value for the given time field.
> +     */
> +    virtual int32_t getMinimum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the maximum value for the given time field. e.g. for Gregorian DAY_OF_MONTH,
> +     * 31.
> +     *
> +     * @param field The given time field.
> +     * @return The maximum value for the given time field.
> +     */
> +    virtual int32_t getMaximum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the highest minimum value for the given field if varies. Otherwise same as
> +     * getMinimum(). For Gregorian, no difference.
> +     *
> +     * @param field The given time field.
> +     * @return The highest minimum value for the given time field.
> +     */
> +    virtual int32_t getGreatestMinimum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Gets the lowest maximum value for the given field if varies. Otherwise same as
> +     * getMaximum(). e.g., for Gregorian DAY_OF_MONTH, 28.
> +     *
> +     * @param field The given time field.
> +     * @return The lowest maximum value for the given time field.
> +     */
> +    virtual int32_t getLeastMaximum(UCalendarDateFields field) const = 0;
> +
> +    /**
> +     * Return the minimum value that this field could have, given the current date.
> +     * For the Gregorian calendar, this is the same as getMinimum() and getGreatestMinimum().
> +     *
> +     * The version of this function on Calendar uses an iterative algorithm to determine the
> +     * actual minimum value for the field. There is almost always a more efficient way to
> +     * accomplish this (in most cases, you can simply return getMinimum()). GregorianCalendar
> +     * overrides this function with a more efficient implementation.
> +     *
> +     * @param field the field to determine the minimum of
> +     * @param status Fill-in parameter which receives the status of this operation.
> +     * @return the minimum of the given field for the current date of this Calendar
> +     */
> +    virtual int32_t getActualMinimum(UCalendarDateFields field, UErrorCode& status) const = 0;
> +
> +    /**
> +     * Return the maximum value that this field could have, given the current date.
> +     * For example, with the date "Feb 3, 1997" and the DAY_OF_MONTH field, the actual
> +     * maximum would be 28; for "Feb 3, 1996" it s 29. Similarly for a Hebrew calendar,
> +     * for some years the actual maximum for MONTH is 12, and for others 13.
> +     *
> +     * The version of this function on Calendar uses an iterative algorithm to determine the
> +     * actual maximum value for the field. There is almost always a more efficient way to
> +     * accomplish this (in most cases, you can simply return getMaximum()). GregorianCalendar
> +     * overrides this function with a more efficient implementation.
> + * > + * @param field the field to determine the maximum of > + * @param status Fill-in parameter which receives the status of this operation. > + * @return the maximum of the given field for the current date of this Calendar > + */ > + virtual int32_t getActualMaximum(UCalendarDateFields field, UErrorCode& status) const = 0; > + > +}; > + > +/** > + * The CalendarFieldAccessor class provides an interface to get individual > + * components (year, month, day, etc.) of a Calendar object. This improves code > + * maintainability and flexibility. > + */ > +class U_I18N_API CalendarFieldAccessor { > + public: > + /** > + * Gets the value for a given time field. > + * > + * @param field The given time field. > + * @param status Fill-in parameter which receives the status of the operation. > + * @return The value for the given time field, or zero if the field is unset, > + * and set() has been called for any other field. > + */ > + virtual int32_t get(UCalendarDateFields field, UErrorCode& status) const = 0; > + > + /** > + * Returns true if this Calendar's current date-time is in the weekend in > + * this calendar system. > + * @return true if this Calendar's current date-time is in the weekend in > + * this calendar system, false otherwise. > + */ > + virtual UBool isWeekend() const = 0; > + > + /** > + * Returns true if the date is in a leap year. Recalculate the current time > + * field values if the time value has been changed by a call to * setTime(). > + * This method is semantically const, but may alter the object in memory. > + * A "leap year" is a year that contains more days than other years (for > + * solar or lunar calendars) or more months than other years (for lunisolar > + * calendars like Hebrew or Chinese), as defined in the ECMAScript Temporal > + * proposal. > + * > + * @param status ICU Error Code > + * @return True if the date in the fields is in a Temporal proposal > + * defined leap year. False otherwise. 
> + */ > + virtual bool inTemporalLeapYear(UErrorCode& status) const = 0; > + > + /** > + * Gets The Temporal monthCode value corresponding to the month for the date. > + * The value is a string identifier that starts with the literal grapheme > + * "M" followed by two graphemes representing the zero-padded month number > + * of the current month in a normal (non-leap) year and suffixed by an > + * optional literal grapheme "L" if this is a leap month in a lunisolar > + * calendar. The 25 possible values are "M01" .. "M13" and "M01L" .. "M12L". > + * For the Hebrew calendar, the values are "M01" .. "M12" for non-leap year, and > + * "M01" .. "M05", "M05L", "M06" .. "M12" for leap year. > + * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and > + * in leap year with another monthCode in "M01L" .. "M12L". > + * For Coptic and Ethiopian calendar, the Temporal monthCode values for any > + * years are "M01" to "M13". > + * > + * @param status ICU Error Code > + * @return One of 25 possible strings in {"M01".."M13", "M01L".."M12L"}. > + */ > + virtual const char* getTemporalMonthCode(UErrorCode& status) const = 0; > + > + /** > + * Queries if the current date for this Calendar is in Daylight Savings Time. > + * > + * @param status Fill-in parameter which receives the status of this operation. > + * @return True if the current date for this Calendar is in Daylight Savings Time, > + * false, otherwise. > + */ > + virtual UBool inDaylightTime(UErrorCode& status) const = 0; > + > + /** > + * Gets this Calendar's time as milliseconds. May involve recalculation of time due > + * to previous calls to set time field values. The time specified is non-local UTC > + * (GMT) time. Although this method is const, this object may actually be changed > + * (semantically const). > + * > + * @param status Output param set to success/failure code on exit. 
If any value > + * previously set in the time field is invalid or restricted by > + * leniency, this will be set to an error status. > + * @return The current time in UTC (GMT) time, or zero if the operation > + * failed. > + * @stable ICU 2.0 > + */ > + virtual UDate getTime(UErrorCode& status) const = 0; > +}; > + > +/** > + * The CenturyContext class provides a framework for interpreting year values > + * that are not fully specified with a century, such as a two-digit year. This > + * class addresses the ambiguity of two-digit years by providing context, such > + * as a default century or a range of years for interpretation. It is utilized > + * during date parsing and formatting to ensure accurate conversion between > + * textual representations of dates and the internal Calendar representation, > + * particularly when dealing with formats where the century might be omitted. > + */ > +class U_I18N_API CenturyContext { > + public: > + /** > + * @return true if this calendar has a default century (i.e. 
03 -> 2003) > + * @internal > + */ > + virtual UBool haveDefaultCentury() const = 0; > + > + /** > + * @return the start of the default century, as a UDate > + * @internal > + */ > + virtual UDate defaultCenturyStart() const = 0; > + /** > + * @return the beginning year of the default century, as a year > + * @internal > + */ > + virtual int32_t defaultCenturyStartYear() const = 0; > +}; > + > /** > * `Calendar` is an abstract base class for converting between > * a `UDate` object and a set of integer fields such as > @@ -187,7 +452,11 @@ class CharString; > * > * @stable ICU 2.0 > */ > -class U_I18N_API Calendar : public UObject { > +class U_I18N_API Calendar : public UObject, > + public CenturyContext, > + public WeekRules, > + public DateFieldRange, > + public CalendarFieldAccessor { > public: > #ifndef U_FORCE_HIDE_DEPRECATED_API > /** > @@ -2413,23 +2682,6 @@ class U_I18N_API Calendar : public UObject { > friend class DefaultCalendarFactory; > #endif /* !UCONFIG_NO_SERVICE */ > > - /** > - * @return true if this calendar has a default century (i.e. 03 -> 2003) > - * @internal > - */ > - virtual UBool haveDefaultCentury() const = 0; > - > - /** > - * @return the start of the default century, as a UDate > - * @internal > - */ > - virtual UDate defaultCenturyStart() const = 0; > - /** > - * @return the beginning year of the default century, as a year > - * @internal > - */ > - virtual int32_t defaultCenturyStartYear() const = 0; > - > /** Get the locale for this calendar object. You can choose between valid and actual locale. > * @param type type of the locale we're looking for (valid or actual) > * @param status error code for the operation > @@ -2509,6 +2761,214 @@ class U_I18N_API Calendar : public UObject { > #endif /* U_HIDE_INTERNAL_API */ > }; > > +/** > + * Provides a builder pattern for constructing instances of > + * CalendarFieldAccessor,simplifying the creation and configuration of field > + * accessors for Calendar objects. 
> + */ > +class U_I18N_API FieldAccessorBuilder : public UObject { > + public: > + FieldAccessorBuilder(const Locale& locale, UErrorCode &status); > + virtual ~FieldAccessorBuilder(); > + > + FieldAccessorBuilder& adoptCalendar (Calendar *value, UErrorCode &status); > + FieldAccessorBuilder& setTimeZone(const TimeZone& value, UErrorCode &status); > + FieldAccessorBuilder& adoptTimeZone (TimeZone *value, UErrorCode &status); > + > + /** > + * Sets this Calendar's current time with the given UDate. The time specified should > + * be in non-local UTC (GMT) time. > + * > + * @param date The given UDate in UTC (GMT) time. > + * @param status Output param set to success/failure code on exit. If any value > + * set in the time field is invalid or restricted by > + * leniency, this will be set to an error status. > + */ > + FieldAccessorBuilder& setTime(UDate value, UErrorCode &status); > + > + /** > + * UDate Arithmetic function. Adds the specified (signed) amount of time to the given > + * time field, based on the calendar's rules. For example, to subtract 5 days from > + * the current time of the calendar, call add(Calendar::DATE, -5). When adding on > + * the month or Calendar::MONTH field, other fields like date might conflict and > + * need to be changed. For instance, adding 1 month on the date 01/31/96 will result > + * in 02/29/96. > + * Adding a positive value always means moving forward in time, so for the Gregorian calendar, > + * starting with 100 BC and adding +1 to year results in 99 BC (even though this actually reduces > + * the numeric value of the field itself). > + * > + * @param field Specifies which date field to modify. > + * @param amount The amount of time to be added to the field, in the natural unit > + * for that field (e.g., days for the day fields, hours for the hour > + * field.) > + * @param status Output param set to success/failure code on exit. 
If any value > + * previously set in the time field is invalid or restricted by > + * leniency, this will be set to an error status. > + */ > + FieldAccessorBuilder& add(UCalendarDateFields field, int32_t amount, UErrorCode& status); > + > + /** > + * Time Field Rolling function. Rolls by the given amount on the given > + * time field. For example, to roll the current date up by one day, call > + * roll(Calendar::DATE, +1, status). When rolling on the month or > + * Calendar::MONTH field, other fields like date might conflict and, need to be > + * changed. For instance, rolling the month up on the date 01/31/96 will result in > + * 02/29/96. Rolling by a positive value always means rolling forward in time (unless > + * the limit of the field is reached, in which case it may pin or wrap), so for > + * Gregorian calendar, starting with 100 BC and rolling the year by + 1 results in 99 BC. > + * When eras have a definite beginning and end (as in the Chinese calendar, or as in > + * most eras in the Japanese calendar) then rolling the year past either limit of the > + * era will cause the year to wrap around. When eras only have a limit at one end, > + * then attempting to roll the year past that limit will result in pinning the year > + * at that limit. Note that for most calendars in which era 0 years move forward in > + * time (such as Buddhist, Hebrew, or Islamic), it is possible for add or roll to > + * result in negative years for era 0 (that is the only way to represent years before > + * the calendar epoch). > + * When rolling on the hour-in-day or Calendar::HOUR_OF_DAY field, it will roll the > + * hour value in the range between 0 and 23, which is zero-based. > + * <P> > + * The only difference between roll() and add() is that roll() does not change > + * the value of more significant fields when it reaches the minimum or maximum > + * of its range, whereas add() does. > + * > + * @param field The time field. > + * @param amount Indicates amount to roll. 
> + * @param status Output param set to success/failure code on exit. If any value > + * previously set in the time field is invalid, this will be set to > + * an error status. > + */ > + FieldAccessorBuilder& roll(UCalendarDateFields field, int32_t amount, UErrorCode& status); > + > + /** > + * Specifies whether or not date/time interpretation is to be lenient. With lenient > + * interpretation, a date such as "February 942, 1996" will be treated as being > + * equivalent to the 941st day after February 1, 1996. With strict interpretation, > + * such dates will cause an error when computing time from the time field values > + * representing the dates. > + * > + * @param lenient True specifies date/time interpretation to be lenient. > + */ > + FieldAccessorBuilder& setLenient(UBool lenient, UErrorCode& status); > + > + /** > + * Sets the behavior for handling wall time repeating multiple times > + * at negative time zone offset transitions. For example, 1:30 AM on > + * November 6, 2011 in US Eastern time (America/New_York) occurs twice; > + * 1:30 AM EDT, then 1:30 AM EST one hour later. When <code>UCAL_WALLTIME_FIRST</code> > + * is used, the wall time 1:30AM in this example will be interpreted as 1:30 AM EDT > + * (first occurrence). When <code>UCAL_WALLTIME_LAST</code> is used, it will be > + * interpreted as 1:30 AM EST (last occurrence). The default value is > + * <code>UCAL_WALLTIME_LAST</code>. > + * <p> > + * <b>Note:</b>When <code>UCAL_WALLTIME_NEXT_VALID</code> is not a valid > + * option for this. When the argument is neither <code>UCAL_WALLTIME_FIRST</code> > + * nor <code>UCAL_WALLTIME_LAST</code>, this method has no effect and will keep > + * the current setting. > + * > + * @param option the behavior for handling repeating wall time, either > + * <code>UCAL_WALLTIME_FIRST</code> or <code>UCAL_WALLTIME_LAST</code>. 
> + * @see #getRepeatedWallTimeOption > + */ > + FieldAccessorBuilder& setRepeatedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); > + > + /** > + * Sets the behavior for handling skipped wall time at positive time zone offset > + * transitions. For example, 2:30 AM on March 13, 2011 in US Eastern time (America/New_York) > + * does not exist because the wall time jump from 1:59 AM EST to 3:00 AM EDT. When > + * <code>UCAL_WALLTIME_FIRST</code> is used, 2:30 AM is interpreted as 30 minutes before 3:00 AM > + * EDT, therefore, it will be resolved as 1:30 AM EST. When <code>UCAL_WALLTIME_LAST</code> > + * is used, 2:30 AM is interpreted as 31 minutes after 1:59 AM EST, therefore, it will be > + * resolved as 3:30 AM EDT. When <code>UCAL_WALLTIME_NEXT_VALID</code> is used, 2:30 AM will > + * be resolved as next valid wall time, that is 3:00 AM EDT. The default value is > + * <code>UCAL_WALLTIME_LAST</code>. > + * <p> > + * <b>Note:</b>This option is effective only when this calendar is lenient. > + * When the calendar is strict, such non-existing wall time will cause an error. > + * > + * @param option the behavior for handling skipped wall time at positive time zone > + * offset transitions, one of <code>UCAL_WALLTIME_FIRST</code>, <code>UCAL_WALLTIME_LAST</code> and > + * <code>UCAL_WALLTIME_NEXT_VALID</code>. > + * @see #getSkippedWallTimeOption > + */ > + FieldAccessorBuilder& setSkippedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); > + > + /** > + * Sets what the first day of the week is; e.g., Sunday in US, Monday in France. > + * > + * @param value The given first day of the week. > + */ > + FieldAccessorBuilder& setFirstDayOfWeek(UCalendarDaysOfWeek value, UErrorCode& status); > + > + /** > + * Sets what the minimal days required in the first week of the year are; For > + * example, if the first week is defined as one that contains the first day of the > + * first month of a year, call the method with value 1. 
If it must be a full week, > + * use value 7. > + * > + * @param value The given minimal days required in the first week of the year. > + */ > + FieldAccessorBuilder& setMinimalDaysInFirstWeek(uint8_t value, UErrorCode& status); > + > + /** > + * Sets the given time field with the given value. > + * > + * @param field The given time field. > + * @param value The value to be set for the given time field. > + */ > + FieldAccessorBuilder& set(UCalendarDateFields field, int32_t value, UErrorCode& status); > + > + /** > + * Clears the values of all the time fields, making them both unset and assigning > + * them a value of zero. The field values will be determined during the next > + * resolving of time into time fields. > + */ > + FieldAccessorBuilder& clear(UErrorCode& status); > + > + /** > + * Clears the value in the given time field, both making it unset and assigning it a > + * value of zero. This field value will be determined during the next resolving of > + * time into time fields. Clearing UCAL_ORDINAL_MONTH or UCAL_MONTH will > + * clear both fields. > + * > + * @param field The time field to be cleared. > + */ > + FieldAccessorBuilder& clear(UCalendarDateFields field, UErrorCode& status); > + > + /** > + * Sets The Temporal monthCode which is a string identifier that starts > + * with the literal grapheme "M" followed by two graphemes representing > + * the zero-padded month number of the current month in a normal > + * (non-leap) year and suffixed by an optional literal grapheme "L" if this > + * is a leap month in a lunisolar calendar. The 25 possible values are > + * "M01" .. "M13" and "M01L" .. "M12L". For Hebrew calendar, the values are > + * "M01" .. "M12" for non-leap years, and "M01" .. "M05", "M05L", "M06" > + * .. "M12" for leap year. > + * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and > + * in leap year with another monthCode in "M01L" .. "M12L". 
> + * For Coptic and Ethiopian calendar, the Temporal monthCode values for any > + * years are "M01" to "M13". > + * > + * @param temporalMonth The value to be set for temporal monthCode. > + * @param status ICU Error Code > + */ > + FieldAccessorBuilder& setTemporalMonthCode(const char* temporalMonth, UErrorCode& status); > + > + /** > + * Sets the GregorianCalendar change date. This is the point when the switch from > + * Julian dates to Gregorian dates occurred. Default is 00:00:00 local time, October > + * 15, 1582. Previous to this time and date will be Julian dates. > + * > + * @param date The given Gregorian cutover date. > + * @param status Output param set to success/failure code on exit. > + */ > + FieldAccessorBuilder& setGregorianChange(UDate date, UErrorCode& status); > + > + CalendarFieldAccessor* buildFieldAccessor(UErrorCode& status) const; > + > + private: > + LocalPointer<Calendar> fCalendar; > +}; > + > // ------------------------------------- > > inline Calendar* > > -- > Frank Yung-Fong Tang > 譚永鋒 / 🌭🍊 > Sr. Software Engineer > > -- > You received this message because you are subscribed to the Google Groups "icu-design" group. > To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un... <mailto:icu...@un...>. > To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-design/CA%2B7fzPGQXv4UpVBsoqqROk7wsws6MW%2B%3DCR4t37C-OTJoDAezoQ%40mail.gmail.com <https://groups.google.com/a/unicode.org/d/msgid/icu-design/CA%2B7fzPGQXv4UpVBsoqqROk7wsws6MW%2B%3DCR4t37C-OTJoDAezoQ%40mail.gmail.com?utm_medium=email&utm_source=footer>. > For more options, visit https://groups.google.com/a/unicode.org/d/optout. -- You received this message because you are subscribed to the Google Groups "icu-design" group. To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un.... 
|
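The proposal quoted above pairs a mutable builder (`FieldAccessorBuilder`) with a read-only interface (`CalendarFieldAccessor`). A minimal sketch of that split follows. It assumes hypothetical stand-in types (`Field`, `FieldAccessor`, `AccessorBuilder`); these are not ICU classes and this is not the prototype's implementation. It only illustrates why a built object with no mutators can be handed to other threads without locking.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-ins for illustration only; none of these are ICU types.
enum class Field { Year, Month, Day };

// Read-only view, playing the role CalendarFieldAccessor plays in the proposal.
class FieldAccessor {
 public:
    virtual ~FieldAccessor() = default;
    virtual int32_t get(Field field) const = 0;
};

// Mutable builder, playing the role of FieldAccessorBuilder.
class AccessorBuilder {
 public:
    // Each setter returns *this so that calls can be chained.
    AccessorBuilder& set(Field field, int32_t value) {
        switch (field) {
            case Field::Year:  year_ = value; break;
            case Field::Month: month_ = value; break;
            case Field::Day:   day_ = value; break;
        }
        return *this;
    }

    // Copies the current state into an object that exposes no mutators.
    std::shared_ptr<const FieldAccessor> build() const {
        return std::make_shared<Snapshot>(year_, month_, day_);
    }

 private:
    // The concrete result type is hidden; callers only see FieldAccessor.
    class Snapshot final : public FieldAccessor {
     public:
        Snapshot(int32_t y, int32_t m, int32_t d) : year_(y), month_(m), day_(d) {}
        int32_t get(Field field) const override {
            switch (field) {
                case Field::Year:  return year_;
                case Field::Month: return month_;
                case Field::Day:   return day_;
            }
            return 0;
        }
     private:
        const int32_t year_, month_, day_;
    };

    int32_t year_ = 1970, month_ = 1, day_ = 1;
};
```

In this sketch, a caller would chain `builder.set(Field::Year, 2025).set(Field::Day, 26)`, call `build()`, and pass the resulting `std::shared_ptr<const FieldAccessor>` to worker threads. Later calls on the builder do not affect snapshots that were already built.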
|
From: 'Frank T. (譚永鋒)' v. icu-d. <icu...@un...> - 2025-03-26 00:59:50
|
Dear ICU teams,

I would like to propose the following API for: ICU 78
Please provide feedback by: next Wednesday, 2025-04-02
Designated API reviewer: Shane

I would like to propose the following changes to the Calendar API.

The purpose is to add a builder class and immutable interfaces for Calendar. Currently, the Calendar object is mutable and cannot be shared across threads; with an immutable interface, the object can be passed across threads. Also, the Calendar API has too many features, so I split it into four interfaces that each cover a specific usage.

The prototype is in https://github.com/unicode-org/icu/pull/3452
Ticket: https://unicode-org.atlassian.net/browse/ICU-22993

The added builder class

Here is the proposed change to the public API:

diff --git a/icu4c/source/i18n/unicode/calendar.h b/icu4c/source/i18n/unicode/calendar.h
index 4499e281f9c5..20363fcc2ae5 100644
--- a/icu4c/source/i18n/unicode/calendar.h
+++ b/icu4c/source/i18n/unicode/calendar.h
@@ -56,6 +56,271 @@ typedef int32_t UFieldResolutionTable[12][8];
 class BasicTimeZone;
 class CharString;
+
+/**
+ * The WeekRules interface in ICU defines the logic for week-related
+ * calculations in different calendar systems. It manages parameters like the
+ * first day of the week and the minimum days in the first week, supporting
+ * various regional and international week numbering conventions, including the
+ * ISO 8601 standard. This class works with the Calendar class, enabling
+ * customization and adherence to specific week-related rules.
+ */
+class U_I18N_API WeekRules {
+ public:
+    /**
+     * Gets what the first day of the week is; e.g., Sunday in US, Monday in France.
+     *
+     * @param status error code
+     * @return The first day of the week.
+     */
+    virtual UCalendarDaysOfWeek getFirstDayOfWeek(UErrorCode &status) const = 0;
+
+    /**
+     * Gets what the minimal days required in the first week of the year are; e.g., if
+     * the first week is defined as one that contains the first day of the first month
+     * of a year, getMinimalDaysInFirstWeek returns 1. If the minimal days required must
+     * be a full week, getMinimalDaysInFirstWeek returns 7.
+     *
+     * @return The minimal days required in the first week of the year.
+     */
+    virtual uint8_t getMinimalDaysInFirstWeek() const = 0;
+
+    /**
+     * Returns whether the given day of the week is a weekday, a weekend day,
+     * or a day that transitions from one to the other, for the locale and
+     * calendar system associated with this Calendar (the locale's region is
+     * often the most determinant factor). If a transition occurs at midnight,
+     * then the days before and after the transition will have the
+     * type UCAL_WEEKDAY or UCAL_WEEKEND. If a transition occurs at a time
+     * other than midnight, then the day of the transition will have
+     * the type UCAL_WEEKEND_ONSET or UCAL_WEEKEND_CEASE. In this case, the
+     * method getWeekendTransition() will return the point of
+     * transition.
+     * @param dayOfWeek The day of the week whose type is desired (UCAL_SUNDAY..UCAL_SATURDAY).
+     * @param status The error code for the operation.
+     * @return The UCalendarWeekdayType for the day of the week.
+     */
+    virtual UCalendarWeekdayType getDayOfWeekType(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
+
+    /**
+     * Returns the time during the day at which the weekend begins or ends in
+     * this calendar system. If getDayOfWeekType() returns UCAL_WEEKEND_ONSET
+     * for the specified dayOfWeek, return the time at which the weekend begins.
+     * If getDayOfWeekType() returns UCAL_WEEKEND_CEASE for the specified dayOfWeek,
+     * return the time at which the weekend ends. If getDayOfWeekType() returns
+     * some other UCalendarWeekdayType for the specified dayOfWeek, is it an error condition
+     * (U_ILLEGAL_ARGUMENT_ERROR).
+     * @param dayOfWeek The day of the week for which the weekend transition time is
+     *                  desired (UCAL_SUNDAY..UCAL_SATURDAY).
+     * @param status The error code for the operation.
+     * @return The milliseconds after midnight at which the weekend begins or ends.
+     */
+    virtual int32_t getWeekendTransition(UCalendarDaysOfWeek dayOfWeek, UErrorCode &status) const = 0;
+
+    /**
+     * Returns true if the given UDate is in the weekend in
+     * this calendar system.
+     * @param date The UDate in question.
+     * @param status The error code for the operation.
+     * @return true if the given UDate is in the weekend in
+     *         this calendar system, false otherwise.
+     */
+    virtual UBool isWeekend(UDate date, UErrorCode &status) const = 0;
+};
+
+/**
+ * DateFieldRange interface defines permissible boundaries for date/time
+ * components (e.g., month: 1-12). This ensures data integrity within the ICU
+ * library by preventing invalid dates/times during formatting/parsing. It's
+ * also useful for developers when iterating through date/time ranges (e.g.,
+ * generating schedules). Associated with constants like DAY_OF_MONTH, it
+ * provides a structured way to manage date/time component constraints.
+ */
+class U_I18N_API DateFieldRange {
+ public:
+    /**
+     * Gets the minimum value for the given time field. e.g., for Gregorian
+     * DAY_OF_MONTH, 1.
+     *
+     * @param field The given time field.
+     * @return The minimum value for the given time field.
+     */
+    virtual int32_t getMinimum(UCalendarDateFields field) const = 0;
+
+    /**
+     * Gets the maximum value for the given time field. e.g. for Gregorian DAY_OF_MONTH,
+     * 31.
+     *
+     * @param field The given time field.
+     * @return The maximum value for the given time field.
+     */
+    virtual int32_t getMaximum(UCalendarDateFields field) const = 0;
+
+    /**
+     * Gets the highest minimum value for the given field if varies. Otherwise same as
+     * getMinimum(). For Gregorian, no difference.
+     *
+     * @param field The given time field.
+     * @return The highest minimum value for the given time field.
+     */
+    virtual int32_t getGreatestMinimum(UCalendarDateFields field) const = 0;
+
+    /**
+     * Gets the lowest maximum value for the given field if varies. Otherwise same as
+     * getMaximum(). e.g., for Gregorian DAY_OF_MONTH, 28.
+     *
+     * @param field The given time field.
+     * @return The lowest maximum value for the given time field.
+     */
+    virtual int32_t getLeastMaximum(UCalendarDateFields field) const = 0;
+
+    /**
+     * Return the minimum value that this field could have, given the current date.
+     * For the Gregorian calendar, this is the same as getMinimum() and getGreatestMinimum().
+     *
+     * The version of this function on Calendar uses an iterative algorithm to determine the
+     * actual minimum value for the field. There is almost always a more efficient way to
+     * accomplish this (in most cases, you can simply return getMinimum()). GregorianCalendar
+     * overrides this function with a more efficient implementation.
+     *
+     * @param field the field to determine the minimum of
+     * @param status Fill-in parameter which receives the status of this operation.
+     * @return the minimum of the given field for the current date of this Calendar
+     */
+    virtual int32_t getActualMinimum(UCalendarDateFields field, UErrorCode& status) const = 0;
+
+    /**
+     * Return the maximum value that this field could have, given the current date.
+     * For example, with the date "Feb 3, 1997" and the DAY_OF_MONTH field, the actual
+     * maximum would be 28; for "Feb 3, 1996" it s 29. Similarly for a Hebrew calendar,
+     * for some years the actual maximum for MONTH is 12, and for others 13.
+     *
+     * The version of this function on Calendar uses an iterative algorithm to determine the
+     * actual maximum value for the field. There is almost always a more efficient way to
+     * accomplish this (in most cases, you can simply return getMaximum()). GregorianCalendar
+     * overrides this function with a more efficient implementation.
+     *
+     * @param field the field to determine the maximum of
+     * @param status Fill-in parameter which receives the status of this operation.
+     * @return the maximum of the given field for the current date of this Calendar
+     */
+    virtual int32_t getActualMaximum(UCalendarDateFields field, UErrorCode& status) const = 0;
+
+};
+
+/**
+ * The CalendarFieldAccessor class provides an interface to get individual
+ * components (year, month, day, etc.) of a Calendar object. This improves code
+ * maintainability and flexibility.
+ */
+class U_I18N_API CalendarFieldAccessor {
+ public:
+    /**
+     * Gets the value for a given time field.
+     *
+     * @param field The given time field.
+     * @param status Fill-in parameter which receives the status of the operation.
+     * @return The value for the given time field, or zero if the field is unset,
+     *         and set() has been called for any other field.
+     */
+    virtual int32_t get(UCalendarDateFields field, UErrorCode& status) const = 0;
+
+    /**
+     * Returns true if this Calendar's current date-time is in the weekend in
+     * this calendar system.
+     * @return true if this Calendar's current date-time is in the weekend in
+     *         this calendar system, false otherwise.
+     */
+    virtual UBool isWeekend() const = 0;
+
+    /**
+     * Returns true if the date is in a leap year. Recalculate the current time
+     * field values if the time value has been changed by a call to * setTime().
+     * This method is semantically const, but may alter the object in memory.
+     * A "leap year" is a year that contains more days than other years (for
+     * solar or lunar calendars) or more months than other years (for lunisolar
+     * calendars like Hebrew or Chinese), as defined in the ECMAScript Temporal
+     * proposal.
+     *
+     * @param status ICU Error Code
+     * @return True if the date in the fields is in a Temporal proposal
+     *         defined leap year. False otherwise.
+     */
+    virtual bool inTemporalLeapYear(UErrorCode& status) const = 0;
+
+    /**
+     * Gets The Temporal monthCode value corresponding to the month for the date.
+     * The value is a string identifier that starts with the literal grapheme
+     * "M" followed by two graphemes representing the zero-padded month number
+     * of the current month in a normal (non-leap) year and suffixed by an
+     * optional literal grapheme "L" if this is a leap month in a lunisolar
+     * calendar. The 25 possible values are "M01" .. "M13" and "M01L" .. "M12L".
+     * For the Hebrew calendar, the values are "M01" .. "M12" for non-leap year, and
+     * "M01" .. "M05", "M05L", "M06" .. "M12" for leap year.
+     * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and
+     * in leap year with another monthCode in "M01L" .. "M12L".
+     * For Coptic and Ethiopian calendar, the Temporal monthCode values for any
+     * years are "M01" to "M13".
+     *
+     * @param status ICU Error Code
+     * @return One of 25 possible strings in {"M01".."M13", "M01L".."M12L"}.
+     */
+    virtual const char* getTemporalMonthCode(UErrorCode& status) const = 0;
+
+    /**
+     * Queries if the current date for this Calendar is in Daylight Savings Time.
+     *
+     * @param status Fill-in parameter which receives the status of this operation.
+     * @return True if the current date for this Calendar is in Daylight Savings Time,
+     *         false, otherwise.
+     */
+    virtual UBool inDaylightTime(UErrorCode& status) const = 0;
+
+    /**
+     * Gets this Calendar's time as milliseconds. May involve recalculation of time due
+     * to previous calls to set time field values. The time specified is non-local UTC
+     * (GMT) time. Although this method is const, this object may actually be changed
+     * (semantically const).
+     *
+     * @param status Output param set to success/failure code on exit. If any value
+     *               previously set in the time field is invalid or restricted by
+     *               leniency, this will be set to an error status.
+     * @return The current time in UTC (GMT) time, or zero if the operation
+     *         failed.
+     * @stable ICU 2.0
+     */
+    virtual UDate getTime(UErrorCode& status) const = 0;
+};
+
+/**
+ * The CenturyContext class provides a framework for interpreting year values
+ * that are not fully specified with a century, such as a two-digit year. This
+ * class addresses the ambiguity of two-digit years by providing context, such
+ * as a default century or a range of years for interpretation. It is utilized
+ * during date parsing and formatting to ensure accurate conversion between
+ * textual representations of dates and the internal Calendar representation,
+ * particularly when dealing with formats where the century might be omitted.
+ */
+class U_I18N_API CenturyContext {
+ public:
+    /**
+     * @return true if this calendar has a default century (i.e. 03 -> 2003)
+     * @internal
+     */
+    virtual UBool haveDefaultCentury() const = 0;
+
+    /**
+     * @return the start of the default century, as a UDate
+     * @internal
+     */
+    virtual UDate defaultCenturyStart() const = 0;
+    /**
+     * @return the beginning year of the default century, as a year
+     * @internal
+     */
+    virtual int32_t defaultCenturyStartYear() const = 0;
+};
+
 /**
  * `Calendar` is an abstract base class for converting between
  * a `UDate` object and a set of integer fields such as
@@ -187,7 +452,11 @@ class CharString;
  *
  * @stable ICU 2.0
  */
-class U_I18N_API Calendar : public UObject {
+class U_I18N_API Calendar : public UObject,
+                            public CenturyContext,
+                            public WeekRules,
+                            public DateFieldRange,
+                            public CalendarFieldAccessor {
 public:
 #ifndef U_FORCE_HIDE_DEPRECATED_API
     /**
@@ -2413,23 +2682,6 @@ class U_I18N_API Calendar : public UObject {
     friend class DefaultCalendarFactory;
 #endif /* !UCONFIG_NO_SERVICE */
 
-    /**
-     * @return true if this calendar has a default century (i.e. 03 -> 2003)
-     * @internal
-     */
-    virtual UBool haveDefaultCentury() const = 0;
-
-    /**
-     * @return the start of the default century, as a UDate
-     * @internal
-     */
-    virtual UDate defaultCenturyStart() const = 0;
-    /**
-     * @return the beginning year of the default century, as a year
-     * @internal
-     */
-    virtual int32_t defaultCenturyStartYear() const = 0;
-
     /** Get the locale for this calendar object. You can choose between valid and actual locale.
      * @param type type of the locale we're looking for (valid or actual)
      * @param status error code for the operation
@@ -2509,6 +2761,214 @@ class U_I18N_API Calendar : public UObject {
 #endif /* U_HIDE_INTERNAL_API */
 };
 
+/**
+ * Provides a builder pattern for constructing instances of
+ * CalendarFieldAccessor,simplifying the creation and configuration of field
+ * accessors for Calendar objects.
+ */
+class U_I18N_API FieldAccessorBuilder : public UObject {
+ public:
+    FieldAccessorBuilder(const Locale& locale, UErrorCode &status);
+    virtual ~FieldAccessorBuilder();
+
+    FieldAccessorBuilder& adoptCalendar (Calendar *value, UErrorCode &status);
+    FieldAccessorBuilder& setTimeZone(const TimeZone& value, UErrorCode &status);
+    FieldAccessorBuilder& adoptTimeZone (TimeZone *value, UErrorCode &status);
+
+    /**
+     * Sets this Calendar's current time with the given UDate. The time specified should
+     * be in non-local UTC (GMT) time.
+     *
+     * @param date The given UDate in UTC (GMT) time.
+     * @param status Output param set to success/failure code on exit. If any value
+     *               set in the time field is invalid or restricted by
+     *               leniency, this will be set to an error status.
+     */
+    FieldAccessorBuilder& setTime(UDate value, UErrorCode &status);
+
+    /**
+     * UDate Arithmetic function. Adds the specified (signed) amount of time to the given
+     * time field, based on the calendar's rules. For example, to subtract 5 days from
+     * the current time of the calendar, call add(Calendar::DATE, -5). When adding on
+     * the month or Calendar::MONTH field, other fields like date might conflict and
+     * need to be changed. For instance, adding 1 month on the date 01/31/96 will result
+     * in 02/29/96.
+     * Adding a positive value always means moving forward in time, so for the Gregorian calendar,
+     * starting with 100 BC and adding +1 to year results in 99 BC (even though this actually reduces
+     * the numeric value of the field itself).
+     *
+     * @param field Specifies which date field to modify.
+     * @param amount The amount of time to be added to the field, in the natural unit
+     *               for that field (e.g., days for the day fields, hours for the hour
+     *               field.)
+     * @param status Output param set to success/failure code on exit. If any value
+     *               previously set in the time field is invalid or restricted by
+     *               leniency, this will be set to an error status.
+     */
+    FieldAccessorBuilder& add(UCalendarDateFields field, int32_t amount, UErrorCode& status);
+
+    /**
+     * Time Field Rolling function. Rolls by the given amount on the given
+     * time field. For example, to roll the current date up by one day, call
+     * roll(Calendar::DATE, +1, status). When rolling on the month or
+     * Calendar::MONTH field, other fields like date might conflict and, need to be
+     * changed. For instance, rolling the month up on the date 01/31/96 will result in
+     * 02/29/96. Rolling by a positive value always means rolling forward in time (unless
+     * the limit of the field is reached, in which case it may pin or wrap), so for
+     * Gregorian calendar, starting with 100 BC and rolling the year by + 1 results in 99 BC.
+     * When eras have a definite beginning and end (as in the Chinese calendar, or as in
+     * most eras in the Japanese calendar) then rolling the year past either limit of the
+     * era will cause the year to wrap around. When eras only have a limit at one end,
+     * then attempting to roll the year past that limit will result in pinning the year
+     * at that limit. Note that for most calendars in which era 0 years move forward in
+     * time (such as Buddhist, Hebrew, or Islamic), it is possible for add or roll to
+     * result in negative years for era 0 (that is the only way to represent years before
+     * the calendar epoch).
+     * When rolling on the hour-in-day or Calendar::HOUR_OF_DAY field, it will roll the
+     * hour value in the range between 0 and 23, which is zero-based.
+     * <P>
+     * The only difference between roll() and add() is that roll() does not change
+     * the value of more significant fields when it reaches the minimum or maximum
+     * of its range, whereas add() does.
+     *
+     * @param field The time field.
+     * @param amount Indicates amount to roll.
+     * @param status Output param set to success/failure code on exit. If any value
+     *               previously set in the time field is invalid, this will be set to
+     *               an error status.
+ */ + FieldAccessorBuilder& roll(UCalendarDateFields field, int32_t amount, UErrorCode& status); + + /** + * Specifies whether or not date/time interpretation is to be lenient. With lenient + * interpretation, a date such as "February 942, 1996" will be treated as being + * equivalent to the 941st day after February 1, 1996. With strict interpretation, + * such dates will cause an error when computing time from the time field values + * representing the dates. + * + * @param lenient True specifies date/time interpretation to be lenient. + */ + FieldAccessorBuilder& setLenient(UBool lenient, UErrorCode& status); + + /** + * Sets the behavior for handling wall time repeating multiple times + * at negative time zone offset transitions. For example, 1:30 AM on + * November 6, 2011 in US Eastern time (America/New_York) occurs twice; + * 1:30 AM EDT, then 1:30 AM EST one hour later. When <code>UCAL_WALLTIME_FIRST</code> + * is used, the wall time 1:30AM in this example will be interpreted as 1:30 AM EDT + * (first occurrence). When <code>UCAL_WALLTIME_LAST</code> is used, it will be + * interpreted as 1:30 AM EST (last occurrence). The default value is + * <code>UCAL_WALLTIME_LAST</code>. + * <p> + * <b>Note:</b>When <code>UCAL_WALLTIME_NEXT_VALID</code> is not a valid + * option for this. When the argument is neither <code>UCAL_WALLTIME_FIRST</code> + * nor <code>UCAL_WALLTIME_LAST</code>, this method has no effect and will keep + * the current setting. + * + * @param option the behavior for handling repeating wall time, either + * <code>UCAL_WALLTIME_FIRST</code> or <code>UCAL_WALLTIME_LAST</code>. + * @see #getRepeatedWallTimeOption + */ + FieldAccessorBuilder& setRepeatedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); + + /** + * Sets the behavior for handling skipped wall time at positive time zone offset + * transitions. 
For example, 2:30 AM on March 13, 2011 in US Eastern time (America/New_York) + * does not exist because the wall time jump from 1:59 AM EST to 3:00 AM EDT. When + * <code>UCAL_WALLTIME_FIRST</code> is used, 2:30 AM is interpreted as 30 minutes before 3:00 AM + * EDT, therefore, it will be resolved as 1:30 AM EST. When <code>UCAL_WALLTIME_LAST</code> + * is used, 2:30 AM is interpreted as 31 minutes after 1:59 AM EST, therefore, it will be + * resolved as 3:30 AM EDT. When <code>UCAL_WALLTIME_NEXT_VALID</code> is used, 2:30 AM will + * be resolved as next valid wall time, that is 3:00 AM EDT. The default value is + * <code>UCAL_WALLTIME_LAST</code>. + * <p> + * <b>Note:</b>This option is effective only when this calendar is lenient. + * When the calendar is strict, such non-existing wall time will cause an error. + * + * @param option the behavior for handling skipped wall time at positive time zone + * offset transitions, one of <code>UCAL_WALLTIME_FIRST</code>, <code>UCAL_WALLTIME_LAST</code> and + * <code>UCAL_WALLTIME_NEXT_VALID</code>. + * @see #getSkippedWallTimeOption + */ + FieldAccessorBuilder& setSkippedWallTimeOption(UCalendarWallTimeOption option, UErrorCode& status); + + /** + * Sets what the first day of the week is; e.g., Sunday in US, Monday in France. + * + * @param value The given first day of the week. + */ + FieldAccessorBuilder& setFirstDayOfWeek(UCalendarDaysOfWeek value, UErrorCode& status); + + /** + * Sets what the minimal days required in the first week of the year are; For + * example, if the first week is defined as one that contains the first day of the + * first month of a year, call the method with value 1. If it must be a full week, + * use value 7. + * + * @param value The given minimal days required in the first week of the year. + */ + FieldAccessorBuilder& setMinimalDaysInFirstWeek(uint8_t value, UErrorCode& status); + + /** + * Sets the given time field with the given value. + * + * @param field The given time field. 
+ * @param value The value to be set for the given time field. + */ + FieldAccessorBuilder& set(UCalendarDateFields field, int32_t value, UErrorCode& status); + + /** + * Clears the values of all the time fields, making them both unset and assigning + * them a value of zero. The field values will be determined during the next + * resolving of time into time fields. + */ + FieldAccessorBuilder& clear(UErrorCode& status); + + /** + * Clears the value in the given time field, both making it unset and assigning it a + * value of zero. This field value will be determined during the next resolving of + * time into time fields. Clearing UCAL_ORDINAL_MONTH or UCAL_MONTH will + * clear both fields. + * + * @param field The time field to be cleared. + */ + FieldAccessorBuilder& clear(UCalendarDateFields field, UErrorCode& status); + + /** + * Sets The Temporal monthCode which is a string identifier that starts + * with the literal grapheme "M" followed by two graphemes representing + * the zero-padded month number of the current month in a normal + * (non-leap) year and suffixed by an optional literal grapheme "L" if this + * is a leap month in a lunisolar calendar. The 25 possible values are + * "M01" .. "M13" and "M01L" .. "M12L". For Hebrew calendar, the values are + * "M01" .. "M12" for non-leap years, and "M01" .. "M05", "M05L", "M06" + * .. "M12" for leap year. + * For the Chinese calendar, the values are "M01" .. "M12" for non-leap year and + * in leap year with another monthCode in "M01L" .. "M12L". + * For Coptic and Ethiopian calendar, the Temporal monthCode values for any + * years are "M01" to "M13". + * + * @param temporalMonth The value to be set for temporal monthCode. + * @param status ICU Error Code + */ + FieldAccessorBuilder& setTemporalMonthCode(const char* temporalMonth, UErrorCode& status); + + /** + * Sets the GregorianCalendar change date. This is the point when the switch from + * Julian dates to Gregorian dates occurred. 
Default is 00:00:00 local time, October + * 15, 1582. Previous to this time and date will be Julian dates. + * + * @param date The given Gregorian cutover date. + * @param status Output param set to success/failure code on exit. + */ + FieldAccessorBuilder& setGregorianChange(UDate date, UErrorCode& status); + + CalendarFieldAccessor* buildFieldAccessor(UErrorCode& status) const; + + private: + LocalPointer<Calendar> fCalendar; +}; + // ------------------------------------- inline Calendar* -- Frank Yung-Fong Tang 譚永鋒 / 🌭🍊 Sr. Software Engineer -- You received this message because you are subscribed to the Google Groups "icu-design" group. To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un.... To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-design/CA%2B7fzPGQXv4UpVBsoqqROk7wsws6MW%2B%3DCR4t37C-OTJoDAezoQ%40mail.gmail.com. For more options, visit https://groups.google.com/a/unicode.org/d/optout. |
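The CenturyContext accessors in the diff above are easier to picture with a small example. What follows is only a minimal, self-contained sketch of how a two-digit year could be resolved against a default century start year. It assumes a 100-year window that begins at a non-negative start year, and `resolveTwoDigitYearSketch` is a hypothetical helper for illustration, not part of the proposed API or of ICU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not an ICU function.
// Maps a two-digit year (0..99) into the 100-year window that starts at
// defaultCenturyStartYear, e.g. a window starting in 1975 covers 1975..2074.
inline int32_t resolveTwoDigitYearSketch(int32_t twoDigitYear,
                                         int32_t defaultCenturyStartYear) {
    assert(twoDigitYear >= 0 && twoDigitYear <= 99);
    assert(defaultCenturyStartYear >= 0);  // assumption made by this sketch
    int32_t centuryBase = (defaultCenturyStartYear / 100) * 100;
    int32_t year = centuryBase + twoDigitYear;
    // A year that would fall before the window start belongs to the next century.
    if (year < defaultCenturyStartYear) {
        year += 100;
    }
    return year;
}
```

With a window starting in 1975, `resolveTwoDigitYearSketch(3, 1975)` gives 2003, which corresponds to the "03 -> 2003" case mentioned in the doc comment, while 80 stays in the same century as the window start.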
|
From: 'Rich G. v. icu-d. <icu...@un...> - 2025-03-21 23:40:48
|
Definitely no objections from me…

- Rich |
|
From: Peter E. <pe...@un...> - 2025-03-21 23:13:21
|
Dear ICU team & users,

I would like to propose the following API change for: ICU 78
Please provide feedback by: next Wednesday, 2025-03-26 (simple change, does not need a week)
Designated API reviewer: Rich G
Ticket: ICU-22142 <https://unicode-org.atlassian.net/browse/ICU-22142>

Background:
The mass unit “metric-ton” was introduced in CLDR 26; in CLDR 42 it was deprecated and replaced with “tonne”.
- ICU 54 used data from CLDR 26, and:
  - Added for C (i18n/unicode/measunit.h), createMetricTon (now @stable ICU 54).
  - Added for J (main/core/src/main/java/com/ibm/icu/util/MeasureUnit.java), METRIC_TON (now @stable ICU 54).
- ICU 64 added for C (getXxx methods added to parallel createXxx): getMetricTon (now @stable ICU 64).
- ICU 72 used data from CLDR 42, and:
  - Added for C as draft: createTonne, getTonne (now @stable ICU 72).
  - Added for J as draft: TONNE (now @stable ICU 72).
  - Added a note for createMetricTon/getMetricTon/METRIC_TON that they are replaced by the methods/constants using Tonne/TONNE, and that they would be deprecated in ICU 74 (when the Tonne/TONNE items would become @stable). However, that deprecation never happened.

Proposal: We should now finally deprecate the C methods createMetricTon/getMetricTon in favor of createTonne/getTonne and deprecate the J constant METRIC_TON in favor of TONNE.

- Peter |
|
From: Markus S. <mar...@gm...> - 2025-03-11 00:23:51
|
Dear ICU team & users, I would like to propose the following API for: ICU 78 Please provide feedback by: next Wednesday, 2025-03-19 Designated API reviewer: Robin Ticket: ICU-23004 <https://unicode-org.atlassian.net/browse/ICU-23004> / draft pull request: icu/pull/3096 <https://github.com/unicode-org/icu/pull/3096> I would like to propose new C++ header-only APIs for iterating over the Unicode code points in a Unicode string, and more generally over the code units from a code unit iterator. These are modern C++ equivalents of some of the long-standing C macros for iterating over UTF-8 <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf8_8h.html> and UTF-16 <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf16_8h.html>. This C++ API also supports UTF-32. FYI: UTF-8 and UTF-16 encode code points with variable-length code unit sequences. A validating iterator needs to read and check all of the code units for one code point. When a code unit sequence is ill-formed, then the returned subsequence must be a prefix of a well-formed sequence. (Except we always return at least one code unit, so that we always progress.) (UTF-32 still has validation, but sequences always have length one.) The proposed API can read code units from a C++ input_iterator or forward_iterator or bidirectional_iterator <https://en.cppreference.com/w/cpp/iterator/iterator_tags>. The latter includes code unit pointers like const char * and const char16_t *. There is a convenience API for std::string_views. The main class is called UTFIterator. Its operator*() returns a value serving a variety of use cases: Class CodeUnits provides the code point, the start of its minimal subsequence, the number of code units, and whether they are well-formed. (All functions are declared inline. An optimizing compiler will usually omit fields that are not used, and the code for computing them.) UTFIterator has the API of a C++ STL iterator. 
It has template parameters for the code unit iterator type, for the code point type, and for how to handle ill-formed subsequences. std::make_reverse_iterator works for making reverse-range iterators. The convenience class UTFStringCodePoints turns a std::string_view (of variable code unit type) into a code point iteration “range” with begin()/end()/rbegin()/rend() functions. There are convenience functions utfIterator() and utfStringCodePoints() to simplify call sites; they deduce the code unit and base iterator types. For each of these classes and convenience functions, there is also an “unsafe” version, just like for the C macros. The normal versions validate the code unit sequences. The “unsafe” ones assume/require that the strings/sequences are well-formed. As a result, they yield much smaller and faster code. Sample code using U_HEADER_ONLY_NAMESPACE::utfIterator; using U_HEADER_ONLY_NAMESPACE::utfStringCodePoints; using U_HEADER_ONLY_NAMESPACE::unsafeUTFIterator; using U_HEADER_ONLY_NAMESPACE::unsafeUTFStringCodePoints; int32_t rangeLoop16(std::u16string_view s) { // We are just adding up the code points for minimal-code demonstration purposes. 
int32_t sum = 0; for (auto units : utfStringCodePoints<UChar32, UTF_BEHAVIOR_NEGATIVE>(s)) { sum += units.codePoint(); // < 0 if ill-formed } return sum; } int32_t loopIterPlusPlus16(std::u16string_view s) { auto range = utfStringCodePoints<char32_t, UTF_BEHAVIOR_FFFD>(s); int32_t sum = 0; for (auto iter = range.begin(), limit = range.end(); iter != limit;) { sum += (*iter++).codePoint(); // U+FFFD if ill-formed } return sum; } int32_t backwardLoop16(std::u16string_view s) { auto range = utfStringCodePoints<UChar32, UTF_BEHAVIOR_SURROGATE>(s); int32_t sum = 0; for (auto start = range.begin(), iter = range.end(); start != iter;) { sum += (*--iter).codePoint(); // surrogate code point if unpaired / ill-formed } return sum; } int32_t reverseLoop8(std::string_view s) { auto range = utfStringCodePoints<char32_t, UTF_BEHAVIOR_FFFD>(s); int32_t sum = 0; for (auto iter = range.rbegin(), limit = range.rend(); iter != limit; ++iter) { sum += iter->codePoint(); // U+FFFD if ill-formed } return sum; } int32_t countCodePoints16(std::u16string_view s) { auto range = utfStringCodePoints<UChar32, UTF_BEHAVIOR_SURROGATE>(s); return std::distance(range.begin(), range.end()); } int32_t unsafeRangeLoop16(std::u16string_view s) { int32_t sum = 0; for (auto units : unsafeUTFStringCodePoints<UChar32>(s)) { sum += units.codePoint(); } return sum; } int32_t unsafeReverseLoop8(std::string_view s) { auto range = unsafeUTFStringCodePoints<UChar32>(s); int32_t sum = 0; for (auto iter = range.rbegin(), limit = range.rend(); iter != limit; ++iter) { sum += iter->codePoint(); } return sum; } char32_t firstCodePointOrFFFD16(std::u16string_view s) { if (s.empty()) { return 0xfffd; } auto range = utfStringCodePoints<char32_t, UTF_BEHAVIOR_FFFD>(s); return range.begin()->codePoint(); } std::string_view firstSequence8(std::string_view s) { if (s.empty()) { return {}; } auto range = utfStringCodePoints<char32_t, UTF_BEHAVIOR_FFFD>(s); auto units = *(range.begin()); if (units.wellFormed()) { return 
units.stringView(); } else { return {}; } } Proposed public API signatures New header file: unicode/utfiterator.h // Some defined behaviors for handling ill-formed Unicode strings. typedef enum UTFIllFormedBehavior { // Returns a negative value instead of a code point. UTF_BEHAVIOR_NEGATIVE, // Returns U+FFFD Replacement Character. UTF_BEHAVIOR_FFFD, // UTF-8: Not allowed; // UTF-16: returns the unpaired surrogate; // UTF-32: returns the surrogate code point, or U+FFFD if out of range. UTF_BEHAVIOR_SURROGATE } UTFIllFormedBehavior; namespace U_HEADER_ONLY_NAMESPACE { /** * Result of decoding a minimal Unicode code unit sequence. * Returned from non-validating Unicode string code point iterators. * Base class for class CodeUnits which is returned from validating iterators. * * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t; * should be signed if UTF_BEHAVIOR_NEGATIVE * @tparam UnitIter An iterator (often a pointer) that returns a code unit type: * UTF-8: char or char8_t or uint8_t; * UTF-16: char16_t or uint16_t or (on Windows) wchar_t * @see UnsafeUTFIterator * @see UnsafeUTFStringCodePoints * @draft ICU 78 */ template<typename CP32, typename UnitIter, typename = void> class UnsafeCodeUnits { public: UnsafeCodeUnits(const UnsafeCodeUnits &other) = default; UnsafeCodeUnits &operator=(const UnsafeCodeUnits &other) = default; /** * @return the Unicode code point decoded from the code unit sequence. * If the sequence is ill-formed and the iterator validates, * then this is a replacement value according to the iterator‘s * UTFIllFormedBehavior template parameter. * @draft ICU 78 */ UChar32 codePoint() const { return c; } /** * @return the start of the minimal Unicode code unit sequence. * Only enabled if UnitIter is a (multi-pass) forward_iterator or better. * @draft ICU 78 */ UnitIter data() const { return p; } /** * @return the length of the minimal Unicode code unit sequence. 
* @draft ICU 78 */ uint8_t length() const { return len; } /** * @return a string_view of the minimal Unicode code unit sequence. * Only enabled if UnitIter is a pointer. * @draft ICU 78 */ stringView() const { }; /** * Result of validating and decoding a minimal Unicode code unit sequence. * Returned from validating Unicode string code point iterators. * Adds function wellFormed() to base class UnsafeCodeUnits. * * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t; * should be signed if UTF_BEHAVIOR_NEGATIVE * @tparam UnitIter An iterator (often a pointer) that returns a code unit type: * UTF-8: char or char8_t or uint8_t; * UTF-16: char16_t or uint16_t or (on Windows) wchar_t * @see UTFIterator * @see UTFStringCodePoints * @draft ICU 78 */ template<typename CP32, typename UnitIter, typename = void> class CodeUnits : public UnsafeCodeUnits<CP32, UnitIter> { public: CodeUnits(const CodeUnits &other) = default; CodeUnits &operator=(const CodeUnits &other) = default; bool wellFormed() const { return ok; } }; validating /** * Validating iterator over the code points in a Unicode string. * * The UnitIter can be * an input_iterator, a forward_iterator, or a bidirectional_iterator (including a pointer). * The UTFIterator will have the corresponding iterator_category. * * For reverse iteration, either use this iterator directly as in <code>*--iter</code> * or wrap it using std::make_reverse_iterator(iter). * * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t; * should be signed if UTF_BEHAVIOR_NEGATIVE * @tparam behavior How to handle ill-formed Unicode strings * @tparam UnitIter An iterator (often a pointer) that returns a code unit type: * UTF-8: char or char8_t or uint8_t; * UTF-16: char16_t or uint16_t or (on Windows) wchar_t * @draft ICU 78 */ template<typename CP32, UTFIllFormedBehavior behavior, typename UnitIter> class UTFIterator { public: // Constructor with start <= p < limit. 
// All of these iterators/pointers should be at code point boundaries. // Only enabled if UnitIter is a (multi-pass) forward_iterator or better. // TODO: Should we enable this only for a bidirectional_iterator? inline UTFIterator(UnitIter start, UnitIter p, UnitIter limit) : // Constructs an iterator with start=p. inline UTFIterator(UnitIter p, UnitIter limit) : // Constructs an iterator start or limit sentinel. // Requires UnitIter to be copyable. inline UTFIterator(UnitIter p) inline UTFIterator(UTFIterator &&src) noexcept = default; inline UTFIterator &operator=(UTFIterator &&src) noexcept = default; inline UTFIterator(const UTFIterator &other) = default; inline UTFIterator &operator=(const UTFIterator &other) = default; inline bool operator==(const UTFIterator &other) const { inline bool operator!=(const UTFIterator &other) const { return !operator==(other); } inline CodeUnits<CP32, UnitIter> operator*() const { /** * @return the current decoded subsequence via an opaque proxy object * so that <code>iter->codePoint()</code> etc. works. * @draft ICU 78 */ inline Proxy operator->() const { inline UTFIterator &operator++() { // pre-increment /** * @return a copy of this iterator from before the increment. * If UnitIter is a single-pass input_iterator, then this function * returns an opaque proxy object so that <code>*iter++</code> still works. * @draft ICU 78 */ inline UTFIterator operator++(int) { // post-increment // Only enabled if UnitIter is a bidirectional_iterator (including a pointer). inline UTFIterator &operator--() { // pre-decrement // Only enabled if UnitIter is a bidirectional_iterator (including a pointer). inline UTFIterator operator--(int) { // post-decrement }; /** * A C++ "range" for validating iteration over all of the code points of a Unicode string. 
* * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t; * should be signed if UTF_BEHAVIOR_NEGATIVE * @tparam behavior How to handle ill-formed Unicode strings * @tparam Unit Code unit type: * UTF-8: char or char8_t or uint8_t; * UTF-16: char16_t or uint16_t or (on Windows) wchar_t * @draft ICU 78 */ template<typename CP32, UTFIllFormedBehavior behavior, typename Unit> class UTFStringCodePoints { public: /** * Constructs a C++ "range" object over the code points in the string. * @draft ICU 78 */ UTFStringCodePoints(std::basic_string_view<Unit> s) : s(s) {} /** @draft ICU 78 */ UTFStringCodePoints(const UTFStringCodePoints &other) = default; /** @draft ICU 78 */ UTFStringCodePoints &operator=(const UTFStringCodePoints &other) = default; /** @draft ICU 78 */ UTFIterator<CP32, behavior, const Unit *> begin() const { /** @draft ICU 78 */ UTFIterator<CP32, behavior, const Unit *> end() const { /** * @return std::reverse_iterator(end()) * @draft ICU 78 */ auto rbegin() const { /** * @return std::reverse_iterator(begin()) * @draft ICU 78 */ auto rend() const { }; /** * UTFIterator factory function for start <= p < limit. * Only enabled if UnitIter is a (multi-pass) forward_iterator or better. 
* * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t * @tparam behavior How to handle ill-formed Unicode strings * @tparam UnitIter Can usually be omitted/deduced: * An iterator (often a pointer) that returns a code unit type: * UTF-8: char or char8_t or uint8_t; * UTF-16: char16_t or uint16_t or (on Windows) wchar_t * @param start start code unit iterator * @param p current-position code unit iterator * @param limit limit (exclusive-end) code unit iterator * @return a UTFIterator<CP32, behavior, UnitIter> * for the given code unit iterators or character pointers * @draft ICU 78 */ template<typename CP32, UTFIllFormedBehavior behavior, typename UnitIter> auto utfIterator(UnitIter start, UnitIter p, UnitIter limit) { /** * UTFIterator factory function for start = p < limit. * ... */ template<typename CP32, UTFIllFormedBehavior behavior, typename UnitIter> auto utfIterator(UnitIter p, UnitIter limit) { /** * UTFIterator factory function for a start or limit sentinel. * Requires UnitIter to be copyable. * ... */ template<typename CP32, UTFIllFormedBehavior behavior, typename UnitIter> auto utfIterator(UnitIter p) { /** * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t; * should be signed if UTF_BEHAVIOR_NEGATIVE * @tparam behavior How to handle ill-formed Unicode strings * @tparam StringView Can usually be omitted/deduced: A std::basic_string_view<Unit> * @param s input string_view * @return a UTFStringCodePoints<CP32, behavior, Unit> * for the given std::basic_string_view<Unit>, * deducing the Unit character type * @draft ICU 78 */ template<typename CP32, UTFIllFormedBehavior behavior, typename StringView> auto utfStringCodePoints(StringView s) { non-validating /** * Non-validating iterator over the code points in a Unicode string. * The string must be well-formed. * * The UnitIter can be * an input_iterator, a forward_iterator, or a bidirectional_iterator (including a pointer). 
 * The UTFIterator will have the corresponding iterator_category.
 *
 * For reverse iteration, either use this iterator directly as in <code>*--iter</code>
 * or wrap it using std::make_reverse_iterator(iter).
 *
 * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t
 * @tparam UnitIter An iterator (often a pointer) that returns a code unit type:
 *     UTF-8: char or char8_t or uint8_t;
 *     UTF-16: char16_t or uint16_t or (on Windows) wchar_t
 * @draft ICU 78
 */
template<typename CP32, typename UnitIter>
class UnsafeUTFIterator {
public:
    inline UnsafeUTFIterator(UnitIter p) : p_(p), units_(0, 0, p) {}

    inline UnsafeUTFIterator(UnsafeUTFIterator &&src) noexcept = default;
    inline UnsafeUTFIterator &operator=(UnsafeUTFIterator &&src) noexcept = default;

    inline UnsafeUTFIterator(const UnsafeUTFIterator &other) = default;
    inline UnsafeUTFIterator &operator=(const UnsafeUTFIterator &other) = default;

    inline bool operator==(const UnsafeUTFIterator &other) const {
    inline bool operator!=(const UnsafeUTFIterator &other) const { return !operator==(other); }

    inline UnsafeCodeUnits<CP32, UnitIter> operator*() const {

    /**
     * @return the current decoded subsequence via an opaque proxy object
     * so that <code>iter->codePoint()</code> etc. works.
     * @draft ICU 78
     */
    inline Proxy operator->() const {

    inline UnsafeUTFIterator &operator++() {  // pre-increment

    /**
     * @return a copy of this iterator from before the increment.
     * If UnitIter is a single-pass input_iterator, then this function
     * returns an opaque proxy object so that <code>*iter++</code> still works.
     * @draft ICU 78
     */
    inline UnsafeUTFIterator operator++(int) {  // post-increment

    // Only enabled if UnitIter is a bidirectional_iterator (including a pointer).
    inline UnsafeUTFIterator &operator--() {  // pre-decrement

    // Only enabled if UnitIter is a bidirectional_iterator (including a pointer).
    inline UnsafeUTFIterator operator--(int) {  // post-decrement
};

/**
 * A C++ "range" for non-validating iteration over all of the code points of a Unicode string.
 * The string must be well-formed.
 *
 * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t
 * @tparam Unit Code unit type:
 *     UTF-8: char or char8_t or uint8_t;
 *     UTF-16: char16_t or uint16_t or (on Windows) wchar_t
 * @draft ICU 78
 */
template<typename CP32, typename Unit>
class UnsafeUTFStringCodePoints {
public:
    /**
     * Constructs a C++ "range" object over the code points in the string.
     * @draft ICU 78
     */
    UnsafeUTFStringCodePoints(std::basic_string_view<Unit> s) : s(s) {}

    /** @draft ICU 78 */
    UnsafeUTFStringCodePoints(const UnsafeUTFStringCodePoints &other) = default;

    /** @draft ICU 78 */
    UnsafeUTFStringCodePoints &operator=(const UnsafeUTFStringCodePoints &other) = default;

    /** @draft ICU 78 */
    UnsafeUTFIterator<CP32, const Unit *> begin() const {

    /** @draft ICU 78 */
    UnsafeUTFIterator<CP32, const Unit *> end() const {

    /**
     * @return std::reverse_iterator(end())
     * @draft ICU 78
     */
    auto rbegin() const {

    /**
     * @return std::reverse_iterator(begin())
     * @draft ICU 78
     */
    auto rend() const {
};

/**
 * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t
 * @tparam UnitIter Can usually be omitted/deduced:
 *     An iterator (often a pointer) that returns a code unit type:
 *     UTF-8: char or char8_t or uint8_t;
 *     UTF-16: char16_t or uint16_t or (on Windows) wchar_t
 * @param iter code unit iterator
 * @return an UnsafeUTFIterator<CP32, UnitIter>
 *     for the given code unit iterator or character pointer
 * @draft ICU 78
 */
template<typename CP32, typename UnitIter>
auto unsafeUTFIterator(UnitIter iter) {

/**
 * @tparam CP32 Code point type: UChar32 (=int32_t) or char32_t or uint32_t
 * @tparam StringView Can usually be omitted/deduced: A std::basic_string_view<Unit>
 * @param s input string_view
 * @return an UnsafeUTFStringCodePoints<CP32, Unit>
 *     for the given std::basic_string_view<Unit>,
 *     deducing the Unit character type
 * @draft ICU 78
 */
template<typename CP32, typename StringView>
auto unsafeUTFStringCodePoints(StringView s) {

}  // namespace U_HEADER_ONLY_NAMESPACE

Sincerely,
markus

--
You received this message because you are subscribed to the Google Groups "icu-design" group.
To unsubscribe from this group and stop receiving emails from it, send an email to icu...@un....
To view this discussion visit https://groups.google.com/a/unicode.org/d/msgid/icu-design/CAN49p6rY9q_OBZTiurH1B8W%3DT6%2BgsYARX-qZG%2BEt73LWp4G4NQ%40mail.gmail.com.
For more options, visit https://groups.google.com/a/unicode.org/d/optout. |
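For readers unfamiliar with what "non-validating" iteration implies, here is a minimal standalone sketch. It is not the ICU implementation and does not use the draft API from the proposal: the helper name decodeUnsafeUtf8 is hypothetical, and it handles UTF-8 only. Like the proposed UnsafeUTFStringCodePoints, it assumes the input is well-formed and performs no checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of ICU: decodes UTF-8 into code points
// without validating anything. The input must be well-formed; on ill-formed
// input the result is unspecified and the loop may read past the end.
inline std::vector<char32_t> decodeUnsafeUtf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::uint8_t lead = static_cast<std::uint8_t>(s[i++]);
        char32_t c;
        int trail;  // number of trail bytes implied by the lead byte
        if (lead < 0x80) {
            c = lead; trail = 0;
        } else if (lead < 0xE0) {
            c = lead & 0x1F; trail = 1;
        } else if (lead < 0xF0) {
            c = lead & 0x0F; trail = 2;
        } else {
            c = lead & 0x07; trail = 3;
        }
        // Trust the lead byte: append six payload bits per trail byte.
        for (; trail > 0; --trail) {
            c = (c << 6) | (static_cast<std::uint8_t>(s[i++]) & 0x3F);
        }
        out.push_back(c);
    }
    return out;
}
```

The absence of any range or sequence checks is the whole point of the trade-off: it is faster and simpler than a validating decoder, but only safe when the caller already knows the string is well-formed.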
|
From: Gilad A. <g....@ou...> - 2025-02-06 18:08:45
|
Hi Rich,

Can you provide more information on the environment you are running on? Is the UI mirrored for Arabic, Urdu, and Persian? Can you share some sample strings?

Users rarely mix and match locale elements as far as I know; Andrew Glass may have some data around this from Windows.

Gilad
Standard institute of Israel
Hebrew Support Committee Chair
EX MSFT BIDI i18n and BIDI PM

________________________________
From: 'Rich Gillam' via icu-design <icu...@un...>
Sent: Thursday, February 6, 2025 02:51
To: design-wg (CLDR) <des...@un...>; icu-design <icu...@un...>
Subject: [icu-design] Bidi behavior in number and date formatting |
|
From: 'Rich G. v. icu-d. <icu...@un...> - 2025-02-06 00:51:36
|
Hi everybody—

I’ve run into a number of bidi-related problems lately, and I need some guidance. I’m thinking about filing tickets, but thought it might be good to discuss by email before filing, so I have a better idea of what to say in the tickets.

I’ve run into a couple of situations lately where the bidi behavior of some element might be different depending on which digits we’re using to format a number, and there doesn’t seem to be any facility in CLDR to deal with this. We offer the ability for users to choose their numbering system independent of their locale, so if your system language is Arabic or Urdu, you can use either native digits or Latin digits. But consider the degree sign: If you’re using Latin digits, you want it to stick to the right-hand side of the number, but if you’re using native digits, you want it to stick to the left-hand side of the number. We get that behavior in Arabic “for free” due to the characters’ bidi properties. But we don’t get that behavior in Urdu or Persian. And I can't just deal with this by changing our copy of CLDR to put a RLM in front of the degree sign, because that’ll move it to the left-hand side regardless of my numbering system. I end up having to include clumsy special-case code.

I’ve run into other variations on this: if I format a time in 12-hour format, which side “AM” or “PM” appears on might depend on the digits I’m using for the time, but I can’t control that, either (I also can’t peg it to one side or the other by changing the “AM” and “PM” strings— the bidi mark has to go on the other side of the time). I’ve also run into problems with currency formats where I’m operating in a RTL language but a particular currency symbol is all LTR characters.

So what’s the preferred solution to these kinds of problems? Right now I can’t think of anything other than special-case code.

For these and many other reasons, it seems like we should be getting away from embedding bidi controls in our CLDR data and moving to a code-based solution based on the bidi isolate characters (and even then, I’m not quite sure how to solve the above problems). What would it take to make that move?

—Rich Gillam |
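As a rough illustration of the "code-based solution based on the bidi isolate characters" mentioned at the end of the message, the sketch below wraps an already formatted field in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069). This is only an assumption about one possible shape, not an existing ICU or CLDR facility; the helper name isolate is hypothetical. As the message itself says, isolation on its own may not settle which side of the number a degree sign or AM/PM marker ends up on.

```cpp
#include <string>

// Unicode bidi isolate controls defined by UAX #9.
constexpr char16_t kFSI = u'\u2068';  // FIRST STRONG ISOLATE
constexpr char16_t kPDI = u'\u2069';  // POP DIRECTIONAL ISOLATE

// Hypothetical helper, not an ICU API: wraps a formatted field so that the
// surrounding text cannot reorder its contents, and its contents cannot
// affect the direction of the surrounding text.
inline std::u16string isolate(const std::u16string &field) {
    std::u16string out;
    out.reserve(field.size() + 2);
    out.push_back(kFSI);
    out.append(field);
    out.push_back(kPDI);
    return out;
}
```

A formatter could apply such a wrapper in code around each number, unit, or currency field instead of relying on RLM/LRM characters stored in locale data, which is the direction the message argues for.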
|
From: Markus S. <mar...@gm...> - 2025-01-31 17:52:01
|
On Fri, Jan 31, 2025 at 1:28 AM Younies Mahmoud <yo...@un...> wrote:

> I'm writing to ask to take and accept the following ticket:
>
> - Ticket Number: ICU-23032
>   <https://unicode-org.atlassian.net/browse/ICU-23032?atlOrigin=eyJpIjoiMTY5OWI4MzJlYjFiNDg4OWI2NTNkM2I0ZDVkMzllZTgiLCJwIjoiaiJ9>
> - Summary: Fix/Improve Units Documentation and Perform Minor Cleanup
>

sgtm

Please remember that code changes and API doc changes need to be in by feb13.
User Guide changes can continue later, but if they go into the main branch after we create the release branch, then they need to use a new ticket for 78.

tnx
markus |
|
From: 'Rich G. v. icu-d. <icu...@un...> - 2025-01-07 00:55:30
|
George and I have talked about this proposal a couple times before. I’m wholeheartedly in favor; I just wish I’d thought of it first. :-)

I’m taking a look at the PR right now...

—Rich

> On Jan 6, 2025, at 2:52 PM, 'George Rhoten' via icu-design <icu...@un...> wrote:
>
> Thanks!
>
> Rich said that he will try to get to it today.
>
> Also porting to ICU4C was simpler than anticipated. The pull request for all the changes is here: https://github.com/unicode-org/icu/pull/3326
>
> George
>
>> On Jan 6, 2025, at 1:44 PM, Mark Davis Ⓤ <ma...@un...> wrote:
>>
>> Looks reasonable to me, but I'd also like to make sure Rich is cool with it.
>>
>> On Sun, Jan 5, 2025 at 10:34 AM 'George Rhoten' via icu-design <icu...@un... <mailto:icu...@un...>> wrote:
>>> Dear ICU team & users,
>>>
>>> I would like to propose the following RBNF syntax change for: ICU 77
>>> Please provide feedback by: 2024-1-15
>>> Designated API reviewer: Volunteers are welcome
>>> Ticket: https://unicode-org.atlassian.net/browse/ICU-22979
>>> Prototype: https://github.com/unicode-org/icu/compare/main...grhoten:icu:ff02dc17a85d8411cda6eb9061cff789a0e67f11
>>>
>>> This proposal only affects the documentation and RBNF syntax.
>>>
>>> I’d like to extend the RBNF syntax to support more complex grammar. I’d like to change the omission rule with square brackets. By default, everything between the square brackets is omitted when the remainder is 0. My proposal will not change this behavior by default, unless a “|” (pipe symbol) is present between the square brackets. You can think of it performing like an else statement. Everything between the beginning square bracket and the pipe acts as it currently does. Everything between the pipe symbol and the end square bracket will be used instead of omitting the text.
>>>
>>> This behavior is important for supporting large ordinals in Slavic languages. It’s convenient for other languages, like English.
>>>
>>> The test case in the prototype and the ticket provides more examples of the change. Below is a simplified example of the new syntax. Right now, we have the following ordinals in English.
>>>
>>> %%tieth:
>>>     0: tieth;
>>>     1: ty-=%spellout-ordinal=;
>>> %spellout-ordinal:
>>>     ...
>>>     20: twen>%%tieth>;
>>>     30: thir>%%tieth>;
>>>     40: for>%%tieth>;
>>>     50: fif>%%tieth>;
>>>
>>> That could be simplified to the following rules instead.
>>>
>>> %spellout-ordinal:
>>>     ...
>>>     20: twent[y->>|ieth];
>>>     30: thirt[y->>|ieth];
>>>     40: fort[y->>|ieth];
>>>     50: fift[y->>|ieth];
>>>
>>> The cardinal and ordinal rules will work on either side of the pipe symbol.
>>>
>>> I plan to port these changes from ICU4J to ICU4C before creating a pull request. Once a released version of ICU starts supporting this syntax, the CLDR rules will be able to adopt this new syntax for the languages that need it.
>>>
>>> Sincerely,
>>> George |
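To make the proposed "[...|...]" behavior concrete, here is a toy model of how a single rule body would expand. It is a sketch under simplifying assumptions, not ICU's RBNF parser: the function applyRule is hypothetical, the rule text omits the "20:" descriptor and trailing semicolon, only one bracket pair is handled, and ">>" is treated as a plain placeholder for the already formatted remainder (an empty string stands for a remainder of 0).

```cpp
#include <string>

// Toy model of the proposed omission rule, not ICU code.
// - remainder != 0: use the text before '|' (today's bracket behavior).
// - remainder == 0: use the text after '|' if present, otherwise omit.
inline std::string applyRule(const std::string &rule,
                             const std::string &remainderText) {
    const auto npos = std::string::npos;
    const auto open = rule.find('[');
    const auto close = rule.find(']', open);
    if (open == npos || close == npos) {
        return rule;  // no optional section in this rule
    }
    const std::string inside = rule.substr(open + 1, close - open - 1);
    const auto pipe = inside.find('|');

    std::string chosen;
    if (!remainderText.empty()) {
        chosen = inside.substr(0, pipe);  // whole text when there is no pipe
        const auto arrows = chosen.find(">>");
        if (arrows != npos) {
            chosen.replace(arrows, 2, remainderText);
        }
    } else if (pipe != npos) {
        chosen = inside.substr(pipe + 1);  // the "else" branch of the proposal
    }
    return rule.substr(0, open) + chosen + rule.substr(close + 1);
}
```

With the rule body "twent[y->>|ieth]", a zero remainder selects "ieth" and gives "twentieth", while a remainder formatted as "first" gives "twenty-first". A rule without a pipe, such as "twenty[->>]", keeps the existing behavior.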
|
From: Markus S. <mar...@gm...> - 2024-12-23 18:47:18
|
On Fri, Dec 13, 2024 at 1:19 PM Markus Scherer <mar...@gm...> wrote:

> While I was working in this area, I also made LocalPointer and its
> customizations header-only
> <https://unicode-org.atlassian.net/browse/ICU-22980>. The changes there
> did not involve any publicly visible API changes. We are just making sure
> to not use the same symbols inside ICU.
>

This part did not work, because some DLL-exported ICU classes have LocalPointer members, so LocalPointer needs to be (at least notionally) DLL-exported.
So I am reverting the changes for this part.

Best regards / happy holidays,
markus |