From: George R. <gr...@us...> - 2007-08-14 05:20:05
I think your diff is missing some documentation changes. The documentation still refers to Extended Grapheme Clusters, but I guess you skipped that diff for clarity in the proposal.

So what is considered a "Combining Character Sequence"? For example, Czech has "ch", but I thought of that as a grapheme cluster. My first impression of "Combining Character Sequence" is that combining characters are involved with a base character. I'm sure ICU users don't have all the Unicode specs burned into their minds.

What would be some differences that a user might see when using createCharacterInstance and your new factory function? A summary of differences would be nice. Where would I go to see a detailed list of differences?

The new proposed API is missing the "Instance" suffix, but adding it would make one heck of a long function name to call.

George Rhoten
IBM Globalization Center of Competency/ICU
San José, CA, USA
http://www.icu-project.org/

"Andy Heninger" <and...@gm...>
Sent by: icu...@li...
08/13/2007 09:34 PM
Please respond to icu...@li...
To: icu...@li...
cc:
Subject: [icu-design] BreakIterator API changes following the UTC

ICU 3.8 has draft API that was introduced to support Extended Grapheme Clusters as they were proposed at the Unicode Technical Committee meeting last quarter. In last week's UTC meeting, the name was changed to Extended Combining Character Sequence, and the definition was altered slightly, to exclude Thai syllables.

The question now is what to do in ICU, at this late stage in the release of 3.8.

I propose that we
1. Rename the APIs in ICU4C to reflect the latest thinking in the UTC.
2. Add a "Technology Preview" type warning to the API, saying that these are implementing proposals that have not yet been approved by the Unicode Consortium, and that both the names and the behavior are subject to change.
3. Drop Thai syllables from the boundary rules (a trivial change).
4. Not add extended combining sequence to ICU4J for 3.8.
   Extended Grapheme Clusters are not in ICU4J now; putting something in would be a completely new addition, not just a rename.

The proposed API renaming:

in ubrk.h

-    /** Extended Grapheme Cluster breaks  @draft ICU 3.8 */
-    UBRK_X_GRAPHEME_CLUSTER=5,
+    /** Extended Combining Character Sequence breaks  @draft ICU 3.8 */
+    UBRK_X_COMBINING_CHARACTER_SEQUENCE=5,
     UBRK_COUNT = 6
 } UBreakIteratorType;

In brkiter.h

     static BreakIterator* U_EXPORT2
-    createXGraphemeClusterInstance(const Locale& loc, UErrorCode& status);
+    createXCombiningCharacterSequence(const Locale& loc, UErrorCode& status);

-- Andy
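
To make the "base character plus combining characters" reading of "Combining Character Sequence" raised in the reply concrete, here is a minimal C++ sketch. It is not ICU code and does not implement the proposed Extended Combining Character Sequence rules. The names isCombiningMark and splitCombiningSequences are hypothetical, and the sketch assumes, purely for illustration, that only the Combining Diacritical Marks block (U+0300..U+036F) counts as combining.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, illustration only: treats just the Combining
// Diacritical Marks block (U+0300..U+036F) as combining characters.
// Real Unicode rules cover many more characters.
static bool isCombiningMark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Splits text into units made of one starting code point followed by any
// directly following combining marks. A combining mark at the very start
// of the text begins its own unit.
std::vector<std::u32string> splitCombiningSequences(const std::u32string& text) {
    std::vector<std::u32string> out;
    for (char32_t c : text) {
        if (out.empty() || !isCombiningMark(c)) {
            out.push_back(std::u32string(1, c));  // start a new unit
        } else {
            out.back().push_back(c);              // attach mark to current unit
        }
    }
    return out;
}
```

Under this simplified reading, the input "ch" followed by "e" and U+0301 splits into three units: "c", "h", and "e" with its accent. The Czech "ch" is not kept together, because no combining character is involved.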