Re: Question About Constructing Pattern Strings From API Results


Can you give an example of a process that would make use of
multi-character strings from an exemplar set?

Deborah

On Jul 30, 2004, at 1:59 PM, George Rhoten wrote:

> For your purposes, these grapheme clusters or contractions aren't very
> useful.  For other things, like collation or anything else that deals
> with alphabets, they are very important.  Unless any of these strings
> contain combining characters, they should not get any special treatment
> from a font.  For example, don't turn the AE grapheme cluster
> (\u0041\u0045) into the AE ligature (\u00C6).
>
> Here is another example: in traditional Spanish, the letters ch and ll
> are each considered a single character (grapheme cluster), distinct
> from c, h and l.  These multi-codepoint characters can get title cased
> or collated differently.  Modern Spanish no longer uses these grapheme
> clusters; at least that is what my old and new Spanish dictionaries
> tell me.  The two dictionaries sort words differently because of this
> change.
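
To make the Spanish collation point concrete, here is a minimal Python sketch of a sort that treats "ch" and "ll" as single letters. It is not ICU's collation algorithm; the alphabet lists, the `sort_key` helper, and the sample words are assumptions made up for illustration, and accented vowels are ignored.

```python
# Traditional Spanish alphabet order, with the contractions "ch" and "ll"
# treated as single letters (illustrative, unaccented subset).
TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
               "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
               "v", "w", "x", "y", "z"]
# Modern order: the same letters without the two contractions.
MODERN = [letter for letter in TRADITIONAL if len(letter) == 1]


def sort_key(word, alphabet):
    """Map a word to a list of alphabet positions, trying the longest match first."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    key = []
    i = 0
    word = word.lower()
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in rank:
                key.append(rank[chunk])
                i += size
                break
        else:
            # Unknown character: sort it after every known letter.
            key.append(len(alphabet) + ord(word[i]))
            i += 1
    return key


words = ["cuna", "chico", "llama", "luz", "cosa", "lado"]
traditional_order = sorted(words, key=lambda w: sort_key(w, TRADITIONAL))
modern_order = sorted(words, key=lambda w: sort_key(w, MODERN))

print(traditional_order)
print(modern_order)
```

With the traditional alphabet, "chico" sorts after every other word beginning with c and "llama" after every other word beginning with l; with the modern alphabet they sort letter by letter. This is the kind of process that needs the multi-character strings from an exemplar set.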
>
> The LDML specification also briefly covers this topic:
> http://www.unicode.org/reports/tr35/
>
> George Rhoten
> IBM Globalization Center of Competency/ICU  San José, CA, USA
> ICU main website: http://oss.software.ibm.com/icu/index.html
>
>
> "Elisha Berns" <e....@co...>
> Sent by: icu...@ww...
> 07/30/2004 12:02 PM
> Please respond to e.berns
>
> To: <an...@jt...>
> cc: "'ICU Support'" <icu...@ww...>
> Subject: RE: FW: Question About Constructing Pattern Strings From API Results
>
> Thanks for the reply, Andy.
>
> I'm starting to feel really stupid asking so many questions about
> this; please forgive me.  I really am trying to wind this up!
>
> You wrote:
>
>> I need to look into this.  I thought that scripts just populated a set
>> with the code points with the matching script property, no strings.
>
> I think you are correct about this when the exemplar set pattern string
> is a script name; however, some of the exemplar set pattern strings do
> contain multicharacter strings.  For example, Hungarian:
>
> [a-z\u00E1\u00E9\u00ED\u00F3\u00F6\u00FA\u00FC\u0151\u0171
> {ccs}{cs}{ddz}{ddzs}{dz}{dzs}{ggy}{gy}{lly}{ly}{nny}{ny}{ssz}
> {sz}{tty}{ty}{zs}{zzs}]
>
> What do all those groups of characters enclosed in curly braces mean,
> given that their letters are already contained in the range [a-z] at
> the beginning of the pattern string?  Do they get normalized to some
> kind of diacritical/letter combination?  Is this their normalized
> representation?
>
> My question is: how do you transform (??) what is inside the curly
> braces to one or more code points that can be displayed by a font?  Or
> have I simply misunderstood this: when any one of these combinations of
> code points (a "multicharacter string") is fed to a TrueType/OpenType
> layout engine, does the layout engine convert the string to a special
> glyph?  And is the only test that is *required* one for unique code
> points, not for all these duplicates?
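
As a rough illustration of the "unique code points" question, the Python sketch below splits the Hungarian pattern into single code points and brace-enclosed strings, then checks whether the strings introduce any code point not already in the set. This is not ICU's UnicodeSet parser: `unescape` and `split_exemplars` are hypothetical helpers that handle only the syntax seen in this one pattern (simple ranges, \uXXXX escapes, and {strings}).

```python
import re

# The Hungarian exemplar pattern quoted in this thread, kept as a raw string
# so the \uXXXX escapes stay literal until unescape() handles them.
PATTERN = (r"[a-z\u00E1\u00E9\u00ED\u00F3\u00F6\u00FA\u00FC\u0151\u0171"
           r"{ccs}{cs}{ddz}{ddzs}{dz}{dzs}{ggy}{gy}{lly}{ly}{nny}{ny}{ssz}"
           r"{sz}{tty}{ty}{zs}{zzs}]")


def unescape(text):
    """Replace each literal \\uXXXX escape with the character it names."""
    return re.sub(r"\\u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)


def split_exemplars(pattern):
    """Return (set of single code points, list of multi-character strings)."""
    body = unescape(pattern.strip()[1:-1])          # drop the outer [ ]
    strings = re.findall(r"\{([^}]*)\}", body)      # contents of each {...}
    rest = re.sub(r"\{[^}]*\}", "", body)           # what remains: ranges, chars
    code_points = set()
    i = 0
    while i < len(rest):
        if i + 2 < len(rest) and rest[i + 1] == "-":
            # Expand a simple range such as a-z.
            for cp in range(ord(rest[i]), ord(rest[i + 2]) + 1):
                code_points.add(chr(cp))
            i += 3
        else:
            if not rest[i].isspace():
                code_points.add(rest[i])
            i += 1
    return code_points, strings


code_points, strings = split_exemplars(PATTERN)
# Code points that occur only inside the brace strings, if any.
extra = {ch for s in strings for ch in s} - code_points

print(len(code_points), len(strings), sorted(extra))
```

For this pattern the brace strings add no new code points, so a font coverage check based on the single code points alone already covers every letter that appears in them. That matches the reply at the top of this thread: the strings matter for collation and similar alphabet-aware processes, not for glyph selection.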
>
> Thanks,
>
> Elisha
>
>
>
> _______________________________________________
> icu...@os... - icu4c-support mailing list
> To Un/Subscribe:
> http://oss.software.ibm.com/developerworks/oss/mailman/listinfo/icu4c-support
