Re: [locale-bugs] incoming/200

Elisha, we have been trying to clarify the use of exemplar characters, which
are for more than just collation. In fact, they are not really co-extensive
with the tailored collation characters, although the two will generally
overlap a good deal.

In addition, we have just adopted a change that adds a new element for
auxiliary exemplar characters. Please look over the working draft,
especially the section at:

http://oss.software.ibm.com/cvs/icu/~checkout~/locale/docs/tr35.html#<characters>

Where an exemplar set is given as an entire script, that is in some cases
simply a data bug; such sets should be replaced by a more explicit list of
just what is needed for a given language. I think Indic is probably OK, but
Hebrew and similar cases should be fixed; someone needs to propose the exact
lists, though.

Mark

----- Original Message ----- 
From: "Mark Davis" <mar...@jt...>
To: <cl...@un...>
Sent: Thursday, August 12, 2004 17:47
Subject: Fw: [locale-bugs] incoming/200

> This bug is based on a misunderstanding of the exemplar characters, which
> are for more than collation (see the latest LDML draft). Where a whole
> script is used, that is in some cases simply a data bug; such sets should
> be replaced by a more explicit list of just what is needed for a given
> language.
>
> Mark
>
> ----- Original Message ----- 
> From: <loc...@jt...>
> To: <cld...@un...>
> Sent: Thursday, August 12, 2004 17:31
> Subject: [locale-bugs] incoming/200
>
>
> > new message incoming/200
> > URL: http://www.jtcsv.com/cgibin/locale-bugs?findid=200
> >
> > ====> ORIGINAL MESSAGE FOLLOWS <====
> >
> > From: e....@co...
> > Date: Thu Aug 12 20:31:21 2004
> > Subject: Exemplar Sets
> >
> > Full_Name: Elisha Berns
> > Version: 1.1
> > Submission from: (NULL) (64.164.82.122)
> >
> >
> > FEATURE REQUEST:
> >
> > Background:
> >
> > The Exemplar Sets may be the correct format for determining collation
> > rules for a locale's language, but they are neither well formed nor well
> > conceptualized for determining font coverage for the locale's language.
> > If an exemplar set is used to generate the set of code points needed for
> > standard, common text layout for a language, the resulting set is
> > typically either too large or too small to be accurate.
> >
> > Some exemplar sets are formed using the locale's language *script* name,
> > which includes many more code points than are needed for standard writing
> > in that language. Other exemplar sets contain only the code points for
> > the lower case letters and collation sequences used in that language. If
> > you generate upper case variants for these code points, you can get many
> > code points never used by the language.
> >
> > If one attempts to modify the exemplar set to include only commonly used
> > characters, the modifications often become complicated and unwieldy, and
> > may never work correctly. For example, the exemplar set for Hebrew (he)
> > uses the complete script name [:Hebr:]. To eliminate unnecessary code
> > points from this set, you can *attempt* to modify it with the following
> > set operations: [[[:dt=none:][:dt=canonical:]]&[:hebr:]] or perhaps
> > [[[:dt=none:][:dt=canonical:]]&[:hebr:]&[:letter:]]. However, this is
> > only one of many examples where such gyrations are needed to limit the
> > set membership to commonly needed code points. To make matters worse,
> > applying character properties effectively to modify these sets depends
> > too much on having intimate knowledge of these languages.
> >
> > Solution:
> >
> > It would be far simpler and much more accurate to create a new type of
> > exemplar set, the Standard Writing Exemplar Set, whose express, stated
> > purpose is to provide the set of code points needed for standard, common
> > writing (text layout) in each locale's language. By design, this set
> > includes lower and upper case characters and standard punctuation.
> >
> > Upper case characters are needed if upper case letters are used when
> > commonly writing in the language; likewise, punctuation is needed if
> > punctuation characters are used. For example, the Standard Writing
> > Exemplar Set for English is [a-zA-Z.,;:!?()'"]. This type of proposed set
> > would directly supply the data for a test of font coverage for the
> > locale's language. If other types of code points are commonly used for
> > mandatory ligatures or presentation forms, they should also be
> > considered. The idea is to explicitly include those code points needed
> > for common writing in the locale's language, and not leave set
> > membership dependent upon set operations or some other type of implicit
> > mapping.
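A font coverage test driven by an explicit set, as proposed here, reduces to a set difference. The Python sketch below is a minimal illustration under stated assumptions: expand_simple_set and missing_from_font are hypothetical helpers, the parser handles only literal characters and a-z style ranges (not full UnicodeSet syntax), and the font's coverage is modeled as a plain set of characters; in practice it would come from the font's cmap table.

```python
# Hypothetical helper: expands a tiny subset of UnicodeSet-style bracket
# syntax (literal characters and ranges like a-z). No escapes, properties,
# or set operations are supported.
def expand_simple_set(pattern):
    if not (pattern.startswith("[") and pattern.endswith("]")):
        raise ValueError("expected a bracketed set such as [a-z]")
    body = pattern[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-z: add every code point from start to end.
            start, end = ord(body[i]), ord(body[i + 2])
            chars.update(chr(cp) for cp in range(start, end + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


# Hypothetical helper: the coverage test itself is a set difference.
def missing_from_font(exemplar_set, font_coverage):
    """Return the exemplar characters the font lacks, in code point order."""
    return sorted(exemplar_set - font_coverage)


if __name__ == "__main__":
    english = expand_simple_set("[a-zA-Z.,;:!?()'\"]")
    # Assumed font that covers only ASCII letters and digits.
    font = {chr(cp) for cp in range(0x20, 0x7F) if chr(cp).isalnum()}
    print(missing_from_font(english, font))
```

Because the set is explicit, the test itself needs no knowledge of scripts or character properties; the per-language judgment lives entirely in the locale data.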
> >
> > Summary:
> >
> > Create a Standard Writing Exemplar Set that is differentiated from the
> > current Exemplar Set both in its explicit purpose and in its actual set
> > membership. The purpose is to provide the explicit data for performing
> > font coverage tests for locales. The rule for including or excluding a
> > code point is whether it is commonly needed for writing in the language.
> > Writing includes spelling, syntax and punctuation.
> >
> > Elisha Berns
> > 8/12/04
> >