Re: [icu-support] ICU4J Locales supported for word/line-break iterators


On 5/24/06, Niti Hantaweepant <nha...@ad...> wrote:

>
> Could you please point out where I can find the list of locales supported
> for ICU4J word and line-break iterators? Thank you.
>

You can see the data that feeds the break iterators here:
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/brkitr/

Nearly all locales use the default line and word break rules described
in Unicode UAX #14 (line breaking) and UAX #29 (text segmentation).
See http://www.unicode.org/reports/index.html.

Here are the exceptions:

For JA, word breaks are modified so that no breaks appear between
adjacent Hiragana or Ideographic characters.

For en_US_POSIX, a colon ':' separates words.  Under the default
Unicode TR rules, it does not.

In ICU 3.4, the Thai locale is special-cased: the Thai dictionary is
used only when the Thai locale is specified.  In 3.6, the Thai
dictionary will be used whenever Thai script data is encountered,
independent of the locale.  (This was contributed by Apple to ICU4C.
The ICU4J port is not yet done, and it is uncertain whether it will
make ICU 3.6.)
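
If you want to check how a given locale behaves rather than read the
rule files, a small test program is probably the quickest route.  The
sketch below is only an illustration and makes one assumption: it uses
the JDK's java.text.BreakIterator so that it runs without extra jars.
ICU4J's com.ibm.icu.text.BreakIterator offers the same
getWordInstance / setText / first / next calls, so swapping the import
should be enough to exercise the ICU rules.  The class name
WordBreakSketch and the sample string are made up for this example.
The segments you get depend on the rule data in the library you run
against, so no expected output is shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of ICU or the JDK.
class WordBreakSketch {

    // Return the pieces of text that lie between consecutive word boundaries.
    static List<String> segments(String text, Locale locale) {
        // Assumption: JDK iterator; replace the import with
        // com.ibm.icu.text.BreakIterator to test ICU4J instead.
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Locales the installed library reports for break iteration.
        System.out.println("available locales: "
                + BreakIterator.getAvailableLocales().length);

        // A colon between letters is the case that differs for en_US_POSIX.
        String sample = "key:value pair";
        Locale[] locales = { Locale.ROOT, Locale.US, Locale.JAPANESE };
        for (Locale loc : locales) {
            System.out.println(loc.toLanguageTag() + " -> " + segments(sample, loc));
        }
    }
}
```

Running the same text through several locales and comparing the
printed lists shows directly whether a locale carries its own break
rules or falls back to the defaults.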

  -- Andy
