From: Andy H. <and...@gm...> - 2006-05-31 01:46:22
On 5/24/06, Niti Hantaweepant <nha...@ad...> wrote:
>
> Could you please point out where I can find the list of locales supported
> for ICU4J word and line-break iterators? Thank you.
>

You can see the data feeding into break iterators here:
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/brkitr/

Nearly all locales use the default word and line break rules described in
Unicode UAX #14 (line breaking) and UAX #29 (text segmentation).
See http://www.unicode.org/reports/index.html.

Here are the exceptions:

For JA, word breaks are modified so that no breaks appear between adjacent
Hiragana or Ideographic characters.

For en_US_POSIX, a colon ':' will separate words. By the Unicode TR rules,
it does not.

For ICU 3.4, the Thai locale is special-cased. Thai dictionary usage is
only available when the Thai locale is specified. For 3.6, the Thai
dictionary will be used whenever Thai script data is encountered,
independent of the locale. (This was contributed by Apple to ICU4C. The
ICU4J port is not yet done, and is uncertain for ICU 3.6.)

-- Andy
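
To check how a given locale actually breaks a string, something like the following sketch may help. It is written against the JDK's java.text.BreakIterator so that it runs without extra jars. That is an assumption made for illustration: the JDK's rules are not ICU's, so the boundaries it reports may differ from ICU4J's. ICU4J's com.ibm.icu.text.BreakIterator offers the same getWordInstance/getLineInstance factories, so swapping the import should be enough to test the ICU behavior described above. The class name, helper name and sample string are made up for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakSegments {
    // Hypothetical helper: split text at every boundary the iterator reports.
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample text with a colon, to compare locales by eye.
        String text = "Time: 12:30 today";
        System.out.println(segments(BreakIterator.getWordInstance(Locale.US), text));
        System.out.println(segments(BreakIterator.getLineInstance(Locale.US), text));
        System.out.println(segments(BreakIterator.getWordInstance(Locale.JAPAN), text));
    }
}
```

Printing the segments for two locales side by side is a quick way to see whether a locale uses the default rules or one of the exceptions.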