From: Steven R. L. <sr...@ic...> - 2009-01-31 22:32:14
Samuel, I'm glad to hear about BRLTTY using ICU; that's a very important
technology, and one I'm a fan of.

I thought I had written some code to parse Unihan, but I can't find it
right now. Please feel free to file a bug describing how you would like
to use the Unihan data.

I had some old code that parsed and used the Kangxi tables at:
http://www.icu-project.org/repos/icu/icuapps/trunk/ubrowse/ubrowse.c
http://www.icu-project.org/repos/icu/icuapps/trunk/locexp/util/kangxi.c

I don't have the generators, though; they were probably just sed
scripts run on the command line at the time (1999).

-s

Samuel Thibault wrote:
> Hello,
>
> Let me first explain the context a bit: I'm working on BRLTTY, a
> screen reading daemon. It reads the text on the screen as Unicode
> strings, converts it to braille, and renders that on a braille device.
>
> As braille uses only 8-dot cells, there are sometimes ambiguities
> in the translation. To resolve them, there is a "descchar" command
> that lets the user get information about a specific character; for
> now it just displays the Unicode name of the character. However, for
> CJK ideographs, e.g. U+9000, we just get "CJK UNIFIED
> IDEOGRAPH-9000", which is not particularly helpful to the user. I have
> noticed that unicode.org provides a Unihan database which includes
> information for such CJK characters. For instance, for U+9000 there is
> notably an English description:
>
> U+9000 kDefinition step back, retreat, withdraw
>
> and pronunciations in the various languages that use it:
>
> U+9000 kCangjie YAV
> U+9000 kCantonese teoi3
> U+9000 kHanyuPinlu tui4(254)
> U+9000 kJapaneseKun SHIRIZOKU SHIRIZOKERU
> U+9000 kJapaneseOn TAI TON
> U+9000 kKorean THOY
> U+9000 kMandarin TUI4
>
> I couldn't find this kind of information in ICU. Is it there? If not,
> could adding it be considered?
>
> And actually, for Chinese speakers who do not know English, is there
> a source of information that provides more details in Chinese?
> (Pronunciation alone is not enough, since many characters have exactly
> the same pronunciation.) The same question applies to other languages
> that use CJK characters, of course...
>
> Samuel
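As a rough illustration of the kind of Unihan parsing discussed above, here is a minimal Python sketch, not ICU's or BRLTTY's actual code. It assumes the tab-separated `U+XXXX<TAB>kField<TAB>value` line format of the Unihan text files, with `#` comment lines; the function names `parse_unihan` and `describe_char` are hypothetical, the sample lines are built from the values quoted in the message, and Python's `unicodedata.name` stands in for an ICU name lookup such as `u_charName` as the fallback.

```python
import unicodedata
from collections import defaultdict


def parse_unihan(lines):
    """Parse Unihan-style lines into {code point: {field: value}}.

    Assumes each data line is 'U+XXXX<TAB>kField<TAB>value'; blank lines
    and lines starting with '#' are treated as comments and skipped.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        # 'U+9000' -> 0x9000
        table[int(codepoint[2:], 16)][field] = value
    return dict(table)


def describe_char(ch, unihan):
    """Hypothetical 'descchar'-style description: append the Unihan
    English definition when one exists, else return just the Unicode name."""
    name = unicodedata.name(ch, "U+%04X" % ord(ch))
    definition = unihan.get(ord(ch), {}).get("kDefinition")
    if definition is None:
        return name
    return "%s: %s" % (name, definition)


# Sample lines built from the values quoted in the message above.
sample = [
    "# excerpt for illustration",
    "U+9000\tkDefinition\tstep back, retreat, withdraw",
    "U+9000\tkMandarin\tTUI4",
    "U+9000\tkJapaneseOn\tTAI TON",
]
unihan = parse_unihan(sample)
print(describe_char("\u9000", unihan))
# CJK UNIFIED IDEOGRAPH-9000: step back, retreat, withdraw
```

Keying the table by integer code point keeps lookups independent of how the screen text was decoded; a real implementation would load the Unihan files from unicode.org instead of an inline sample, and could show the kMandarin, kJapaneseOn and similar fields alongside the definition.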