#22 Hangul syllable to/from jamo conversion for Korean language

open
None
5
2008-11-13
2008-11-09
Changwoo Ryu
No

I'm planning to build a Korean dictionary for hunspell. I had some success with the hunspell command, but it required a converter between Hangul Unicode syllables and jamo, like this:

$ echo korean_words | ./syl2jamo | hunspell -d ko | ./jamo2syl

Modern Korean is usually written with precomposed Hangul syllable codes (U+AC00-U+D7A3 in Unicode). But these are a bad choice for calculating word edit distances. Hangul jamo codes (the U+1100-U+11FF block in Unicode) are much better and can be processed in the same way as Western scripts, since each jamo code represents its own sound/keystroke. For example, 한 (U+D55C) is a single syllable code but decomposes into three jamo: U+1112, U+1161, U+11AB.

Can hunspell convert Hangul syllable characters into jamo characters and vice versa, so that internally it processes Hangul text only as jamo? This would be a good start toward implementing Korean language support.

Conversion is simple:
http://unicode.org/reports/tr15/#Hangul
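
For reference, the arithmetic behind that conversion can be sketched in a few lines of Python. This is only an illustration of the Unicode Hangul decomposition/composition arithmetic, not hunspell code; the function names `syl2jamo` and `jamo2syl` are borrowed from the hypothetical filters in the pipeline above, and the constants are the ones used by the Unicode algorithm.

```python
# Constants from the Unicode Hangul syllable (de)composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def syl2jamo(text):
    """Replace each precomposed Hangul syllable with its conjoining jamo."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
            t_index = s_index % T_COUNT
            if t_index:  # index 0 means "no trailing consonant"
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)


def jamo2syl(text):
    """Recombine conjoining jamo sequences into precomposed syllables."""
    out = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # Leading consonant + vowel -> LV syllable
            l_index, v_index = last - L_BASE, cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV syllable + trailing consonant -> LVT syllable
            s_index, t_index = last - S_BASE, cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)
```

With these definitions, `syl2jamo("\ud55c")` gives the three code points U+1112 U+1161 U+11AB, and `jamo2syl` maps them back to U+D55C.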

Discussion

  • Changwoo Ryu
    2008-11-13

    I just found the new ICONV/OCONV feature in 1.2.8.

    I guess these keywords could be used to define the internal Hangul encoding. But all 11172 syllables would have to be listed, which is wasteful. So it would still be much better to implement this conversion in hunspell itself.
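
To give a sense of the size of that workaround, here is a rough sketch that generates such a table. It assumes the affix-file layout described in the hunspell manual (a header line with the entry count, then one `ICONV pattern pattern2` line per entry); whether hunspell handles a table this large well is untested here, and `conv_lines` is a hypothetical helper name. It uses Python's `unicodedata.normalize("NFD", ...)` for the decomposition.

```python
import unicodedata

# All precomposed Hangul syllables, U+AC00 through U+D7A3.
SYLLABLES = [chr(cp) for cp in range(0xAC00, 0xD7A3 + 1)]


def conv_lines(keyword):
    """Build ICONV (syllable -> jamo) or OCONV (jamo -> syllable) lines."""
    lines = [f"{keyword} {len(SYLLABLES)}"]  # header line: number of entries
    for syl in SYLLABLES:
        jamo = unicodedata.normalize("NFD", syl)  # canonical decomposition
        src, dst = (syl, jamo) if keyword == "ICONV" else (jamo, syl)
        lines.append(f"{keyword} {src} {dst}")
    return lines


if __name__ == "__main__":
    # Print both tables, ready to paste into a .aff file.
    print("\n".join(conv_lines("ICONV") + conv_lines("OCONV")))
```

Each table comes to 11173 lines, which is the overhead that a built-in conversion would avoid.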

     
  • You are right, I will check the Unicode algorithm. Thanks for your report, László

     
    • assigned_to: nobody --> nemethl
     