#93 ICONV/OCONV and the private words

Changwoo Ryu

The ICONV/OCONV keywords do not work on words in the private dictionary. Such words are usually saved by applications independently.

I think Hunspell::add() and Hunspell::add_with_affix() need the ICONV conversion.
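
To illustrate the request, here is a minimal Python sketch of the behaviour being asked for. It is not Hunspell code: `apply_conv`, `PrivateDictionary`, and the one-entry conversion table are hypothetical names and data made up for this example, and the longest-pattern-first matching is an assumption about how an ICONV-style table could be applied. The point is only that a private word should go through the same input conversion when it is added as a word does when it is checked.

```python
def apply_conv(word, table):
    """Apply an ICONV-style replacement table to a word.

    At each position the longest matching pattern is replaced;
    characters with no matching pattern are copied unchanged.
    """
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for pat in patterns:
            if pat and word.startswith(pat, i):
                out.append(table[pat])
                i += len(pat)
                break
        else:
            # No pattern matched here: keep the character as it is.
            out.append(word[i])
            i += 1
    return "".join(out)


class PrivateDictionary:
    """Hypothetical stand-in for an application's private word list."""

    def __init__(self, iconv_table):
        self.iconv_table = iconv_table
        self.words = set()

    def add(self, word):
        # Convert before storing, so stored words use the internal form.
        self.words.add(apply_conv(word, self.iconv_table))

    def contains(self, word):
        # Lookups go through the same conversion as additions.
        return apply_conv(word, self.iconv_table) in self.words


# Illustrative table only: map a typographic apostrophe to the ASCII one.
private = PrivateDictionary({"\u2019": "'"})
private.add("don\u2019t")
```

With the conversion applied in `add()`, the word is found whether the user later types it with either apostrophe. Without it, the stored form and the checked form can differ, which is the problem described above.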


  • Hi, also thanks for this report.

    A note about Korean spell checking possibilities in OpenOffice.org: CWS hunspell4thesaurus with Hunspell 1.2.8 is ready for QA. I hope OOo 3.0.1 will be released with Hunspell 1.2.8, so that you can use ICONV/OCONV for your dictionary. You can try the working test builds here:

    Windows: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/OOoWinDEV300m35_20081126.zip
    Linux: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/OOo_3.1.0_081122_LinuxIntel_install.tar.gz

    Regards, László

    • assigned_to: nobody --> nemethl
  • Changwoo Ryu

    Thanks for the news. I've already tried the dynamically-linked OOo Debian package and libhunspell from 1.2.8. Here is a successful screenshot:


    It is still far from ready for real use (writing Korean affix rules is a heavy job), but it is improving.

  • Nice screenshot!

    There is a quick method to develop the first version of the Korean spelling dictionary:
    1. Download Korean Wikipedia from download.wikipedia.org (>80 thousand articles, http://download.wikimedia.org/kowiki/20081126/kowiki-20081126-pages-articles.xml.bz2)
    2. Extract page texts and convert to jamo
    3. Use affixcompress (hunspell/src/tools) on the (LC_ALL=C) sorted word list, and convert the result (aff and dic file) to Hangul.
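
    Steps 2 and 3 could be prepared with something like the Python sketch below. It assumes that "convert to jamo" means Unicode NFD decomposition of Hangul syllables into conjoining jamo, and that sorting by UTF-8 bytes is an acceptable stand-in for an LC_ALL=C sort. The function names are made up for this example, the regular expression only picks up runs of precomposed Hangul syllables, and running affixcompress itself is not shown.

```python
import re
import unicodedata

# Runs of precomposed Hangul syllables (U+AC00..U+D7A3).
HANGUL_RUN = re.compile("[\uac00-\ud7a3]+")


def to_jamo(text):
    """Decompose Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)


def to_hangul(text):
    """Recompose conjoining jamo into Hangul syllables (NFC)."""
    return unicodedata.normalize("NFC", text)


def build_word_list(page_text):
    """Return the unique Hangul words of a text, in jamo form,
    sorted by their UTF-8 bytes (as an LC_ALL=C sort would order them)."""
    words = {to_jamo(w) for w in HANGUL_RUN.findall(page_text)}
    return sorted(words, key=lambda w: w.encode("utf-8"))


# Tiny illustrative input; real input would be the extracted page texts.
word_list = build_word_list("한글 위키 한글")
```

    The resulting list, written one word per line, is the kind of input step 3 expects. `to_hangul()` is the reverse conversion for the generated aff and dic files, under the same NFD/NFC assumption.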

    In fact, this is just a compression of the words of the Korean Wikipedia, including all uncommon words. A future version of affixcompress will support filtering of uncommon words and statistical classification of the words of (agglutinative) languages, to approximately describe their real morphology.

  • Changwoo Ryu

    It looks promising. I'll try it.

    But the Korean Wikipedia lacks a large set of words and affixes, because many Korean words agglutinate differently depending on the speaker/audience relationship, while all the Wikipedia articles are written in one consistent style. I think such a dictionary will be good for reports or news articles but not for other types of text.