#93 ICONV/OCONV and the private words

open
None
5
2008-11-27
2008-11-26
No

The ICONV/OCONV keywords do not work on the words in the private dictionary. Such words are usually saved by applications independently.

I think Hunspell::add() and Hunspell::add_with_affix() need the ICONV conversion.
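To illustrate the request, here is a minimal sketch of a run-time word list that applies an ICONV-style table before storing a word. It is not Hunspell's API: `ICONV_TABLE`, `apply_conv` and `PrivateWordList` are hypothetical names, the two table entries (precomposed Hangul syllable to conjoining jamo) are assumed for the example, and the simple longest-match, left-to-right replacement may differ from Hunspell's actual matching.

```python
# Hypothetical stand-in for the "ICONV pattern replacement" lines of an .aff file.
# Assumed entries: precomposed Hangul syllables mapped to conjoining jamo.
ICONV_TABLE = {
    "\uac00": "\u1100\u1161",  # GA -> KIYEOK + A
    "\ub098": "\u1102\u1161",  # NA -> NIEUN + A
}


def apply_conv(word, table):
    """Replace the longest matching pattern at each position, left to right."""
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for pat in patterns:
            if word.startswith(pat, i):
                out.append(table[pat])
                i += len(pat)
                break
        else:
            # No pattern matches here: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


class PrivateWordList:
    """Stand-in for a private dictionary that converts words on add()."""

    def __init__(self, iconv_table):
        self.iconv_table = iconv_table
        self.words = set()

    def add(self, word):
        # Store the converted form, so lookups on converted input can match.
        self.words.add(apply_conv(word, self.iconv_table))

    def contains(self, word):
        return apply_conv(word, self.iconv_table) in self.words
```

With the conversion inside `add()`, a word saved by an application in its original form is stored in the same internal form that the checker sees after ICONV, so both spellings of the same word are found.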

Discussion

  • Németh László

    Hi, also thanks for this report.

    A note about Korean spell checking possibilities in OpenOffice.org: CWS hunspell4thesaurus with Hunspell 1.2.8 is ready for QA. I hope OOo 3.0.1 will be released with Hunspell 1.2.8, so that you can use ICONV/OCONV for your dictionary. You can try the working test builds here:

    Windows: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/OOoWinDEV300m35_20081126.zip
    Linux: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/OOo_3.1.0_081122_LinuxIntel_install.tar.gz

    Regards, László

     
  • Németh László

    • assigned_to: nobody --> nemethl
     
  • Changwoo Ryu

    Changwoo Ryu - 2008-11-27

    Thanks for the news. I've already tried the dynamically-linked OOo Debian package and libhunspell from 1.2.8. Here is a successful screenshot:

    http://img78.imageshack.us/img78/8905/openofficehunspellri3.png

    It is still far from real use. (It's a heavy job to write Korean affix rules.) But it's being improved.

     
  • Németh László

    Nice screenshot!

    There is a quick method to develop the first version of the Korean spelling dictionary:
    1. Download the Korean Wikipedia from download.wikipedia.org (>80 thousand articles, http://download.wikimedia.org/kowiki/20081126/kowiki-20081126-pages-articles.xml.bz2)
    2. Extract page texts and convert to jamo
    3. Use affixcompress (hunspell/src/tools) on the (LC_ALL=C) sorted word list, and convert the result (aff and dic file) to Hangul.

    In effect, the result is just a compressed form of the Korean Wikipedia word list, including all of its uncommon words. A future version of affixcompress will support filtering out uncommon words and statistical classification of the words of (agglutinative) languages (roughly, describing their real morphology).
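Steps 2 and 3 above can be sketched roughly as follows, assuming that "convert to jamo" means Unicode canonical decomposition (NFD) of precomposed Hangul syllables and that the page text has already been extracted from the dump. The helper names are hypothetical, and running affixcompress itself is left out.

```python
import re
import unicodedata

# Runs of precomposed Hangul syllables (U+AC00..U+D7A3).
HANGUL_RUN = re.compile(r"[\uac00-\ud7a3]+")


def to_jamo(text):
    """Decompose Hangul syllables into conjoining jamo (NFD)."""
    return unicodedata.normalize("NFD", text)


def to_hangul(text):
    """Recompose conjoining jamo into Hangul syllables (NFC)."""
    return unicodedata.normalize("NFC", text)


def jamo_word_list(text):
    """Return the unique Hangul words of text in jamo form, sorted by bytes."""
    words = {to_jamo(w) for w in HANGUL_RUN.findall(text)}
    # Sorting by UTF-8 bytes mirrors what `LC_ALL=C sort` does on a UTF-8 file.
    return sorted(words, key=lambda w: w.encode("utf-8"))
```

One line per word of `jamo_word_list(...)` would be the input for affixcompress. Note that NFC only recomposes complete jamo sequences, so affix fragments in the generated aff file that do not form whole syllables would stay in jamo form after `to_hangul`.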

     
  • Changwoo Ryu

    Changwoo Ryu - 2008-11-28

    It looks promising. I'll try it.

    But the Korean Wikipedia lacks a large set of words and affixes, because many Korean words take different agglutinated forms depending on the speaker/audience relationship, while all Wikipedia articles are written in one consistent style. I think such a dictionary will be good for reports or news articles but not for other types of text.

     
