Hunspell and Finnish - is there a problem?

2010-11-03
2013-06-03
  • peter braham
    2010-11-03

    I read on wiki.services.openoffice.org/wiki/Dictionaries that Hunspell is not suitable for Finnish, and that we should use Voika (sp?) instead.
    However, there is a hunspell dictionary for Finnish.
    Can anyone shed light on this?

    Peter

    Source:
    http://wiki.services.openoffice.org/wiki/Dictionaries says: NOTE: Hunspell is not capable of handling Finnish language properly, so these dictionaries should not be used. Instead, Voikko project should be used,

     
  • Eleonora
    2010-11-05

    I can only guess.
    To write a dic/aff pair is not a trivial task.
    I assume the Finnish one is sub-optimal (things are missing and/or incorrect), so they suggest using Voikko, which is designed for Finnish and programmatically compensates for the deficiencies of their dic/aff pair.

    The correct solution would be to fix their dic/aff pair, for example by using the Estonian one as a model. This must be done by a native Finnish speaker.

     
  • Cristina
    2011-08-18

    Any news about better support for the Finnish dictionary? Does anybody know if the Hunspell team has any plans to improve the Finnish dictionary?

    Thank you!

     
  • hmankka
    2011-08-23

    I don't think there is anything wrong with the Hunspell support of Finnish; Voikko is simply better.

    The Voikko team explains on their web page why they are not working on the Hunspell dictionary. They give two reasons why Hunspell is not well suited for Finnish:

    "Hunspell only supports two consecutive suffixes for any word. This is of course a major improvement compared to ispell or myspell that supported only one. But in Finnish it is common to have three or more suffixes in a word."

    "Hunspell has methods for allowing or forbidding certain inflected or derived forms in compounds. But binary allow/forbid is not enough, we need a method for specifying compound rules by word class and then specifying that certain derivational suffixes cause the word to belong to a different class."

    Henrik

     
  • Eleonora
    2011-08-23

    It is not true that Hunspell supports only two suffixes. Even with ispell you can support many more than two suffixes by using more flag characters. For example, class 'A' adds suf1, class 'B' then adds suf1suf1, class 'C' adds suf1suf2suf3, and so on. In ispell the classes are rather limited (a-z and A-Z); in Hunspell many more are possible (256 in total). Since two-character classes are also allowed, the limit is in fact 256*256, which should be enough for any human language.

    Forbidding and allowing are not necessary at all if the affixes are properly set up.

    Error checking for OCR is also part of Hunspell, using the REP facility of the .aff file.

    Grammar analysis is not really part of Hunspell, and I think that for serious applications it is better to use a different library and approach. Hunspell is a spell-checking library, not a grammar library.

    I do not think that, in 2011, any other tool is better than Hunspell for spell checking in any language.

    However, if you are happier with Voikko, keep using it. Just do not blame Hunspell falsely.

     
  • Eleonora
    2011-08-23

    I wanted to write: "and class 'B' then adds suf1suf2".
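
The workaround described in the last two posts (one flag class per complete suffix chain, rather than stacking suffixes at lookup time) can be sketched roughly as follows. This is only an illustration, not anything taken from the Finnish or Estonian dictionaries: the function name `flatten_suffix_chains` is made up here, `suf1`/`suf2`/`suf3` are the placeholder suffixes from the post, and the generated rules assume the simplest case of nothing stripped from the stem and no condition on it.

```python
from itertools import accumulate


def flatten_suffix_chains(suffixes, flags):
    """Emit .aff-style SFX rules, one flag class per cumulative suffix chain.

    Hypothetical helper for illustration only. Each chain is written out in
    full, so the spell checker never has to strip more than one suffix.
    """
    # accumulate() concatenates the strings step by step:
    # ["suf1", "suf2", "suf3"] -> "suf1", "suf1suf2", "suf1suf2suf3"
    chains = list(accumulate(suffixes))
    if len(flags) < len(chains):
        raise ValueError("need one flag per suffix chain")

    lines = []
    for flag, chain in zip(flags, chains):
        # Header line: flag, cross-product allowed (Y), number of rules (1).
        lines.append(f"SFX {flag} Y 1")
        # Rule line: strip nothing (0), append the whole chain, any stem (.).
        lines.append(f"SFX {flag} 0 {chain} .")
    return lines


aff_lines = flatten_suffix_chains(["suf1", "suf2", "suf3"], "ABC")
print("\n".join(aff_lines))
```

The cost of this approach is that every chain that can actually occur needs its own class, so the number of rules grows with the number of valid suffix combinations. That is presumably where the size of the flag space mentioned above becomes relevant.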