I read on wiki.services.openoffice.org/wiki/Dictionaries that Hunspell is not suitable for Finnish and that we should use Voika (sp?) instead.
However, there is a hunspell dictionary for Finnish.
Can anyone shed light on this?
http://wiki.services.openoffice.org/wiki/Dictionaries says: "NOTE: Hunspell is not capable of handling Finnish language properly, so these dictionaries should not be used. Instead, Voikko project should be used, ..."
I can only guess.
Writing a .dic/.aff pair is not a trivial task.
I assume the Finnish one is suboptimal (things are missing and/or incorrect), so they suggest using Voikko, which is designed for Finnish and programmatically compensates for the deficiencies of their .dic/.aff pair.
The correct solution would be to fix their .dic/.aff pair, for example using the Estonian one as a model. This must be done by a native speaker of Finnish.
Any news about better support for the Finnish dictionary? Does anybody know whether the Hunspell team has any plans to improve the Finnish dictionary?
I don't think there is anything wrong with Hunspell's support for Finnish; Voikko is simply better.
The Voikko team explains on their web page why they are not working on the Hunspell dictionary:
"Hunspell is not well suited for Finnish:
- Hunspell only supports two consecutive suffixes for any word. This is of course a major improvement compared to ispell or myspell that supported only one. But in Finnish it is common to have three or more suffixes in a word.
- Hunspell has methods for allowing or forbidding certain inflected or derived forms in compounds. But binary allow/forbid is not enough, we need a method for specifying compound rules by word class and then specifying that certain derivational suffixes cause the word to belong to a different class."
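To make the second point concrete, here is a minimal Python sketch of what "compound rules by word class" could mean. The class names, the suffix labels DER_NOUN and DER_ADJ, and the allowed class pairs are all hypothetical; this is not the rule format of either Hunspell or Voikko, only an illustration of a derivational suffix moving a word into a different class before the compound check.

```python
# Hypothetical model of class-based compound rules (not Hunspell's or Voikko's
# actual format): every word part has a word class, derivational suffixes can
# move it into another class, and compounds are checked by class pair.
ALLOWED_COMPOUNDS: set[tuple[str, str]] = {("noun", "noun"), ("adjective", "noun")}

# Hypothetical derivational suffixes and the word class each one produces.
DERIVATION_CLASS: dict[str, str] = {"DER_NOUN": "noun", "DER_ADJ": "adjective"}


def word_class(base_class: str, derivations: list[str]) -> str:
    """Return the word class after applying derivational suffixes in order."""
    cls = base_class
    for suffix in derivations:
        cls = DERIVATION_CLASS[suffix]
    return cls


def compound_allowed(left: tuple[str, list[str]], right: tuple[str, list[str]]) -> bool:
    """Allow a two-part compound only if its (left, right) class pair is listed."""
    pair = (word_class(*left), word_class(*right))
    return pair in ALLOWED_COMPOUNDS


if __name__ == "__main__":
    print(compound_allowed(("verb", []), ("noun", [])))            # False
    print(compound_allowed(("verb", ["DER_NOUN"]), ("noun", [])))  # True
```

The same verb stem is rejected as a compound's first part on its own but accepted once a noun-forming suffix has changed its class, which is the distinction a single allow/forbid flag on the suffix does not capture.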
It is not true that Hunspell supports only two suffixes. Even with ispell you can support far more than two suffixes by using more flag characters. For example, class 'A' adds suf1, class 'B' adds suf1suf2, class 'C' adds suf1suf2suf3, and so on. In ispell the classes are rather limited (a-z and A-Z); in Hunspell many more are possible (256 in total). Since two-character flags are also allowed, the limit is in fact 256*256 = 65,536 classes, which should be enough for any human language.
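A minimal Python sketch of the flag-class idea, assuming a simplified model in which each flag attaches one precomposed suffix chain in a single step (real .aff entries also have strip strings and conditions, which are omitted here). The class letters and the Finnish forms talo, talossa, talossani, talossanikin ("house", "in the house", "in my house", "also in my house") are illustrative choices, not entries from the actual Finnish dictionary. The split_flags helper is a hypothetical stand-in for Hunspell's two-character flag mode (FLAG long).

```python
# Simplified model of ispell/Hunspell flag classes: each flag attaches one
# precomposed chain of suffixes to the stem in a single step. Real .aff
# entries also carry strip strings and conditions; those are omitted here.
SUFFIX_CLASSES: dict[str, list[str]] = {
    "A": ["ssa"],       # talo -> talossa ("in the house")
    "B": ["ssani"],     # talo -> talossani ("in my house")
    "C": ["ssanikin"],  # talo -> talossanikin ("also in my house")
    "Zz": ["ssakin"],   # a two-character flag: talo -> talossakin
}


def split_flags(flags: str, long_flags: bool = False) -> list[str]:
    """Split a flag field into flags: one character each, or two with long flags."""
    if not long_flags:
        return list(flags)
    if len(flags) % 2:
        raise ValueError("a long-flag field must have an even number of characters")
    return [flags[i:i + 2] for i in range(0, len(flags), 2)]


def expand(stem: str, flags: str, long_flags: bool = False) -> list[str]:
    """Return the stem followed by every form its flags generate."""
    forms = [stem]
    for flag in split_flags(flags, long_flags):
        # Unknown flags are simply ignored in this sketch.
        for suffix in SUFFIX_CLASSES.get(flag, []):
            forms.append(stem + suffix)
    return forms


if __name__ == "__main__":
    print(expand("talo", "ABC"))
    print(expand("talo", "Zz", long_flags=True))
```

Three stacked suffixes come out of one flag, so the number of consecutive suffixes is limited only by how many classes the flag space allows.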
Forbidding and allowing is not necessary at all if the affixes are set up properly.
Handling OCR errors is also part of Hunspell, via the REP facility of the .aff file.
Grammar analysis is not really part of Hunspell, and I think that for serious applications it is better to use a different library and a different approach than Hunspell's. Hunspell is a spell-checking library, not a grammar library.
I do not think that any other tool is better than Hunspell for spell checking in any language, as of 2011.
However, if you are happier with Voikko, keep using it. Just do not blame Hunspell unfairly.
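A simplified Python sketch of how a REP-style replacement table can turn OCR confusions into suggestions. In the .aff file such a table is written as REP lines (a count line, then one "REP from to" line per entry); the sketch only models the idea of trying each replacement at each position and keeping candidates found in the dictionary, not Hunspell's exact suggestion algorithm. The confusion pairs and the word list are hypothetical.

```python
# Hypothetical OCR confusion table: (text the OCR produced, intended text).
REP_TABLE: list[tuple[str, str]] = [("rn", "m"), ("cl", "d"), ("0", "o")]

# Toy word list standing in for a real .dic file.
DICTIONARY: set[str] = {"modern", "door", "old"}


def rep_suggestions(word: str, rep_table: list[tuple[str, str]], dictionary: set[str]) -> list[str]:
    """Try each replacement once at each position; keep candidates that are valid words."""
    suggestions: list[str] = []
    for src, dst in rep_table:
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate in dictionary and candidate not in suggestions:
                suggestions.append(candidate)
            start = word.find(src, start + 1)
    return suggestions


if __name__ == "__main__":
    print(rep_suggestions("rnodern", REP_TABLE, DICTIONARY))  # ['modern']
    print(rep_suggestions("d0or", REP_TABLE, DICTIONARY))     # ['door']
```

Keeping the table to likely OCR confusions keeps the candidate list small, since each entry is tried only where its source string actually occurs.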
I meant to write: "and class 'B' then adds suf1suf2".