(sorry for the long post)
When developing spelling dictionaries intended for use in a graphical environment such as OOo, it would be very helpful to be able to spell-check texts from the command line in exactly the same way as that environment does. So far I have found no such option.
Such an option would influence several aspects of hunspell, at least the following:
- tokenisation
- the set of suggestions
- suggestion order
OOo tokenisation: one string - "www.infonuorra.no"
hunspell -a tokenisation: three strings: "www", "infonuorra", "no"
Wanted: a tokenisation behaviour that replicates the one in OOo as closely as possible, so that the same input produces the same tokens.
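To make the difference concrete, here is a minimal Python sketch of the two tokenisation behaviours on the example above. It is only an illustration: the function names are made up, and the regular expressions are my assumption about what produces these results, not the actual rules used by OOo or by hunspell -a.

```python
import re

def split_on_punctuation(text):
    # Hypothetical stand-in for the command-line behaviour seen above:
    # every run of word characters becomes its own token.
    return re.findall(r"\w+", text)

def keep_inner_dots(text):
    # Hypothetical stand-in for the OOo behaviour seen above:
    # a dot between two runs of word characters stays inside the token,
    # so a host name comes out as one string.
    return re.findall(r"\w+(?:\.\w+)*", text)

if __name__ == "__main__":
    sample = "www.infonuorra.no"
    print(split_on_punctuation(sample))
    print(keep_inner_dots(sample))
```

A sentence-final dot is not followed by word characters, so the second function still leaves it out of the token.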
OOo suggestions: Mielkke, Baikke, dárkkel, Råhkkel, Fuoikke
hunspell -a suggestions: Mielkke, Mikrof, Baikke, Mierkká
Wanted: a command line option that would produce exactly the same set of suggestions as the library version/OOo version
Because the suggestions given by OOo/hunspell and those given by command-line hunspell differ so much in many cases, it is hard to find examples of a difference in order (and there might not be any among the suggestions that are identical). But to be able to test the quality of the suggestions (e.g. whether the expected suggestion is among the given suggestions, and which position it has), it is important that the command-line version of hunspell can produce suggestions in exactly the same order as they are given to OOo and other clients.
To see what such testing can provide in terms of quality measurements and statistics, please have a look at:
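The kind of measurement I mean can be sketched in a few lines of Python. This is a rough illustration, not our actual test setup: the function names are invented, the two lists are the suggestions quoted above, and "Baikke" is picked arbitrarily as the expected correction just to show how the position changes.

```python
def suggestion_rank(expected, suggestions):
    # 1-based position of the expected word, or None if it is missing.
    try:
        return suggestions.index(expected) + 1
    except ValueError:
        return None

def compare_ranks(expected, ooo_suggestions, cli_suggestions):
    # Rank of the same expected word in both suggestion lists.
    return {
        "ooo": suggestion_rank(expected, ooo_suggestions),
        "cli": suggestion_rank(expected, cli_suggestions),
    }

if __name__ == "__main__":
    ooo = ["Mielkke", "Baikke", "dárkkel", "Råhkkel", "Fuoikke"]
    cli = ["Mielkke", "Mikrof", "Baikke", "Mierkká"]
    # "Baikke" is an arbitrary choice for illustration only.
    print(compare_ranks("Baikke", ooo, cli))
```

Statistics such as "expected word in first position" or "expected word among the top five" only say something about what users see if both lists come from the same code path.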
(the site is down from time to time - if so, retry in a while)
hunspell dic+aff files: http://divvun.no/static_files/hunspell-sme-smj-30-10-2007.tar.bz2
I used only the smj files, renamed as et_EE (Estonian) in OOo (there is not yet built-in support for smj in OOo; a request for it has been submitted).
smj = Lule Sámi