Steps to reproduce:
- build hfst-ospell @HEAD (revision 3791 tested)
- build $GTHOME/langs/sme/, with ./configure --without-xfst --with-hfst --enable-spellers
- time echo illullu | ./hfst-ospell $GTHOME/langs/sme/tools/spellcheckers/fstbased/hfst/se.zhfst
- build $GTHOME/langs/kal/ with ./configure --without-xfst --with-hfst --enable-spellers,
then run make with hyper-minimisation turned on: make HFST_LEXC_FLAGS=-F
- time echo illuklu | ./hfst-ospell $GTHOME/langs/kal/tools/spellcheckers/fstbased/hfst/kl.zhfst
Note that sme returns in < 0.5 sec, whereas kal never returns (I waited more than 15 minutes).
The only difference is the hyper-minimisation.
OS: MacOSX 10.9.2
Both SME and KAL are svn trunk@HEAD.
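Since the kal case never returns, the two timing steps above can be wrapped in a small watchdog so that a hanging speller is killed after a fixed limit instead of blocking the terminal. The following is only a sketch, not part of the hfst-ospell tooling: the function name run_with_limit and the 900 second limit are made up for illustration, and the watchdog is written in plain bash because the GNU timeout command is not part of a stock Mac OS X install.

```shell
#!/usr/bin/env bash
# run_with_limit SECONDS WORD CMD [ARGS...]
# Hypothetical helper: feeds WORD on stdin to CMD and kills CMD if it is
# still running after SECONDS. Returns CMD's exit status, which is non-zero
# (143 when killed by SIGTERM) if the limit was hit.
run_with_limit() {
    local limit=$1 word=$2
    shift 2

    # Start the command in the background; $! is the PID of the last
    # process in the pipeline, i.e. the command under test.
    printf '%s\n' "$word" | "$@" &
    local pid=$!

    # Watchdog: sleep for the limit, then terminate the command.
    # Its output is discarded so it never holds the caller's pipes open.
    ( sleep "$limit"; kill "$pid" ) >/dev/null 2>&1 &
    local watchdog=$!

    # Wait for the command, remembering its exit status.
    local status=0
    wait "$pid" || status=$?

    # The command is done (or was killed); stop the watchdog if still alive.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true

    return "$status"
}
```

With this in place, the kal step could be run as: time run_with_limit 900 illuklu ./hfst-ospell $GTHOME/langs/kal/tools/spellcheckers/fstbased/hfst/kl.zhfst. A non-zero exit status after roughly 15 minutes would then indicate the hang described above.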
It just crossed my mind that we also made minimisation optional in the spelling part because it wouldn't finish for some languages. That may be another thing to look at.
Other issues with hyperminimisation:
In the GTDivvun infra, in langs/sma/, the following build will segfault:
It segfaults when building the analyser-gt-desc.hfst:
When building yrk for spellers, the build seems to go fine, but when testing the speller zhfst file, hfst-ospell segfaults. To reproduce:
The speller test ends as follows:
Followup comments from the creator of the bug (I had forgotten to log in when I created it):
Using the latest hfst code (revision 3974), these problems seem to be mostly solved when the lexicon is compiled with both the -F and the -M flags of hfst-lexc (i.e. hyperminimisation and flag minimisation). The KAL test above now runs as follows:
KAL (and SMA, YRK below) was built using:
The configuration was as reported in the original bug report.
The problem is solved to the extent that the speller now works and returns the expected output. It is also solved in the sense that the fst files are of a manageable size: the acceptor is 14 MB, the error model is now 8.3 MB, and the zhfst file is just 4.1 MB.
It is NOT solved, though, in the sense that the speller is still not usable: waiting more than 11 seconds to get suggestions is way too long. So further work needs to be done to speed up the speller.
It is interesting that speed is not an issue for the case reported in the second comment: SMA.
SMA now compiles without issues, and running the speller is definitely much faster than for KAL:
For shorter input strings the response time is less than a second, which is quite ok for most users.
For YRK, the bug is also fixed: there is no segmentation fault when running the speller:
Also the speed for the YRK speller seems to be just fine, following SMA rather than KAL.
Conclusion: hyperminimisation together with the recently introduced flag minimisation seems to be stable now, producing working analysers and spellers. There is still a speed issue, but only with KAL.