$ time echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
Killed: 9

real 70m39.561s
user 69m42.073s
sys 0m43.796s

To repeat:
1) svn co https://gtsvn.uit.no/langtech/trunk/gtcore
2) ./autogen.sh && ./configure && make && sudo make install
3) svn -r123846 co https://gtsvn.uit.no/langtech/trunk/langs/sme
4) ./autogen.sh && ./configure --with-hfst --without-xfst --enable-spellers
5) make
6) echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
The previous version did not crash, just took an enormous amount of time. The fst has some inserted flag diacritics to regulate compound behaviour.
Maybe related to bug #311 (or at least the solution is in the same part of the code)?
That file is in OpenFst format, right? In optimized-lookup format the lookup terminates in 0.67 seconds for me, which is of course still rather slow. The main problem seems to be caused by flag diacritics: there are 20 distinct outputs, but they can be reached in 1329 different ways. Overall, about 80K flag diacritic settings are accepted along the way.
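To illustrate how flag diacritics can multiply the number of paths without adding outputs, here is a minimal Python sketch. It is not HFST code: the arc list, the `lookup` and `apply_flag` helpers, and the feature name `X` are all made up for the example, only the P, R, D and C operators are handled, and the toy network is assumed to be acyclic.

```python
import re

# Matches flag symbols such as @P.X.1@, @R.X.1@, @D.X.3@ or @C.X@.
FLAG = re.compile(r"^@([PRDC])\.([^.@]+)(?:\.([^.@]+))?@$")


def apply_flag(symbol, flags):
    """Return the updated flag state, or None if the flag blocks the path."""
    op, feature, value = FLAG.match(symbol).groups()
    if op == "P":  # positive set
        return {**flags, feature: value}
    if op == "C":  # clear
        return {k: v for k, v in flags.items() if k != feature}
    if value is None:
        matches = feature in flags
    else:
        matches = flags.get(feature) == value
    if op == "R":  # require
        return flags if matches else None
    return None if matches else flags  # op == "D": disallow


def lookup(arcs, finals, word, state=0, pos=0, flags=None, out=""):
    """Yield one output string per accepted path (duplicates included)."""
    flags = {} if flags is None else flags
    if pos == len(word) and state in finals:
        yield out
    for src, sym, output, dst in arcs:
        if src != state:
            continue
        if FLAG.match(sym):
            # Flags consume no input; they only update or test the flag state.
            new_flags = apply_flag(sym, flags)
            if new_flags is not None:
                yield from lookup(arcs, finals, word, dst, pos, new_flags, out)
        elif pos < len(word) and word[pos] == sym:
            yield from lookup(arcs, finals, word, dst, pos + 1, flags, out + output)


# Hypothetical toy network: (source, symbol, output, target).
ARCS = [
    (0, "@P.X.1@", "", 1),
    (0, "@P.X.2@", "", 1),
    (0, "@P.X.3@", "", 1),
    (1, "a", "A", 2),
    (2, "@D.X.3@", "", 3),
    (2, "@R.X.1@", "", 3),
    (3, "b", "B", 4),
]
FINALS = {4}

paths = list(lookup(ARCS, FINALS, "ab"))
print(len(paths), "accepted paths,", len(set(paths)), "distinct output(s)")
```

In this toy case three different flag settings are accepted but they all produce the same output, which is the same effect as the 1329 paths versus 20 outputs above, only on a much smaller scale.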
Yes, the file is in OpenFst format. We have a number of flag diacritics regulating compounding, and eliminating some of them would certainly help with speed. The problem then is https://sourceforge.net/p/hfst/bugs/324/, which means we can't use flag elimination because it changes the language coverage of the fst.