#325 hfst-lookup crashes after > an hour on long compound word (North Sámi)

Milestone: future
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2015-10-30
Created: 2015-10-28
Creator: sjurum
Private: No
$ time echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst 
Killed: 9

real    70m39.561s
user    69m42.073s
sys     0m43.796s

To repeat:

1) svn co https://gtsvn.uit.no/langtech/trunk/gtcore
2) ./autogen.sh && ./configure && make && sudo make install
3) svn -r123846 co https://gtsvn.uit.no/langtech/trunk/langs/sme
4) ./autogen.sh && ./configure --with-hfst --without-xfst --enable-spellers
5) make
6) echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst

The previous version did not crash; it just took an enormous amount of time. The FST contains flag diacritics inserted to regulate compounding behaviour.
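
Since the lookup in step 6 can run for over an hour, it may be convenient to cap the waiting time while reproducing. The following is a minimal Python sketch, assuming `hfst-lookup` is on the PATH and the script is run from the `langs/sme` checkout. The helper names `run_with_timeout` and `reproduce` and the 60-second limit are illustrative choices, not part of HFST. Note that this only bounds the wait; it does not reproduce the final `Killed: 9`.

```python
import subprocess

WORD = "seksuálaláhkarihkkunlávdegottesesoŋŋaáigi"
FST = "tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst"


def run_with_timeout(cmd, text, seconds):
    """Feed `text` to `cmd` on stdin; return (finished, stdout).

    `finished` is False if the command was still running after `seconds`,
    in which case subprocess.run kills it and stdout is reported as "".
    """
    try:
        result = subprocess.run(
            cmd,
            input=text,
            capture_output=True,
            encoding="utf-8",
            timeout=seconds,
        )
        return True, result.stdout
    except subprocess.TimeoutExpired:
        return False, ""


def reproduce(seconds=60):
    """Run the lookup from step 6 with a time limit (hypothetical helper)."""
    finished, out = run_with_timeout(["hfst-lookup", "-q", FST], WORD + "\n", seconds)
    if finished:
        print(out)
    else:
        print(f"lookup did not finish within {seconds} s")
```

Calling `reproduce()` prints either the analyses or a note that the time limit was hit.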

Discussion

  • Flammie Pirinen - 2015-10-29

    Maybe related to bug #311 (or at least the solution is in the same part of the code)?

     
  • Sam Hardwick - 2015-10-30

    That file is in OpenFst format, right? In optimized-lookup format the lookup terminates in 0.67 seconds for me, which is still rather slow, of course. The main problem seems to be induced by flag diacritics: there are 20 distinct outputs, but they can be reached in 1329 different ways. Overall, about 80K flag diacritic settings are accepted along the way.

     
    • sjurum - 2015-10-30

      Yes, the file is in OpenFst format. We have a number of flag diacritics regulating compounding, and eliminating some of them would certainly improve the speed. The problem is https://sourceforge.net/p/hfst/bugs/324/: flag elimination changes the language coverage of the FST, so we can't use it.
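
The effect described in the discussion, where a few distinct outputs are reached through many flag-diacritic paths, can be illustrated with a toy example. This is a simplified sketch, not HFST's lookup code: the transducer `ARCS` is invented, the input side is ignored, the graph is assumed to be acyclic, and only the `P` (set), `R` (require) and `D` (disallow) operators with an explicit value are modelled.

```python
def is_flag(label):
    """Flag diacritics are written like @P.FEATURE.VALUE@."""
    return label.startswith("@") and label.endswith("@")


def apply_flag(label, flags):
    """Return the new flag state, or None if the flag blocks the path."""
    op, feature, value = label.strip("@").split(".")
    current = flags.get(feature)
    if op == "P":  # positive set: always succeeds
        return {**flags, feature: value}
    if op == "R":  # require: succeeds only if the feature has this value
        return flags if current == value else None
    if op == "D":  # disallow: fails only if the feature has this value
        return None if current == value else flags
    raise ValueError(f"unsupported flag operator in {label}")


def enumerate_paths(arcs, start, finals):
    """Count accepted paths, accepted flag arcs and distinct outputs."""
    paths = 0
    accepted_flags = 0
    outputs = set()
    stack = [(start, "", {})]
    while stack:
        state, out, flags = stack.pop()
        if state in finals:
            paths += 1
            outputs.add(out)
        for label, target in arcs.get(state, []):
            if is_flag(label):
                new_flags = apply_flag(label, flags)
                if new_flags is None:
                    continue  # path blocked by the flag
                accepted_flags += 1
                stack.append((target, out, new_flags))
            else:
                stack.append((target, out + label, flags))
    return paths, accepted_flags, outputs


# Invented toy transducer: flags produce no output, so every
# surviving path yields the same string "ab".
ARCS = {
    0: [("@P.A.1@", 1), ("@P.A.2@", 1), ("@P.A.3@", 1)],
    1: [("a", 2)],
    2: [("@P.B.1@", 3), ("@P.B.2@", 3)],
    3: [("b", 4)],
    4: [("@D.A.3@", 5), ("@R.B.1@", 5)],
}
FINAL = {5}

print(enumerate_paths(ARCS, 0, FINAL))  # (7, 16, {'ab'})
```

Even in this tiny graph a single output is reached by 7 paths after 16 accepted flag operations; each additional compound boundary with alternative flag settings multiplies the number of paths the lookup has to explore.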