#325 hfst-lookup crashes after > an hour on long compound word (North Sámi)

Milestone: future

Status: open

Owner: nobody

Labels: None

Priority: 1

Updated: 2015-10-30

Created: 2015-10-28

Creator: sjurum

Private: No

$ time echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst 
Killed: 9

real    70m39.561s
user    69m42.073s
sys     0m43.796s

To repeat:

1) svn co https://gtsvn.uit.no/langtech/trunk/gtcore
2) ./autogen.sh && ./configure && make && sudo make install
3) svn -r123846 co https://gtsvn.uit.no/langtech/trunk/langs/sme
4) ./autogen.sh && ./configure --with-hfst --without-xfst --enable-spellers
5) make
6) echo seksuálaláhkarihkkunlávdegottesesoŋŋaáigi | hfst-lookup tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst

The previous version did not crash, just took an enormous amount of time. The fst has some inserted flag diacritics to regulate compound behaviour.

Discussion

Flammie Pirinen - 2015-10-29

Maybe related to bug #311 (or at least the solution is in the same part of the code)?


Sam Hardwick - 2015-10-30

That file is in OpenFst format, right? In optimized-lookup format the lookup terminates in 0.67 seconds for me, which is still rather slow, of course. The main problem seems to be flag-diacritic-induced: there are 20 distinct outputs, but they can be reached in 1329 different ways. Overall, about 80K flag diacritic settings are accepted on the way.
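
To make the "few outputs, many paths" effect concrete, here is a minimal Python sketch of lookup with flag diacritics on a toy transducer. It is not HFST code: the `Arc` tuple, the `lookup` function, the `CMP` feature and the toy arcs are all hypothetical names made up for illustration, and only the P (set), R (require) and D (disallow) operations are modelled.

```python
from collections import namedtuple

# Hypothetical toy representation, not the HFST data structures.
Arc = namedtuple("Arc", "inp out target")

def apply_flag(symbol, flags):
    """Return the updated flag settings, or None if the diacritic blocks the path."""
    op, feature, *rest = symbol.strip("@").split(".")
    value = rest[0] if rest else None
    current = flags.get(feature)
    if op == "P":  # positive set: always succeeds
        return {**flags, feature: value}
    if op == "R":  # require: feature must be set (to `value`, if one is given)
        ok = current is not None if value is None else current == value
        return flags if ok else None
    if op == "D":  # disallow: fail if feature is set (to `value`, if one is given)
        bad = current is not None if value is None else current == value
        return None if bad else flags
    raise ValueError("unsupported flag diacritic: " + symbol)

def lookup(arcs, finals, word, start=0):
    """Depth-first lookup; returns (output of every accepting path, accepted flag arcs)."""
    paths, flag_steps = [], 0
    stack = [(start, 0, "", {})]
    while stack:
        state, pos, out, flags = stack.pop()
        if pos == len(word) and state in finals:
            paths.append(out)
        for arc in arcs.get(state, []):
            if arc.inp.startswith("@"):  # flag diacritic: consumes no input
                new_flags = apply_flag(arc.inp, flags)
                if new_flags is not None:
                    flag_steps += 1
                    stack.append((arc.target, pos, out, new_flags))
            elif pos < len(word) and word[pos] == arc.inp:
                stack.append((arc.target, pos + 1, out + arc.out, flags))
    return paths, flag_steps

# Toy transducer: three ways to set CMP, then flag checks before the final state.
ARCS = {
    0: [Arc("@P.CMP.A@", "", 1), Arc("@P.CMP.B@", "", 1), Arc("@P.CMP.C@", "", 1)],
    1: [Arc("a", "a+N", 2)],
    2: [Arc("@R.CMP.A@", "", 3), Arc("@R.CMP.B@", "", 3), Arc("@D.CMP.C@", "", 3)],
}
FINALS = {3}

paths, flag_steps = lookup(ARCS, FINALS, "a")
print(len(set(paths)), "distinct output(s) via", len(paths), "paths;",
      flag_steps, "flag diacritic arcs accepted")
```

Even in this tiny example the single analysis is reached along several flag-distinct paths, and every one of them has to be explored before the duplicates can be collapsed. With many compound boundaries in one word, that count multiplies at each boundary.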

sjurum - 2015-10-30

Yes, the file is in OpenFst format. We have a number of flag diacritics regulating compounding, and eliminating some of them would certainly help with speed. The problem then is https://sourceforge.net/p/hfst/bugs/324/, which means we can't use flag elimination because it changes the language coverage of the fst.
  
