
#200 hfst-ol.jar produces spurious tags; disagrees with hfst-lookup

future
open
nobody
6
2015-02-04
2013-08-22
No

Problem: hfst-ol.jar produces spurious [POSS=3] and [CLIT=HAN] tags that hfst-lookup does not produce for the same input and the same transducer. The output of hfst-lookup is correct; the output of hfst-ol.jar is incorrect.

Program version:
hfst-ol.jar -> latest available for download
hfst-lookup -> hfst-lookup 0.6 (hfst 3.4.5)

Transducer:

Transducer file from Flammie, transformed from .hfst to .hfstol using hfst-fst2fst -w. I can attach it if needed.

Example:

mäntyä gets an extra [POSS=3] and koiransa gets an extra [CLIT=HAN].

$ java -jar hfst-ol.jar ../MODEL/morphology.omor.hfstol 
Reading header...
Reading alphabet...
Reading transition and index tables...
Ready for input.
mäntyä
mäntyä  [WORD_ID=mänty][POS=NOUN][NUM=SG][CASE=PAR][POSS=3]     313.0

koiransa
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3][CLIT=HAN]   313.0
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=GEN][POSS=3][CLIT=HAN]   313.0
koiransa        [WORD_ID=koira][POS=NOUN][NUM=PL][CASE=NOM][POSS=3][CLIT=HAN]   313.0

Compare with the hfst-lookup output for the same input:

$ hfst-lookup ../MODEL/morphology.omor.hfstol
mäntyä
mäntyä  [WORD_ID=mänty][POS=NOUN][NUM=SG][CASE=PAR]     313.000000

koiransa
koiransa        [WORD_ID=koira][POS=NOUN][NUM=PL][CASE=NOM][POSS=3]     313.000000
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=GEN][POSS=3]     313.000000
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3]     313.000000
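
The discrepancy between the two outputs can also be checked mechanically. The following is a minimal sketch, not part of either tool: the class `TagDiffSketch` and its methods are hypothetical names introduced here, and it assumes every tag has the bracketed `[KEY=VALUE]` form shown in the transcripts above. It extracts the tags from two analysis strings and reports those present only in the first.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagDiffSketch {
    // Matches one bracketed tag such as [POSS=3] (assumed tag format).
    private static final Pattern TAG = Pattern.compile("\\[[^\\]]+\\]");

    // Collects the tags of one analysis string, preserving their order.
    static Set<String> tags(String analysis) {
        Set<String> result = new LinkedHashSet<>();
        Matcher m = TAG.matcher(analysis);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns the tags that occur in candidate but not in reference.
    static Set<String> extraTags(String candidate, String reference) {
        Set<String> extra = tags(candidate);
        extra.removeAll(tags(reference));
        return extra;
    }

    public static void main(String[] args) {
        // Analysis strings copied from the transcripts in this ticket.
        String fromJar = "[WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3][CLIT=HAN]";
        String fromLookup = "[WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3]";
        System.out.println(extraTags(fromJar, fromLookup));
    }
}
```

Run on the koiransa analyses above, the only extra tag reported is [CLIT=HAN].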

Discussion

  • Eetu Mäkelä

    Eetu Mäkelä - 2014-01-15

    There is a bug in getAnalyses(): the integer-symbol form of the analysis is converted to a string while it still contains uninitialized data. That is, the integer terminator is written only after the analysis has been converted, not before.

    The attached patch fixes the problem. As an aside, it also fixes a problem where weights were in some situations calculated incorrectly, leading to huge result weights.

     

    Last edit: Eetu Mäkelä 2014-01-15
  • Sam Hardwick

    Sam Hardwick - 2014-01-16

    Thanks for your contribution! I'm applying the patch.
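
The ordering problem described in the first comment can be illustrated with a minimal sketch. This is not the actual hfst-ol.jar code or the attached patch: the class `TerminatorOrderSketch`, the `END` sentinel, the symbol table, and the buffer layout are all assumptions made for illustration. It shows how converting a reused integer buffer to a string before writing the terminator lets a symbol left over from an earlier, longer path leak into the result.

```java
public class TerminatorOrderSketch {
    // Hypothetical terminator value marking the end of an analysis.
    static final int END = -1;

    // Hypothetical symbol table mapping integer symbols to tag strings.
    static final String[] SYMBOLS = {
        "[WORD_ID=koira]", "[POS=NOUN]", "[NUM=SG]",
        "[CASE=NOM]", "[POSS=3]", "[CLIT=HAN]"
    };

    // Converts the buffer to a string, stopping at the first terminator.
    static String render(int[] buffer) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buffer.length && buffer[i] != END; i++) {
            sb.append(SYMBOLS[buffer[i]]);
        }
        return sb.toString();
    }

    // Buggy order: convert first, terminate afterwards.
    // render() runs past `length` into whatever the buffer still holds.
    static String buggyNote(int[] buffer, int length) {
        String result = render(buffer);
        buffer[length] = END; // too late to affect `result`
        return result;
    }

    // Fixed order: terminate first, then convert.
    static String fixedNote(int[] buffer, int length) {
        buffer[length] = END;
        return render(buffer);
    }

    // A reused buffer: the current analysis occupies indices 0..4, while
    // index 5 still holds [CLIT=HAN] from an earlier, longer path.
    static int[] staleBuffer() {
        return new int[] {0, 1, 2, 3, 4, 5, END};
    }

    public static void main(String[] args) {
        System.out.println(buggyNote(staleBuffer(), 5));
        System.out.println(fixedNote(staleBuffer(), 5));
    }
}
```

In this sketch the buggy variant yields an analysis ending in [POSS=3][CLIT=HAN], the same kind of spurious trailing tag reported for koiransa, while the fixed variant stops at [POSS=3].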

     