
#231 Flag diacritics break in hfst-lexc

Milestone: future
Status: open
Owner: nobody
Labels: None
Priority: 1
Updated: 2015-05-18
Created: 2014-03-19
Creator: Anonymous
Private: No

A flag in the attached lexicon breaks when compiled with hfst-lexc. This does not happen in foma.

hfst-lexc 0.1 (hfst 3.6.1)
Copyright (C) 2010 University of Helsinki,
License GPLv3: GNU GPL version 3 http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ hfst-lexc mdf.foo.lexc > mdf.foo.hfst

$ hfst-lookup -X print-space -X show-flags mdf.foo.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata.
Using HFST basic transducer format and performing slow lookups

кандомс+V+TV+Ind+Prt1+ScSg2
к а н д о м с +V +TV +Ind +Prt1 +ScSg2 к а н д @P.CONJ.ObjAll@ > {ОЕØ} т ь @U.CONJ-MX.IND@ @U.CONJ-TX.PRT1@ @ U . C O N J - P X . 2 0 @ 0.000000
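The symptom is visible in the last flag of the output line: with print-space, intact flags come out as one token each, while the broken one is printed character by character. A minimal sketch for spotting this in lookup output, assuming that symbols are separated by whitespace and that flag diacritics have the usual `@X.…@` shape (the helper name `check_flags` is made up for this illustration and is not part of HFST):

```python
import re

# Flag diacritics look like @P.FEATURE.VALUE@; the operator is one of P, N, R, D, C, U.
FLAG_RE = re.compile(r"@[PNRDCU]\.[^@\s]+@")


def check_flags(lookup_line):
    """Return (intact_flags, stray_at_count) for one line of spaced lookup output.

    A stray "@" token means some symbol starting or ending with "@" was
    split into single characters instead of being kept as one symbol.
    """
    tokens = lookup_line.split()
    intact = [t for t in tokens if FLAG_RE.fullmatch(t)]
    stray = sum(1 for t in tokens if t == "@")
    return intact, stray


if __name__ == "__main__":
    line = ("к а н д @P.CONJ.ObjAll@ > {ОЕØ} т ь @U.CONJ-MX.IND@ "
            "@U.CONJ-TX.PRT1@ @ U . C O N J - P X . 2 0 @ 0.000000")
    intact, stray = check_flags(line)
    print("intact flags:", intact)
    print("stray @ tokens:", stray)
```

On the output above this reports three intact flags and two stray `@` tokens, i.e. one flag that was not treated as a single symbol.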

1 Attachments

Discussion

  • Anonymous

    Anonymous - 2014-03-19

    And the source file :)

     
  • Flammie Pirinen

    Flammie Pirinen - 2014-03-19

    Surely it's related to the 0-handling bugs like [#277]? Have you tried the Xerox tools version? How about without the % before the 0?

     
  • Anonymous

    Anonymous - 2014-03-19

    Didn't try Xerox, but I did try your suggestion about changing @U.CONJ-PX.2%0@ to @U.CONJ-PX.20@ and at least lookup fails. Probably, the flags are still broken.

     
  • Senka Drobac

    Senka Drobac - 2014-03-24

    It looks like this bug can be reproduced with this small example:

    Multichar_Symbols
    @U.CONJ-PX.20@

    LEXICON Root
    a0@U.CONJ-PX.20@:aa@U.CONJ-PX.20@ #;
    c0@U.CONJ-PX.20@:cc@U.CONJ-PX.20@ #;

    If there is only one line in the Root lexicon, or if there aren't any epsilons (0), then it works OK. The problem seems to come from the function char* strip_percents(const char* s, bool do_zeros); in lexc-utils.cc.
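    To illustrate why zero handling and multichar symbols can interact badly, here is a toy sketch in Python. It is not the HFST code and has not been checked against strip_percents; the function names and the `<eps>` marker are invented for the example. It only shows one way a flag containing 0 could end up split: if bare zeros are rewritten before the entry is matched against Multichar_Symbols, the flag text no longer equals the declared symbol.

```python
EPS = "<eps>"  # hypothetical epsilon marker, not HFST's internal symbol


def tokenize(text, multichars):
    """Greedy longest-match tokenization; unmatched characters become single symbols."""
    ordered = sorted(multichars, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        match = next((m for m in ordered if text.startswith(m, i)), None)
        if match is None:
            tokens.append(text[i])
            i += 1
        else:
            tokens.append(match)
            i += len(match)
    return tokens


def zeros_after_tokenizing(text, multichars):
    """Tokenize first, then turn stand-alone '0' tokens into epsilon."""
    return [EPS if t == "0" else t for t in tokenize(text, multichars)]


def zeros_before_tokenizing(text, multichars):
    """Rewrite every '0' first (the assumed faulty order), then tokenize."""
    placeholder = "\u0000"
    tokens = tokenize(text.replace("0", placeholder), multichars)
    return [EPS if t == placeholder else t for t in tokens]


if __name__ == "__main__":
    symbols = {"@U.CONJ-PX.20@"}
    entry = "a0@U.CONJ-PX.20@"
    print(zeros_after_tokenizing(entry, symbols))   # flag kept as one symbol
    print(zeros_before_tokenizing(entry, symbols))  # flag split into characters
```

    In the first order the flag survives as a single symbol; in the second it falls apart into `@`, `U`, `.` and so on, which resembles the lookup output in the report.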

     
  • Erik Axelson

    Erik Axelson - 2015-05-18

    It seems that the problem is in using zeros inside multichar symbols. '@U.CONJ-PX.2@' works fine, but '@U.CONJ-PX.20@' and '@U.CONJ-PX.2%0@' do not.

     