
#324 hfst-xfst's eliminate flag command does not always enforce the flag semantics

future
open
nobody
None
1
2015-10-23
2015-10-23
sjurum
No

The following looks like a bug in hfst-xfst's command eliminate flag FLAG. It was observed when building and testing spellers for North Sami in the Giella infrastructure. Instructions for reproduction are further down. Several steps potentially lead to this bug, so please bear with the rather long explanation.

Consider the following network traversals:

traverse> @D.NeedNoun.ON@
On path `@P.Px.add@:@P.Px.add@ a:a l:l a:l s:a +CmpNP/First:s +N:@_EPSILON_SYMBOL_@ @_EPSILON_SYMBOL_@:> +Cmp/SgNom:@_EPSILON_SYMBOL_@ @U.NeedsVowRed.OFF@:@U.NeedsVowRed.OFF@ @P.CmpFrst.FALSE@:@P.CmpFrst.FALSE@ @P.CmpPref.FALSE@:@P.CmpPref.FALSE@ @D.CmpLast.TRUE@:@D.CmpLast.TRUE@ @D.CmpNone.TRUE@:@D.CmpNone.TRUE@ @U.CmpNone.FALSE@:@U.CmpNone.FALSE@ @P.CmpOnly.TRUE@:@P.CmpOnly.TRUE@ +Cmp/SplitR:- @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @C.NeedsVowRed@:@C.NeedsVowRed@ +Use/SpellNoSugg:@_EPSILON_SYMBOL_@ @D.CmpOnly.FALSE@:@D.CmpOnly.FALSE@ @D.CmpPref.TRUE@:@D.CmpPref.TRUE@ @D.NeedNoun.ON@:@D.NeedNoun.ON@ ' are continuations:
<Nothing, you've hit a dead end here>

Compare with the following, which is the only possible path:

traverse> @D.NeedNoun.ON@
On path `@P.Px.add@:@P.Px.add@ a:a l:@_EPSILON_SYMBOL_@ a:l s:a +CmpNP/First:s +N:@_EPSILON_SYMBOL_@ @_EPSILON_SYMBOL_@:> +Cmp/SgNom:@_EPSILON_SYMBOL_@ @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @P.CmpFrst.FALSE@:@P.CmpFrst.FALSE@ @P.CmpPref.FALSE@:@P.CmpPref.FALSE@ @D.CmpLast.TRUE@:@D.CmpLast.TRUE@ @D.CmpNone.TRUE@:@D.CmpNone.TRUE@ @U.CmpNone.FALSE@:@U.CmpNone.FALSE@ @P.CmpOnly.TRUE@:@P.CmpOnly.TRUE@ +Cmp/SplitR:- @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @C.NeedsVowRed@:@C.NeedsVowRed@ +Use/SpellNoSugg:@_EPSILON_SYMBOL_@ @D.CmpOnly.FALSE@:@D.CmpOnly.FALSE@ @D.CmpPref.TRUE@:@D.CmpPref.TRUE@ @D.NeedNoun.ON@:@D.NeedNoun.ON@ ' are continuations:
<Nothing, you've hit a dead end here>

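There are two crucial differences here, and one tag to keep in mind:

  • surface ll vs l, that is: allas- (genitive compound form) vs alas- (nominative compound form)
  • the value of the flag NeedsVowRed (first set to OFF in the first traversal, to ON in the second)
  • the tag +Cmp/SgNom, which is present on both paths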
What happens during the build is the following:

  • a default compounding tag +CmpN/SgN is inserted in front of +N
  • the inserted tag +CmpN/SgN is converted into a flag @P.CmpN.SgN@
  • the existing tag +Cmp/SgNom is turned into the flag sequence @R.CmpN.SgN@ @C.CmpN@

The effect of this is to block compounds in the genitive case (this is the default, but can be overridden). For the example above, we want to allow "alas-", but block "allas-". And this works as intended, as long as we don't eliminate any flags.
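To make the expected behaviour concrete, here is a minimal sketch in Python of how flag diacritics are conventionally interpreted along a path (P = set, U = unify, R = require, D = disallow, C = clear). It is not HFST code; the function name `path_is_valid` is made up for illustration, and the N (negative set) operator is left out for brevity. Applied to the NeedsVowRed flags of the two traversals above, it shows why the allas- path should be rejected and the alas- path accepted, and why the sequence @P.CmpN.SgN@ @R.CmpN.SgN@ @C.CmpN@ is consistent.

```python
from typing import Dict, Iterable


def path_is_valid(flags: Iterable[str]) -> bool:
    """Check a sequence of flag diacritics such as '@U.NeedsVowRed.ON@'.

    Simplified model (assumption: only P, U, R, D and C are supported).
    """
    values: Dict[str, str] = {}  # feature name -> current value
    for flag in flags:
        parts = flag.strip("@").split(".", 2)
        op, feature = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else None
        current = values.get(feature)

        if op == "P":  # positive set: always succeeds
            values[feature] = value
        elif op == "U":  # unify: set if unset, otherwise values must match
            if current is None:
                values[feature] = value
            elif current != value:
                return False
        elif op == "R":  # require: feature must be set (to value, if given)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":  # disallow: fail if set (to value, if given)
            if current is not None and (value is None or current == value):
                return False
        elif op == "C":  # clear: forget the feature
            values.pop(feature, None)
        else:
            raise ValueError(f"unsupported flag operator in {flag!r}")
    return True


# NeedsVowRed flags in the order they occur on the two traversed paths.
allas_flags = ["@U.NeedsVowRed.OFF@", "@U.NeedsVowRed.ON@", "@C.NeedsVowRed@"]
alas_flags = ["@U.NeedsVowRed.ON@", "@U.NeedsVowRed.ON@", "@C.NeedsVowRed@"]
# Flag sequence produced by the build steps for the nominative compound form.
cmpn_flags = ["@P.CmpN.SgN@", "@R.CmpN.SgN@", "@C.CmpN@"]

print(path_is_valid(allas_flags))  # False: OFF cannot unify with ON
print(path_is_valid(alas_flags))   # True
print(path_is_valid(cmpn_flags))   # True
```

Under this model, a correct elimination of NeedsVowRed would have to remove the allas- path from the network altogether, since its flags can never be satisfied.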

Then I started to eliminate flags. First I eliminated CmpN, which worked fine. Then I eliminated NeedsVowRed, the flag targeted in the network traversal above. That's when things started to misbehave, as in the test case above.

To reproduce, the easiest way is to use the Giella infrastructure for sme (North Sami):
1) svn co https://gtsvn.uit.no/langtech/trunk/gtcore
2) ./autogen.sh && ./configure && make && sudo make install
3) svn -r123471 co https://gtsvn.uit.no/langtech/trunk/langs/sme
4) ./autogen.sh && ./configure --with-hfst --without-xfst --enable-spellers
5) make

After make is done, you should get the following analyses:

$ echo allas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
allas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250

$ echo alas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250

Now edit the file tools/spellcheckers/fstbased/Makefile.am, and remove the line:

     eliminate flag NeedsVowRed \n\

Touch e.g. tools/spellcheckers/fstbased/generator-fstspeller-gt-norm-comp_restricted.tmp.hfst and make again. Redo the analyses above. The result should now be:

$ echo allas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
allas- allas-+? inf

$ echo alas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250

This is the intended behavior, and the only difference is the (non-)elimination of the flag NeedsVowRed.
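When repeating the comparison, it may help to check the lookup results mechanically. The following sketch assumes the three-column output shown above (input, analysis, weight, separated by whitespace) and treats an analysis ending in +? as "not recognised". The helper name `is_recognised` is hypothetical and not part of HFST.

```python
def is_recognised(lookup_line: str) -> bool:
    """Return True if a hfst-lookup result line carries a real analysis.

    Assumes three whitespace-separated columns, as in the output quoted
    in this ticket; unknown input is reported with an analysis ending in '+?'.
    """
    fields = lookup_line.split()
    if len(fields) != 3:
        raise ValueError(f"unexpected lookup line: {lookup_line!r}")
    _surface, analysis, _weight = fields
    return not analysis.endswith("+?")


# Result lines copied from the two runs described above.
with_elimination = [
    "allas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
    "alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
]
without_elimination = [
    "allas- allas-+? inf",
    "alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
]

for label, lines in [("with elimination", with_elimination),
                     ("without elimination", without_elimination)]:
    for line in lines:
        print(label, line.split()[0], is_recognised(line))
```

With the flag eliminated, allas- is wrongly recognised. Without the elimination, only alas- is recognised, which is the intended result.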

Discussion