The following looks like a bug in hfst-xfst's eliminate flag FLAG command. It was observed when building and testing spellers for North Sami in the Giella infrastructure. Instructions for reproduction are further down. Several steps may contribute to this bug, so please bear with the rather long explanation.
Consider the following network traversals:
traverse> @D.NeedNoun.ON@
On path `@P.Px.add@:@P.Px.add@ a:a l:@_EPSILON_SYMBOL_@ a:l s:a +CmpNP/First:s +N:@_EPSILON_SYMBOL_@ @_EPSILON_SYMBOL_@:> +Cmp/SgNom:@_EPSILON_SYMBOL_@ @U.NeedsVowRed.OFF@:@U.NeedsVowRed.OFF@ @P.CmpFrst.FALSE@:@P.CmpFrst.FALSE@ @P.CmpPref.FALSE@:@P.CmpPref.FALSE@ @D.CmpLast.TRUE@:@D.CmpLast.TRUE@ @D.CmpNone.TRUE@:@D.CmpNone.TRUE@ @U.CmpNone.FALSE@:@U.CmpNone.FALSE@ @P.CmpOnly.TRUE@:@P.CmpOnly.TRUE@ +Cmp/SplitR:- @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @C.NeedsVowRed@:@C.NeedsVowRed@ +Use/SpellNoSugg:@_EPSILON_SYMBOL_@ @D.CmpOnly.FALSE@:@D.CmpOnly.FALSE@ @D.CmpPref.TRUE@:@D.CmpPref.TRUE@ @D.NeedNoun.ON@:@D.NeedNoun.ON@ ' are continuations:
<Nothing, you've hit a dead end here>
On path `@P.Px.add@:@P.Px.add@ a:a l:l a:l s:a +CmpNP/First:s +N:@_EPSILON_SYMBOL_@ @_EPSILON_SYMBOL_@:> +Cmp/SgNom:@_EPSILON_SYMBOL_@ @U.NeedsVowRed.OFF@:@U.NeedsVowRed.OFF@ @P.CmpFrst.FALSE@:@P.CmpFrst.FALSE@ @P.CmpPref.FALSE@:@P.CmpPref.FALSE@ @D.CmpLast.TRUE@:@D.CmpLast.TRUE@ @D.CmpNone.TRUE@:@D.CmpNone.TRUE@ @U.CmpNone.FALSE@:@U.CmpNone.FALSE@ @P.CmpOnly.TRUE@:@P.CmpOnly.TRUE@ +Cmp/SplitR:- @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @C.NeedsVowRed@:@C.NeedsVowRed@ +Use/SpellNoSugg:@_EPSILON_SYMBOL_@ @D.CmpOnly.FALSE@:@D.CmpOnly.FALSE@ @D.CmpPref.TRUE@:@D.CmpPref.TRUE@ @D.NeedNoun.ON@:@D.NeedNoun.ON@ ' are continuations:
<Nothing, you've hit a dead end here>
Compare with the following, which is the only possible path:
traverse> @D.NeedNoun.ON@
On path `@P.Px.add@:@P.Px.add@ a:a l:@_EPSILON_SYMBOL_@ a:l s:a +CmpNP/First:s +N:@_EPSILON_SYMBOL_@ @_EPSILON_SYMBOL_@:> +Cmp/SgNom:@_EPSILON_SYMBOL_@ @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @P.CmpFrst.FALSE@:@P.CmpFrst.FALSE@ @P.CmpPref.FALSE@:@P.CmpPref.FALSE@ @D.CmpLast.TRUE@:@D.CmpLast.TRUE@ @D.CmpNone.TRUE@:@D.CmpNone.TRUE@ @U.CmpNone.FALSE@:@U.CmpNone.FALSE@ @P.CmpOnly.TRUE@:@P.CmpOnly.TRUE@ +Cmp/SplitR:- @U.NeedsVowRed.ON@:@U.NeedsVowRed.ON@ @C.NeedsVowRed@:@C.NeedsVowRed@ +Use/SpellNoSugg:@_EPSILON_SYMBOL_@ @D.CmpOnly.FALSE@:@D.CmpOnly.FALSE@ @D.CmpPref.TRUE@:@D.CmpPref.TRUE@ @D.NeedNoun.ON@:@D.NeedNoun.ON@ ' are continuations:
<Nothing, you've hit a dead end here>
The crucial differences here are:
* surface ll vs l, that is: allas- (genitive compound form) vs alas- (nominative compound form)
* the value of the flag NeedsVowRed
* the tag +Cmp/SgNom
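To make the role of the NeedsVowRed value concrete, here is a minimal Python sketch of the usual flag diacritic semantics (P = set, U = unify, R = require, D = disallow, C = clear), applied to the flag sequences of the path that starts with @U.NeedsVowRed.OFF@ and of the path that starts with @U.NeedsVowRed.ON@. This is not HFST code: the function name path_is_allowed is hypothetical, the N operator is left out for brevity, and the sketch assumes the flags are evaluated in the order shown in the traversals.

```python
import re

# Matches flag diacritics such as @U.NeedsVowRed.ON@ or @C.NeedsVowRed@.
FLAG_RE = re.compile(r"^@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@$")


def path_is_allowed(flags):
    """Return (True, None) if all flags succeed, else (False, failing_flag)."""
    state = {}  # feature name -> current value
    for flag in flags:
        match = FLAG_RE.match(flag)
        if match is None:
            raise ValueError(f"not a supported flag diacritic: {flag}")
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":    # positive set: always succeeds
            state[feature] = value
        elif op == "C":  # clear: always succeeds
            state.pop(feature, None)
        elif op == "U":  # unify: succeeds if the feature is unset or equal
            if current is None:
                state[feature] = value
            elif current != value:
                return False, flag
        elif op == "R":  # require: the feature must be set (to the value)
            if current is None or (value is not None and current != value):
                return False, flag
        elif op == "D":  # disallow: the feature must not be set (to the value)
            if current is not None and (value is None or current == value):
                return False, flag
    return True, None


# Flags shared by both paths after the first NeedsVowRed flag.
COMMON_TAIL = [
    "@P.CmpFrst.FALSE@", "@P.CmpPref.FALSE@", "@D.CmpLast.TRUE@",
    "@D.CmpNone.TRUE@", "@U.CmpNone.FALSE@", "@P.CmpOnly.TRUE@",
    "@U.NeedsVowRed.ON@", "@C.NeedsVowRed@", "@D.CmpOnly.FALSE@",
    "@D.CmpPref.TRUE@", "@D.NeedNoun.ON@",
]
path_with_off = ["@P.Px.add@", "@U.NeedsVowRed.OFF@"] + COMMON_TAIL
path_with_on = ["@P.Px.add@", "@U.NeedsVowRed.ON@"] + COMMON_TAIL

print(path_is_allowed(path_with_off))
print(path_is_allowed(path_with_on))
```

Under these assumptions, the OFF path fails at the later @U.NeedsVowRed.ON@ because unification meets a conflicting value, while the ON path passes every flag. A network in which NeedsVowRed has been eliminated should preserve exactly this distinction.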
What happens during the build is the following:
The effect of this is to block compounds in the genitive case (this is the default, but can be overridden). For the example above, we want to allow "alas-", but block "allas-". And this works as intended, as long as we don't eliminate any flags.
Then I started to eliminate flags. First I eliminated CmpN, which worked fine. Then I eliminated NeedsVowRed, the flag targeted in the network traversals above. That is when things started to misbehave, as in the test case above.
To reproduce, the easiest way is to use the Giella infrastructure for sme (North Sami):
1) svn co https://gtsvn.uit.no/langtech/trunk/gtcore
2) ./autogen.sh && ./configure && make && sudo make install
3) svn -r123471 co https://gtsvn.uit.no/langtech/trunk/langs/sme
4) ./autogen.sh && ./configure --with-hfst --without-xfst --enable-spellers
5) make
After make is done, you should get the following analyses, where allas- is wrongly accepted:
$ echo allas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
allas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250
$ echo alas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250
Now edit the file tools/spellcheckers/fstbased/Makefile.am, and remove the line:
eliminate flag NeedsVowRed \n\
Touch e.g. tools/spellcheckers/fstbased/generator-fstspeller-gt-norm-comp_restricted.tmp.hfst and make again. Redo the analyses above. The result should now be:
$ echo allas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
allas- allas-+? inf
$ echo alas- | hfst-lookup -q tools/spellcheckers/fstbased/analyser-fstspeller-gt-norm.hfst
alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250
This is the intended behavior: allas- is rejected and alas- is accepted. The only difference between the two builds is whether the flag NeedsVowRed is eliminated.
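The comparison of the two builds can be turned into a small check. The following Python sketch parses hfst-lookup output lines and decides whether a word was accepted. It assumes three whitespace-separated columns (input, analysis, weight), that unknown inputs are reported with an analysis ending in +? and weight inf as in the output above, and that weights may use a decimal comma. The function names are hypothetical, and the sample lines are copied from this report rather than produced by running HFST.

```python
import math


def parse_lookup_line(line):
    """Split one hfst-lookup output line into (input, analysis, weight)."""
    word, analysis, weight = line.split()
    # The weight may be printed with a decimal comma, depending on locale.
    return word, analysis, float(weight.replace(",", "."))


def is_accepted(line):
    """A word counts as accepted if it has a known analysis with finite weight."""
    _, analysis, weight = parse_lookup_line(line)
    return math.isfinite(weight) and not analysis.endswith("+?")


# Output with "eliminate flag NeedsVowRed" in the build (the buggy case).
with_elimination = [
    "allas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
    "alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
]
# Output after removing that line from Makefile.am (the intended behavior).
without_elimination = [
    "allas- allas-+? inf",
    "alas- alas+N+Cmp/SplitR+Use/SpellNoSugg 20013,031250",
]

for label, lines in [("with", with_elimination), ("without", without_elimination)]:
    print(label, [(parse_lookup_line(l)[0], is_accepted(l)) for l in lines])
```

A correct eliminate flag implementation should give the same acceptance pattern in both builds, namely allas- rejected and alas- accepted.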