
#198 hfst-twolc gets confused about pairs with spaces in characters?

Milestone: future
Status: open
Owner: nobody
Labels: hfst-twolc (2)
Priority: 1
Updated: 2013-08-21
Created: 2013-08-21
Private: No

When trying to mangle ftb3 tags, hfst-twolc apparently fails to recognise pairs or sets whose symbols contain spaces:

Alphabet

% N % N:0 % A % A:0 % V % V:0
% Pron % Pron:0 % Num % Num:0
% Abbr % Abbr:0 % Prop % Prop:0
% Interj % Interj:0 % Dem % Dem:0
% Interr % Interr:0
% Rel % Rel:0 % Qnt % Qnt:0 % Refl % Refl:0
#

;

Sets
KillBeforeBoundaries = % N % A % V % Pron % Num % Abbr % Prop %Interj %Interr % Dem % Rel % Qnt % Refl ;

Rules
"Kill analyses before boundaries"
KBB:0 <=> _ ?* #: ; where KBB in KillBeforeBoundaries ;

Compiling with:

$ hfst-twolc rewrite-ftb3.twolc 
Reading input from rewrite-ftb3.twolc.
Writing output to STDOUT.
Reading alphabet.
Reading sets.
Reading rules and compiling their contexts and centers.
Unknown pair: __HFST_TWOLC_SPACEN __HFST_TWOLC_0
terminate called after throwing an instance of 'UndefinedSymbolPairsFound'
Aborted

Using software and hardware on hfst.ling.helsinki.fi.

The error message could also be made more readable: e.g. __HFST_TWOLC_SPACEN and __HFST_TWOLC_0 could be printed as " N" and "0".
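
As a rough illustration of that request, here is a minimal Python sketch that turns the internal names back into readable symbols. It is not HFST code. The two constants are assumptions inferred only from the single error line quoted above (that `__HFST_TWOLC_SPACE` stands for a space and `__HFST_TWOLC_` is a prefix on other symbols), and the helper names `demangle` and `beautify` are hypothetical.

```python
# Assumed encodings, guessed from the one error line in this report;
# the real hfst-twolc internals may differ.
SPACE_TOKEN = "__HFST_TWOLC_SPACE"
PREFIX = "__HFST_TWOLC_"


def demangle(symbol):
    """Turn an internal symbol name back into the form the user wrote."""
    # Replace the space token first, since it also starts with PREFIX.
    symbol = symbol.replace(SPACE_TOKEN, " ")
    if symbol.startswith(PREFIX):
        symbol = symbol[len(PREFIX):]
    return symbol


def beautify(line):
    """Rewrite an 'Unknown pair: X Y' line with quoted, demangled symbols."""
    head, sep, rest = line.partition(": ")
    if head != "Unknown pair" or not sep:
        return line  # leave any other message untouched
    return head + sep + " ".join('"%s"' % demangle(s) for s in rest.split())


print(beautify("Unknown pair: __HFST_TWOLC_SPACEN __HFST_TWOLC_0"))
# prints: Unknown pair: " N" "0"
```

Quoting each symbol keeps the leading space visible, which is the detail that matters for this report.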

