(1) can kill giorr stuff completely: giorr-xx.txt moves to the lexicon with \.'s; token-xx.txt rules where a "." => <d>
all giorr-xx.pre stuff and code in giorr function proper
get encoded as token-xx.txt rules that "enclose" non-terminal punctuation in longer tokens. What's left will be <X>.</X> or whatever.
(2) localizes all uses of BDCHARS in tokenization, search
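
A rough sketch of the idea in (1), under loud assumptions: the patterns below are hypothetical stand-ins for token-xx.txt rules and lexicon entries, not the real rule format, and `tokenize` / `terminal_periods` are made-up names. It only tries to show that once rules "enclose" non-terminal periods in longer tokens, any "." left standing alone can be treated as terminal without a separate giorr pass.

```python
import re

# Hypothetical stand-ins for token-xx.txt rules (NOT the real rule syntax):
# each pattern "encloses" a non-terminal period inside a longer token.
ENCLOSING_RULES = [
    r"\d+\.\d+",             # decimals such as 3.50
    r"(?:[A-Za-z]\.){2,}",   # initialisms such as U.S.
    r"(?:Dr|Mr|Mrs|etc)\.",  # abbreviations, as lexicon entries with "\." might cover
]

# Enclosing rules are tried first; plain words and single punctuation
# characters are the fallback.
TOKEN_RE = re.compile(
    "|".join(f"(?:{rule})" for rule in ENCLOSING_RULES) + r"|\w+|[^\w\s]"
)

def tokenize(text):
    """Split text into tokens, keeping rule-matched periods inside their token."""
    return TOKEN_RE.findall(text)

def terminal_periods(tokens):
    """Return indices of periods that no enclosing rule absorbed."""
    return [i for i, tok in enumerate(tokens) if tok == "."]

tokens = tokenize("Dr. Smith paid 3.50 in the U.S. today.")
print(tokens)
print(terminal_periods(tokens))
```

In this example only the final "." survives as its own token; the periods in "Dr.", "3.50" and "U.S." are absorbed by the enclosing patterns. How the leftover period gets marked (the <X>.</X> above) is left open here.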