hi --
Looking at re2c for SpamAssassin -- it's improved a lot
since the last time I checked ;) nice work!
one thing, though. it would be really great if re2c
could track subsumed tokens. For example:
/*!re2c
"foo" {return "FOO";}
"food" {return "FOOD";}
[\000-\377] { return NULL; }
*/
Assume the input string is "food", and an
appropriately-smart caller who knows to track the
YYCURSOR state and call multiple times until it
receives NULL is being used. This should return "FOO"
on first call, then "FOOD" on the second call, then
NULL on the third call.
Instead, the longest matching token is used: it returns
"FOOD" on the first call, then NULL on the second call.
most re2c users could write their token tables to
automatically return *both* "FOO" and "FOOD" on the
first call -- and initially I was doing this. however,
in my usage, the tokens are derived from spamassassin
rules, so I can't always know if one is subsumed by
another... and determining this programmatically in
advance would require rewriting most of re2c ;)
Instead, I've been changing my calling code to not
support full regexp semantics in the input to re2c.
This is obviously defeating much of the point, so I'd
love to fix that...
Are there any plans to implement this?
cheers,
--j.
Logged In: YES
user_id=271023
re2c is designed in a way that requires the most complex
rule first. In this case it means the "FOOD" rule needs to
be in front of the "FOO" rule. Then when re2c reads "FOO"
you get the token and the story ends. However, you can
write some code around the code generated by re2c to do
what you want. That is why re2c was built with a focus on
extreme flexibility.
additional comment
Logged In: YES
user_id=271023
As said already, you have to provide the code. What you
want will increase complexity from O(n) to O(n^2) and is
nothing a code generator is supposed to deal with. And
indeed the "FOOD" rule has to be on top of the "FOO" rule.
That's just the way re2c or any other tool of the same
kind is designed. Sorry if that is not exactly what you
want.
p.s.: You cannot add a comment as I closed the RFE.
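One possible reading of the O(n) to O(n^2) remark above (an assumption; the comment does not spell it out) is that reporting every subsumed match, rather than only the longest one, can produce quadratically many results. The sketch below counts them for a hypothetical rule `"a"+`; the function names are invented for this illustration and are not part of re2c.

```c
#include <assert.h>
#include <stddef.h>

/* Count the matches of the hypothetical rule "a"+ that begin at
 * input[start], reporting every matching prefix instead of only the
 * longest one. start must not exceed the length of input. */
static size_t count_all_prefix_matches(const char *input, size_t start)
{
    size_t count = 0;
    assert(input != NULL);
    for (size_t i = start; input[i] == 'a'; i++)
        count++;    /* input[start..i] is one more match */
    return count;
}

/* Total number of matches over every start position in input. */
static size_t count_all_matches(const char *input)
{
    size_t total = 0;
    assert(input != NULL);
    for (size_t start = 0; input[start] != '\0'; start++)
        total += count_all_prefix_matches(input, start);
    return total;
}
```

For an input of n consecutive 'a' characters this counts n(n+1)/2 matches, for example 10 for "aaaa", while a longest-match scanner reports a single token for the same input.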
This bug entry is being heavily spammed. Could anyone do anything about this?