
re2c scanner generator

Tracker: Feature Requests

5 RFE: way for scanner to report subsumed tokens - ID: 1540845
Last Update: Comment added ( nobody )

hi --

Looking at re2c for SpamAssassin -- it's improved a lot
since the last time I checked ;) nice work!

one thing, though. it would be really great if re2c
could track subsumed tokens. For example:

/*!re2c
"foo" {return "FOO";}
"food" {return "FOOD";}
[\000-\377] { return NULL; }
*/


Assume the input string is "food" and the caller is
smart enough to track the YYCURSOR state and call the
scanner repeatedly until it receives NULL. This should
return "FOO" on the first call, then "FOOD" on the
second call, then NULL on the third call.

Instead, the longest matching token is used: it returns
"FOOD" on the first call, then NULL on the second call.

Most re2c users could write their token tables to
automatically return *both* "FOO" and "FOOD" on the
first call, and initially I was doing this. However,
in my usage the tokens are derived from SpamAssassin
rules, so I can't always know whether one is subsumed by
another... and determining this programmatically in
advance would require rewriting most of re2c ;)

Instead, I've been changing my calling code to not
support full regexp semantics in the input to re2c.
This is obviously defeating much of the point, so I'd
love to fix that...

Are there any plans to implement this?

cheers,

--j.


Justin Mason (jmason) - 2006-08-15 18:47

Priority: 5
Status: Closed
Assigned to: Marcus Börger
Visibility: Public


Comments ( 3 )






Date: 2006-08-16 19:21
Sender: helly (Project Admin)

Logged In: YES
user_id=271023

As I said already, you have to provide that code yourself. What
you want would increase the complexity from O(n) to O(n^2), and
that is nothing a code generator is supposed to deal with. And
indeed, the "FOOD" rule has to come before the "FOO" rule.
That's just the way re2c, or any other tool of the same
kind, is designed. Sorry if that is not exactly what you
want.

P.S.: You cannot add a comment, as I closed the RFE.


Date: 2006-08-15 20:53
Sender: helly (Project Admin)

Logged In: YES
user_id=271023

re2c is designed in a way that requires the most complex rule
to come first. In this case it means the "FOOD" rule needs to
be in front of the "FOO" rule. Then, when re2c reads "FOO", you
get the token and the story ends. However, you can write
some handling around the code generated by re2c to do what you
want. That is why re2c was built with a focus on extreme
flexibility.



Attached File (1)

comment2.txt: additional comment

Changes (8)

Field        Old Value             Date              By
close_date   -                     2006-08-16 19:21  helly
status_id    Open                  2006-08-16 19:21  helly
File Added   189310: comment2.txt  2006-08-16 11:05  jmason
close_date   2006-08-15 20:53      2006-08-16 10:56  jmason
status_id    Closed                2006-08-16 10:56  jmason
status_id    Open                  2006-08-15 20:53  helly
close_date   -                     2006-08-15 20:53  helly
assigned_to  nobody                2006-08-15 20:53  helly