
#9 RFE: way for scanner to report subsumed tokens

closed
None
5
2006-08-16
2006-08-15
No

hi --

Looking at re2c for SpamAssassin -- it's improved a lot
since the last time I checked ;) nice work!

one thing, though. it would be really great if re2c
could track subsumed tokens. For example:

/*!re2c
"foo" {return "FOO";}
"food" {return "FOOD";}
[\000-\377] { return NULL; }
*/

Assume the input string is "food", and an
appropriately-smart caller who knows to track the
YYCURSOR state and call multiple times until it
receives NULL is being used. This should return "FOO"
on first call, then "FOOD" on the second call, then
NULL on the third call.

Instead, the longest matching token is used: return
"FOOD" on first call, then NULL on the second call.

most re2c users could write their token tables to
automatically return *both* "FOO" and "FOOD" on the
first call -- and initially I was doing this. however,
in my usage, the tokens are derived from spamassassin
rules, so I can't always know if one is subsumed by
another... and determining this programmatically in
advance would require rewriting most of re2c ;)

Instead, I've been changing my calling code to not
support full regexp semantics in the input to re2c.
This is obviously defeating much of the point, so I'd
love to fix that...

Are there any plans to implement this?

cheers,

--j.

Discussion

  • Marcus Börger

    Marcus Börger - 2006-08-15
    • assigned_to: nobody --> helly
    • status: open --> closed
     
  • Marcus Börger

    Marcus Börger - 2006-08-15

    re2c is designed in a way that requires the most complex
    rule first. In this case that means the "FOOD" rule needs
    to come before the "FOO" rule. Then when re2c reads "FOO"
    you get the token and the story ends. However, you can
    write some code around the code generated by re2c to do
    what you want. That is why re2c was built with a focus on
    extreme flexibility.
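
    As a rough illustration of the kind of caller-side handling
    suggested here, the sketch below reports every token that
    matches at a given cursor position, shortest first. It is
    not re2c output and does not use any re2c API: the `token`
    struct, the `TOKENS` table and `match_all_at` are
    hypothetical names, and it only handles literal strings,
    not full regexps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a literal pattern and the name to report. */
struct token {
    const char *pattern;
    const char *name;
};

/* Hypothetical table mirroring the example rules, shortest pattern first. */
static const struct token TOKENS[] = {
    { "foo",  "FOO"  },
    { "food", "FOOD" },
};
#define NTOKENS (sizeof TOKENS / sizeof TOKENS[0])

/*
 * Collect the names of all tokens whose pattern matches at `cursor`,
 * in table order. Writes at most `max` names into `out` and returns
 * the number written.
 */
static size_t match_all_at(const char *cursor, const char **out, size_t max)
{
    size_t n = 0;

    assert(cursor != NULL && out != NULL);
    for (size_t i = 0; i < NTOKENS && n < max; i++) {
        size_t len = strlen(TOKENS[i].pattern);

        /* strncmp stops at the input's terminating NUL, so a short
         * input simply fails to match instead of being over-read. */
        if (strncmp(cursor, TOKENS[i].pattern, len) == 0)
            out[n++] = TOKENS[i].name;
    }
    return n;
}
```

    Called with the input "food", this fills `out` with "FOO"
    and then "FOOD". Every table entry is tested at each
    position, so the cost grows with the number of tokens, and
    the approach gives up the regexp support a generated
    scanner provides.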

     
  • Justin Mason

    Justin Mason - 2006-08-16
    • status: closed --> open
     
  • Justin Mason

    Justin Mason - 2006-08-16

    additional comment

     
  • Marcus Börger

    Marcus Börger - 2006-08-16
    • status: open --> closed
     
  • Marcus Börger

    Marcus Börger - 2006-08-16

    As said already, you have to provide the code. What you
    want will increase complexity from O(n) to O(n^2) and is
    nothing a code generator is supposed to deal with. And
    indeed the "FOOD" rule has to be on top of the "FOO" rule.
    That's just the way re2c or any other tool of the same
    kind is designed. Sorry if that is not exactly what you
    want.

    p.s.: You cannot add a comment as I closed the RFE.

     
  • Anonymous

    Anonymous - 2010-03-27

    This bug entry is being heavily spammed. Could anyone do anything about this?

     
