Bit of a flex newbie.
Trying to match/tokenise input as shown below. The tokenised values I
want are on the RHS of the arrows (there are many more values; I've
extracted just the essence here):
true false
--> TRUEVAL FALSEVAL
true,false
--> TRUEVAL COMMA FALSEVAL
false:true
--> FALSEVAL COLON TRUEVAL
<whitespace> true <whitespace> false
--> TRUEVAL FALSEVAL
i.e. I'd like true and false to behave like keywords, separated by
whitespace, colons, commas, newlines or tabs. It's not line-based -
there could be leading separator chars and/or whitespace, and some of
those separators are themselves tokens. I want to prevent matching (or
match with a different pattern) cases like this
truefalse
I want this to match neither true nor false. I have the following
(section markers removed for brevity, printf for now - I'll deal with
integrating it with bison later when I get this doing what I want):
--- start of code
DEADSEP [ \t\n]
LIVESEP [:,]
":" { printf ("COLON\n"); }
"," { printf ("COMMA\n"); }
true{DEADSEP} { printf ("TRUEVAL\n"); }
true/{LIVESEP} { printf ("TRUEVAL\n"); }
false{DEADSEP} { printf ("FALSEVAL\n"); }
false/{LIVESEP} { printf ("FALSEVAL\n"); }
[0-9]+ { printf("INTVAL%s\n", yytext); }
{DEADSEP} { /* ignore white space */ }
. { printf("Mystery characters %c\n", *yytext); }
--- end of code
This works except for
truefalse
which allows 'false' to match even though there is no separation from
the leading 'true' (similarly for 'falsetrue').
The problem I have is wanting a leading context as well as a trailing
one - I want, for example, DEADSEP chars preceding one of the words
(including start-of-line) to be ignored but LIVESEP ones to be matched
but not consumed, so they can separately generate their own tokens.
Something like the \w of perl could also do it. The only thing I can
think of is a re-entrant call, because at the point where ",true" is
matched I want the action to be "go tokenise the comma, then come back
here and output a token for 'true'". Is this what is needed to solve
this? Is there something about the matching that means leading and
trailing context can't work together, or is there some way to do it?
There *is* another way I can think of: a switch statement in the
action. Consume the leading and trailing chars, then test which of the
LIVESEP chars it is and do the action that would go with matching that
char. But that's such an ugly way (and there may be more than two
LIVESEP chars). This is a general problem - how do I solve it (with a
spanner, not a sledgehammer)?
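One direction that might avoid needing any leading or trailing context, written as a rough sketch and not a tested flex solution: take the whole run of letters as one word first, then decide in the action whether that word is exactly a keyword. In flex terms this would be a catch-all rule such as [a-z]+ listed after plain true and false rules, since flex prefers the longest match and, on a tie, the earlier rule. The plain C below only simulates that idea by hand. The names classify_word, tokenise, WORD and MYSTERY are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: classify one complete word, the way the action of a
   catch-all [a-z]+ rule could. Only an exact match counts as a keyword. */
const char *classify_word(const char *word, size_t len)
{
    if (len == 4 && memcmp(word, "true", 4) == 0)  return "TRUEVAL";
    if (len == 5 && memcmp(word, "false", 5) == 0) return "FALSEVAL";
    return "WORD";                 /* e.g. "truefalse" is neither keyword */
}

/* Hypothetical driver: writes space-separated token names for `input`
   into `out` (capacity `outsz`) and returns `out`. */
char *tokenise(const char *input, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = input;

    assert(outsz > 0);
    out[0] = '\0';
    while (*p) {
        const char *tok;
        if (*p == ' ' || *p == '\t' || *p == '\n') {   /* DEADSEP: skip */
            p++;
            continue;
        }
        if (*p == ':') {                               /* LIVESEP tokens */
            tok = "COLON";
            p++;
        } else if (*p == ',') {
            tok = "COMMA";
            p++;
        } else if (islower((unsigned char)*p)) {
            /* Consume the whole run of letters before classifying it. */
            const char *start = p;
            while (islower((unsigned char)*p)) p++;
            tok = classify_word(start, (size_t)(p - start));
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            tok = "INTVAL";
        } else {
            tok = "MYSTERY";
            p++;
        }
        int n = snprintf(out + used, outsz - used, "%s%s",
                         used ? " " : "", tok);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';      /* buffer full: drop the partial token */
            break;
        }
        used += (size_t)n;
    }
    return out;
}
```

With this shape, "true,false" comes out as TRUEVAL COMMA FALSEVAL and "truefalse" as a single WORD, and the separators never need to be looked at from inside the keyword rules.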
andy