On 6/15/10 10:32 AM, "Steve Rowe" <sa...@od...> wrote:
> '$' does not match end-of-file. It is an end-of-line ('\n') lookahead.
> [^a-z] includes '\n', so that's why you're getting this warning.
Makes sense. The strange thing is that I was already using this construct
for another similar rule and it didn't complain about that one:
// Catch U.S. States that are also valid roman numerals
STATEFRAG="mi"|"dc"|"md"
...
{STATEFRAG} / [^a-z] { return newTok("WORD"); }
{STATEFRAG} $ { return newTok("WORD"); }
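The overlap Steve describes can be reproduced outside JFlex. The following is a rough sketch that uses java.util.regex lookaheads as stand-ins for the two trailing contexts. It is only an analogy, not how JFlex evaluates rules, and it assumes '$' behaves purely as a '\n' lookahead, as stated above. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class LookaheadOverlap {
    // Stand-in for: {STATEFRAG} / [^a-z]
    public static final Pattern NOT_LETTER =
        Pattern.compile("(?:mi|dc|md)(?=[^a-z])");

    // Stand-in for: {STATEFRAG} $   (assumption: '$' == lookahead for '\n')
    public static final Pattern END_OF_LINE =
        Pattern.compile("(?:mi|dc|md)(?=\\n)");

    // True if the pattern matches a prefix of the input.
    public static boolean matchesAtStart(Pattern p, String input) {
        return p.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String[] inputs = { "mi\n", "mi ", "mix", "mi" };
        for (String in : inputs) {
            String shown = in.replace("\n", "\\n");
            System.out.println("\"" + shown + "\""
                + "  [^a-z]: " + matchesAtStart(NOT_LETTER, in)
                + "  '\\n': " + matchesAtStart(END_OF_LINE, in));
        }
    }
}
```

On input ending in '\n' both stand-ins match the same text, which is the ambiguity behind the warning. On input that ends right after the fragment, neither matches, which is the end-of-file case that '$' does not cover.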
> I *think* you can handle this situation by using three rules and a
> non-default lexical state:
>
> %state NONROMAN
> ...
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
> {ROMANPAT} { return newTok("ROMAN"); }
>
> <NONROMAN,YYINITIAL> {
> ... // non-roman matching rules go here.
> }
That would get pretty messy if I need to use the same technique for more
than one rule in the same grammar though.
Speaking of messy, as a stopgap measure I ended up solving this by peeking
ahead in the stream (using some of the same techniques as in
JFlex.Emitter.emitLexFunctHeader()) to see if the current match is followed
by a letter. Totally illegal & unmaintainable, but it does seem to work.
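A minimal sketch of what such a peek-ahead check might look like, not the actual code. The parameters buffer, markedPos and endRead are hypothetical stand-ins, assumed to correspond to the generated scanner's internal buffer, the position just past the current match, and the end of the buffered input. The sketch is written as a standalone class so it can run without a generated scanner.

```java
public class PeekAhead {
    /**
     * Returns true if the character immediately after the current match
     * is a lowercase ASCII letter.
     *
     * buffer    - characters read so far (hypothetical stand-in for the
     *             scanner's internal buffer)
     * markedPos - index just past the end of the current match
     * endRead   - index just past the last valid character in buffer
     */
    public static boolean followedByLetter(char[] buffer, int markedPos, int endRead) {
        if (markedPos >= endRead) {
            // Nothing buffered after the match. A real scanner might need
            // to refill its buffer here; this sketch treats it as "no letter".
            return false;
        }
        char next = buffer[markedPos];
        return next >= 'a' && next <= 'z';
    }

    public static void main(String[] args) {
        char[] a = "mix".toCharArray();
        char[] b = "mi ".toCharArray();
        char[] c = "mi".toCharArray();
        // In each case the match is the first two characters.
        System.out.println(followedByLetter(a, 2, a.length));
        System.out.println(followedByLetter(b, 2, b.length));
        System.out.println(followedByLetter(c, 2, c.length));
    }
}
```

The fragile part is the empty-buffer case: if the match ends exactly at the end of the buffered data, the check cannot tell end-of-file from "more input not yet read", which is one reason this approach is hard to maintain.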
--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken...@th...