Ken Williams wrote:
> On 6/15/10 10:32 AM, "Steve Rowe" <sa...@od...> wrote:
>> '$' does not match end-of-file. It is an end-of-line ('\n') lookahead.
>> [^a-z] includes '\n', so that's why you're getting this warning.
>
> Makes sense. The strange thing is that I was already using this construct
> for another similar rule and it didn't complain about that one:
>
> // Catch U.S. States that are also valid roman numerals
> STATEFRAG="mi"|"dc"|"md"
> ...
> {STATEFRAG} / [^a-z] { return newTok("WORD"); }
> {STATEFRAG} $ { return newTok("WORD"); }
Hmm, that is strange. Seems like these rules should have the same problem.
>> I *think* you can handle this situation by using three rules and a
>> non-default lexical state:
>>
>> %state NONROMAN
>> ...
>> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
>> {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
>> {ROMANPAT} { return newTok("ROMAN"); }
>>
>> <NONROMAN,YYINITIAL> {
>> ... // non-roman matching rules go here.
>> }
>
> That would get pretty messy if I need to use the same technique for more
> than one rule in the same grammar though.
>
> Speaking of messy, as a stopgap measure I ended up solving this by peeking
> ahead in the stream (using some of the same techniques as in
> JFlex.Emitter.emitLexFunctHeader() ) to see if the current match is followed
> by a letter. Totally illegal & unmaintainable, but it does seem to work.
I'm glad you figured it out. I'll look into adding an end-of-file
lookahead assertion, like Perl's /\Z/ and /\z/.
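
For reference, java.util.regex has the same two anchors as Perl, so the
difference between them can be sketched in plain Java. This only illustrates
the Perl-style semantics; it is not JFlex syntax, and the class name
EndAnchors and the helper found() are made up for the example.

```java
import java.util.regex.Pattern;

public class EndAnchors {
    // True if the regex is found anywhere in the input.
    // Default flags only, so MULTILINE is off.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = { "mi", "mi\n", "mi\nx" };
        String[] regexes = { "mi$", "mi\\Z", "mi\\z" };
        for (String in : inputs) {
            for (String re : regexes) {
                // Show the newline as a visible \n in the output.
                String shown = in.replace("\n", "\\n");
                System.out.printf("%-6s on %-6s -> %b%n", re, shown, found(re, in));
            }
        }
    }
}
```

With these inputs, \z is the only anchor that rejects "mi\n": \Z (and $
without MULTILINE) still match just before a final line terminator, while \z
matches only at the very end of the input.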
Steve