Thank you, that solved it! In fact, the JFlex 1.4.* manual states that it doesn't really support characters outside the 16-bit range unless you define them as macros: http://jflex.de/manual.html#SECTION000101000000000000000. I suspect that implies that a character range definition is out of the question, unless I'm missing something.
Either way, I've documented that gotcha in my source code, so other developers at least have a starting point for debugging issues should they occur.
Thanks,
Ruslan
________________________________
From: Martin Walch <wal...@we...>
To: jfl...@li...; Ruslan Dimov <rus...@ya...>
Sent: Friday, August 30, 2013 5:26 AM
Subject: Re: [jflex-users] [Help] Misbehaving JFLex rules - wrong rule matched
Hi,
> If you wouldn't mind, I'd rather point you to the question I posted today on
> StackOverflow:
> http://stackoverflow.com/questions/18520420/misbehaving-jflex-rules-wrong-rule-matched
I am not a JFlex expert, but I'll still give it a shot.
Your code says:
> han = [\u3400-\u9fff\uf900-\ufaff\u2f800-\u2fa1f]
My guess is that handling Unicode characters above code point 65535 (U+FFFF),
such as your range \u2f800-\u2fa1f, is not that easy in JFlex.
The manual states:
> %unicode
> %16bit
> Both options cause the generated scanner to use the full 16 bit Unicode
> input character set that Java supports natively (character code points
> 0-65535).
Maybe you can work around this by splitting those characters. You will
probably need an additional scanner state for this.
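To illustrate what "splitting" could mean here, below is a rough sketch in plain Java (not JFlex) that breaks a supplementary range into its UTF-16 surrogate halves. The class name SurrogateSplit and the helpers escape and surrogateRange are made up for this example. Whether a 16-bit JFlex scanner will actually match the resulting pattern depends on it seeing the input as UTF-16 code units, which is an assumption I have not verified.

```java
public class SurrogateSplit {
    // Format one UTF-16 code unit as a backslash-u escape with four hex digits.
    static String escape(char unit) {
        return String.format("\\u%04x", (int) unit);
    }

    // Hypothetical helper: express a range of supplementary code points as
    // "high surrogate followed by a low-surrogate range". Only handles ranges
    // whose endpoints share the same high surrogate.
    static String surrogateRange(int first, int last) {
        if (!Character.isSupplementaryCodePoint(first)
                || !Character.isSupplementaryCodePoint(last)
                || first > last) {
            throw new IllegalArgumentException("expected an ascending range above U+FFFF");
        }
        char[] lo = Character.toChars(first); // {high surrogate, low surrogate}
        char[] hi = Character.toChars(last);
        if (lo[0] != hi[0]) {
            throw new IllegalArgumentException("range spans more than one high surrogate");
        }
        return escape(lo[0]) + "[" + escape(lo[1]) + "-" + escape(hi[1]) + "]";
    }

    public static void main(String[] args) {
        // The supplementary part of the "han" definition: U+2F800 to U+2FA1F.
        System.out.println("hanExt = " + surrogateRange(0x2F800, 0x2FA1F));
    }
}
```

For the range in question this prints hanExt = \ud87e[\udc00-\ude1f], i.e. one fixed high surrogate followed by a range of low surrogates. The han class itself would then keep only the two 16-bit ranges, with the supplementary part handled as a separate two-character pattern.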
Regards
Martin Walch