Re: [jflex-users] Possible bug?
From: William F. <bil...@gm...> - 2016-05-12 13:03:50
Gerwin,

Could you help me understand the status of this? At the end of April I sent you a small test case (6 files, including grammar, test driver, etc.) which I think demonstrates this problem. Since I haven't heard back, and because I sent it off list, I'm wondering whether you received it or whether it ended up in a spam folder. Or have you simply not been able to devote any time to this yet?

I used a string reader to avoid any encoding issues, and added a test to ensure that the string reader was delivering the control characters as expected. My initial conclusion is that the handling of jletterdigit may have a flaw: it appears to include a subset of the ASCII control characters. I haven't tried to confirm this in the JFlex source yet. No doubt you would be much more efficient than I at figuring this out, but I'll give it a try as time permits.

Best,
Bill Fenlason

On Thu, Apr 28, 2016 at 7:22 AM, Gerwin Klein <Ger...@ni...> wrote:

> Hi William,
>
> this does sound like it could be a bug, yes.
>
> Do you have a small test spec and input with expected output? I’d like to
> try to reproduce across different versions, maybe I can see what is going
> on.
>
> A common pitfall with such characters is the encoding, both of the spec
> file for JFlex and the input file to the compiled scanner. If you’re using
> the unicode escape sequences, the former shouldn’t matter, but the latter
> still might.
>
> Cheers,
> Gerwin
>
> On 26 Apr 2016, at 13:29, William Fenlason <bil...@gm...> wrote:
>
> RL1.1 Hex Notation
>
> *To meet this requirement, an implementation shall supply a mechanism for
> specifying any Unicode code point (from U+0000 to U+10FFFF), using the
> hexadecimal code point representation.*
>
> JFlex conforms. Syntax is provided to express values across the whole
> range, via \uXXXX, where XXXX is a 4-digit hex value; \Uyyyyyy, where
> yyyyyy is a 6-digit hex value; and \u{X+( X+)*}, where X+ is a 1-6 digit
> hex value.
>
>
> -------------------------------------------------------------------------------------------------
>
> If I understand it correctly, the above (taken from the JFlex User Manual)
> implies that any code point from U+0000 through U+10FFFF may be used in
> a lexical specification. I don't think that is the case, and here is why.
>
> As we know, <<EOF>> cannot be used for lookahead processing. It has been
> suggested here that one way to simulate it is to append a unique character
> to the end of the file, use it for lookahead, and then discard it. That
> approach was adopted.
>
> We developed an extension of java.io.Reader which allows any specified
> character to be transparently appended to the end of the file (an Eclipse
> document, actually), and which returns a substitute character in case
> the specified character occurs in the file.
>
> It seemed that a reasonable choice for an EOF character was one of
> the ASCII control characters from \x00 through \x1F, avoiding the commonly
> used ones like \x00 and \x07 through \x0D. Initially, ETX (\x03) and EOT
> (\x04) appeared to be good candidates.
>
> Initial testing did not bear this out: in a test case, two versions of
> JFlex (1.4.3 and 1.6.1) appended these characters to other tokens rather
> than recognizing them as separate tokens. Additional testing convinced us
> that, of the reasonable control character choices, only File Separator (FS,
> \x1C) and Group Separator (GS, \x1D) work as expected.
>
> Why should some control characters work and others not? My
> suspicion is that somewhere in the JFlex code there are specific character
> dependencies in the ASCII control character range.
>
> I believe that this is a bug, either in the code or in the above
> documentation, and that it contradicts the idea that any hex character may
> be used in a specification.
>
> Am I misreading this documentation? Do others agree that this is a bug
> to be fixed?
>
> I've downloaded the JFlex source and am willing to look for the cause, but
> I have no idea where to start exploring. Does anyone have suggestions?
>
> Obviously, \x1C as the EOF character is a pragmatic solution "because it
> works", but that seems a bit of a kludge.
>
> Bill Fenlason
>
> jflex-users mailing list
> https://lists.sourceforge.net/lists/listinfo/jflex-users
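Bill's guess about jletterdigit can be checked without JFlex at all. Assuming, as the JFlex manual describes, that [:jletterdigit:] is the set of characters for which Character.isJavaIdentifierPart returns true, the Java API documentation is relevant: that method also returns true for "identifier-ignorable" characters, which include the ISO control characters U+0000 through U+0008 and U+000E through U+001B. FS, GS, RS and US (U+001C through U+001F) are classified as whitespace instead. If that is the mechanism, it would explain why ETX and EOT were glued onto adjacent tokens while FS and GS were not. This is a possible explanation, not a confirmed diagnosis. The class and method names below are hypothetical; the input goes through a StringReader, as in Bill's test, so no decoding is involved.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ControlCharCheck {

    /**
     * Reads every char from the given Reader and reports how java.lang.Character
     * classifies it. A StringReader involves no byte decoding, so encoding
     * cannot be a factor.
     */
    static String report(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append(String.format("\\x%02X identifierPart=%b ignorable=%b%n",
                    c,
                    Character.isJavaIdentifierPart(c),    // assumed basis of [:jletterdigit:]
                    Character.isIdentifierIgnorable(c))); // ignorable chars also count as identifier parts
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ETX, EOT, FS and GS: the sentinel candidates discussed in this thread.
        try (Reader in = new StringReader("\u0003\u0004\u001C\u001D")) {
            System.out.print(report(in));
        }
    }
}
```

On a standard JDK this reports identifierPart=true (and ignorable=true) for \x03 and \x04, and false for both properties on \x1C and \x1D, which matches the behaviour described in the thread.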
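The quoted 26 April message describes a java.io.Reader extension that appends a chosen sentinel at end of input and substitutes another character wherever the sentinel already occurs. That class was not posted, so the following is only a minimal sketch of the idea, assuming a plain Reader as the source (rather than an Eclipse document). The name SentinelReader and the choice of FS as sentinel with U+FFFD as substitute are hypothetical. Both read methods are overridden because a scanner may pull input in blocks rather than one char at a time, and mark/reset are disabled because they would bypass the sentinel bookkeeping.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical sketch: wraps a Reader, maps every occurrence of the sentinel
 * in the underlying input to a substitute, and emits the sentinel exactly
 * once after the underlying input is exhausted.
 */
public class SentinelReader extends FilterReader {
    private final char sentinel;
    private final char substitute;
    private boolean sentinelSent = false;

    public SentinelReader(Reader in, char sentinel, char substitute) {
        super(in);
        if (sentinel == substitute) {
            throw new IllegalArgumentException("sentinel and substitute must differ");
        }
        this.sentinel = sentinel;
        this.substitute = substitute;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            return c == sentinel ? substitute : c;
        }
        if (!sentinelSent) {          // underlying EOF: deliver the sentinel once
            sentinelSent = true;
            return sentinel;
        }
        return -1;                    // real end of stream
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (sentinelSent) {
            return -1;
        }
        int n = in.read(buf, off, len);
        if (n == -1) {                // underlying EOF: this block is just the sentinel
            sentinelSent = true;
            buf[off] = sentinel;
            return 1;
        }
        for (int i = off; i < off + n; i++) {
            if (buf[i] == sentinel) {
                buf[i] = substitute;
            }
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Route through read() so the appended sentinel is skipped like any other char.
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    // mark/reset would bypass the sentinel bookkeeping, so they are disabled.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark() not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    public static void main(String[] args) throws IOException {
        // Sentinel FS (0x1C) and substitute U+FFFD are illustrative choices.
        try (Reader r = new SentinelReader(new StringReader("ab\u001Ccd"), '\u001C', '\uFFFD')) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append(String.format("%02X ", c));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Running main prints 61 62 FFFD 63 64 1C: the FS embedded in the input comes back as the substitute, and a single FS is appended after the last real character. Whichever sentinel is chosen, it still has to fall outside every character class the grammar uses for neighbouring tokens, or it will be absorbed into them.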