Re: [jflex-users] Possible bug?

Gerwin,

Could you help me understand the status of this?

At the end of April I sent you a small test case (6 files, including
grammar, test driver, etc.) which I think demonstrates this problem.  Since
I haven't heard back and because I sent it off-list, I'm wondering whether
you received it, or whether it somehow ended up in a spam folder.  Or have
you simply not been able to devote any time to this?

I used a string reader to avoid any encoding issues, and added a test to
ensure that the string reader was delivering the control characters as
expected.  My initial conclusion is that the processing of jletterdigit
possibly has a flaw in which a subset of the ASCII control characters is
included.  I haven't tried to confirm the situation in the JFlex source
yet.  No doubt you would be much more efficient than I in figuring this
out, but I'll give it a try as time permits.
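
A quick check of this hypothesis outside JFlex, as a minimal sketch.  It
assumes, following the JFlex manual's description of the predefined
classes, that jletterdigit matches the characters for which
Character.isJavaIdentifierPart returns true; whether the JFlex source
actually does this is not confirmed here.  The class name ControlCharCheck
is made up for illustration.

```java
class ControlCharCheck {
    /** True if the code point is accepted as a Java identifier part. */
    static boolean isIdentifierPart(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // Walk the ASCII control range \x00..\x1F and report, for each
        // character, whether Java treats it as an identifier part and
        // whether it is classed as "identifier ignorable".
        for (int cp = 0x00; cp <= 0x1F; cp++) {
            System.out.printf("\\x%02X identifierPart=%b ignorable=%b%n",
                    cp,
                    isIdentifierPart(cp),
                    Character.isIdentifierIgnorable(cp));
        }
    }
}
```

According to the Java documentation, isJavaIdentifierPart also accepts
"identifier ignorable" characters, which cover \x00 thru \x08 and \x0E thru
\x1B but not \x1C thru \x1F.  If the assumption above holds, that would be
consistent with ETX and EOT being absorbed into identifier-like tokens
while FS and GS are not.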

Best,

Bill Fenlason

On Thu, Apr 28, 2016 at 7:22 AM, Gerwin Klein <Ger...@ni...>
wrote:

> Hi William,
>
> this does sound like it could be a bug, yes.
>
> Do you have a small test spec and input with expected output? I’d like to
> try to reproduce across different versions; maybe I can see what is going
> on.
>
> A common pitfall with such characters is the encoding, both of the spec
> file for JFlex and the input file to the compiled scanner. If you’re using
> the unicode escape sequences, the former shouldn’t matter, but the latter
> still might.
>
> Cheers,
> Gerwin
>
> On 26 Apr 2016, at 13:29, William Fenlason <bil...@gm...> wrote:
>
> RL1.1 Hex Notation
>
> *To meet this requirement, an implementation shall supply a mechanism for
> specifying any Unicode code point (from U+0000 to U+10FFFF), using the
> hexadecimal code point representation.*
>
> JFlex conforms. Syntax is provided to express values across the whole
> range, via \uXXXX, where XXXX is a 4-digit hex value; \Uyyyyyy, where
> yyyyyy is a 6-digit hex value; and \u{X+( X+)*}, where X+ is a 1-6 digit
> hex value.
>
>
> -------------------------------------------------------------------------------------------------
>
> If I understand it correctly, the above (taken from the JFlex User Manual)
> implies that every code point from U+0000 through U+10FFFF may be used in
> a lexical specification.  I don't think that is the case, and this is why.
>
> As we know, <<EOF>> cannot be used for look ahead processing.  It has been
> suggested here that one way to simulate it is to append a unique character
> to the end of the file, use it for look ahead, and then discard it.  That
> approach was adopted.
>
> We developed an extension of java.io.Reader which allows any specified
> character to be transparently appended to the end of the file (Eclipse
> document, actually), and also a substitute character to be returned in case
> the specified character occurs in the file.
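
A minimal sketch of a reader along these lines, not the actual class
described above.  The names SentinelReader and appendSentinel are
hypothetical, and the sketch assumes that a single char is enough for both
the sentinel and the substitute.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: delivers one sentinel char after the wrapped input ends. */
class SentinelReader extends Reader {
    private final Reader in;
    private final char sentinel;
    private final char substitute;
    private boolean sentinelSent = false;

    SentinelReader(Reader in, char sentinel, char substitute) {
        this.in = in;
        this.sentinel = sentinel;
        this.substitute = substitute;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (sentinelSent) return -1;      // real end of stream
        int n = in.read(buf, off, len);
        if (n < 0) {
            // Wrapped input is exhausted: hand out the sentinel exactly once.
            buf[off] = sentinel;
            sentinelSent = true;
            return 1;
        }
        // Replace any sentinel that occurs in the real input.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == sentinel) buf[i] = substitute;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience helper for experiments: runs a string through the reader. */
    static String appendSentinel(String text, char sentinel, char substitute) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16];
        try (Reader r = new SentinelReader(new StringReader(text), sentinel, substitute)) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A scanner would be constructed on a SentinelReader wrapping the real input;
the lexical specification can then use the sentinel for look ahead and
discard it in its own rule.
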
>
> It seemed that a reasonable choice for an EOF character was to use one of
> the ASCII control characters from \x00 thru \x1F, avoiding the commonly
> used ones like \x00 and \x07 thru \x0D.  Initially, ETX (\x03) and EOT
> (\x04) appeared to be good alternatives.
>
> Initial testing did not bear this out - in a test case, two versions of
> JFlex (1.4.3 and 1.6.1) appended these characters to other tokens rather
> than recognizing them as separate tokens.  Additional testing convinced us
> that of the reasonable control character choices, only File Separator (FS -
> \x1C) and Group Separator (GS - \x1D) work as expected.
>
> Why should some control characters work, and others not work?  My
> suspicion is that somewhere in the JFlex code there are specific character
> dependencies in the ASCII control character range.
>
> I believe that this is a bug, either in the code or in the above
> documentation, and is contrary to the idea that any hex character may be
> used in a specification.
>
> Am I misreading this documentation?  Do others agree that this is a bug
> to be fixed?
>
> I've downloaded the JFlex source and am willing to look for the cause, but
> I have no idea where to start exploring.  Does anyone have suggestions?
>
> Obviously \x1C as the EOF character is a pragmatic solution "because it
> works", but that seems a bit of a kludge.
>
> Bill Fenlason