Re: [jflex-users] Possible bug?
From: William F. <bil...@gm...> - 2016-05-12 18:26:02
PS
Perhaps one option for JFlex would be to change line 90 of LexParse.cup to

    return Character.isJavaIdentifierPart(c) && c > 31;

although having to code around what is (IMHO) a Java flaw is distasteful.
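For illustration, the small program below lists which ASCII control characters Java's `Character.isJavaIdentifierPart` accepts. It assumes, as the PS implies, that line 90 of LexParse.cup currently returns `Character.isJavaIdentifierPart(c)` for jletterdigit. The Javadoc says that method returns true for any "identifier-ignorable" character, and `Character.isIdentifierIgnorable` covers the non-whitespace ISO controls U+0000 to U+0008, U+000E to U+001B and U+007F to U+009F. Java treats U+001C to U+001F (FS, GS, RS, US) as whitespace, so they are not ignorable. That would explain why ETX and EOT were absorbed into tokens while FS and GS were not. The class name `IdentifierPartCheck` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper to inspect how Java classifies ASCII control characters. */
class IdentifierPartCheck {

    /** The guard proposed in the PS: reject everything at or below 31 (0x1F). */
    static boolean proposed(char c) {
        return Character.isJavaIdentifierPart(c) && c > 31;
    }

    /** ASCII control characters 0x00-0x1F that isJavaIdentifierPart accepts. */
    static List<Integer> acceptedControls() {
        List<Integer> result = new ArrayList<>();
        for (char c = 0; c < 0x20; c++) {
            if (Character.isJavaIdentifierPart(c)) {
                result.add((int) c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each control printed here is accepted only because it is identifier-ignorable.
        for (int c : acceptedControls()) {
            System.out.printf("\\x%02X ignorable=%b%n", c, Character.isIdentifierIgnorable((char) c));
        }
        System.out.println(proposed('\u0003')); // false: ETX is now rejected
        System.out.println(proposed('\u001C')); // false: FS was never an identifier part
        System.out.println(proposed('\u007F')); // true: DEL is ignorable and above 31
    }
}
```

Note that the `c > 31` guard only closes the ASCII gap. DEL and the C1 controls U+0080 to U+009F are also identifier-ignorable. A check such as `!Character.isIdentifierIgnorable(c)` would exclude those too, although it would also exclude Unicode format characters, which may or may not be desirable.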
Bill
On Thu, May 12, 2016 at 9:24 AM, Gerwin Klein <Ger...@ni...>
wrote:
> Sorry, I did receive it but got bogged down in other work and haven’t had
> a chance to look at it yet. I should have at least let you know.
>
> I should be able to look at it this weekend.
>
> Cheers,
> Gerwin
>
>
>
> On 12.05.2016, at 23:03, William Fenlason <bil...@gm...> wrote:
>
> Gerwin,
>
> Could you help me understand the status of this?
>
> At the end of April I sent you a small test case (6 files, including
> grammar, test driver, etc.) which I think demonstrates this problem. Since
> I haven't heard back, and because I sent it off list, I'm wondering whether
> you received it or whether it somehow ended up in a spam folder. Or have you
> simply not been able to devote any time to this yet?
>
> I used a string reader to avoid any encoding issues, and added a test to
> ensure that the string reader was delivering the control characters as
> expected. My initial conclusion is that the handling of jletterdigit may
> have a flaw that causes a subset of the ASCII control characters to be
> included. I haven't tried to confirm this in the JFlex source yet. No
> doubt you would be much more efficient than I in figuring this out, but
> I'll give it a try as time permits.
>
> Best,
>
> Bill Fenlason
>
> On Thu, Apr 28, 2016 at 7:22 AM, Gerwin Klein <Ger...@ni...>
> wrote:
>
>> Hi William,
>>
>> this does sound like it could be a bug, yes.
>>
>> Do you have a small test spec and input with expected output? I’d like to
>> try to reproduce it across different versions; maybe I can see what is
>> going on.
>>
>> A common pitfall with such characters is the encoding, both of the spec
>> file for JFlex and of the input file to the compiled scanner. If you’re
>> using Unicode escape sequences, the former shouldn’t matter, but the latter
>> still might.
>>
>> Cheers,
>> Gerwin
>>
>> On 26 Apr 2016, at 13:29, William Fenlason <bil...@gm...>
>> wrote:
>>
>> RL1.1 Hex Notation
>>
>> *To meet this requirement, an implementation shall supply a mechanism for
>> specifying any Unicode code point (from U+0000 to U+10FFFF), using the
>> hexadecimal code point representation.*
>>
>> JFlex conforms. Syntax is provided to express values across the whole
>> range, via \uXXXX, where XXXX is a 4-digit hex value; \Uyyyyyy, where
>> yyyyyy is a 6-digit hex value; and \u{X+( X+)*}, where X+ is a 1-6 digit
>> hex value.
>>
>>
>> -------------------------------------------------------------------------------------------------
>>
>> If I understand it correctly, the above (taken from the JFlex User
>> Manual) implies that any character from \u0000 through \U10FFFF may be
>> used in a lexical specification. I don't think that is the case, and here
>> is why.
>>
>> As we know, <<EOF>> cannot be used for lookahead. It has been
>> suggested here that one way to simulate it is to append a unique character
>> to the end of the file, use it for lookahead, and then discard it. We
>> adopted that approach.
>>
>> We developed an extension of java.io.Reader that transparently appends
>> any specified character to the end of the file (an Eclipse document,
>> actually), and returns a substitute character whenever the specified
>> character occurs in the file itself.
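A minimal sketch of the kind of Reader described above, assuming it can wrap any `java.io.Reader` (a `StringReader` here rather than an Eclipse document). `SentinelReader` and `readAll` are hypothetical names, not the actual class from this thread. It replaces in-input occurrences of the sentinel with the substitute and delivers the sentinel exactly once after the underlying input is exhausted, so the sentinel stays unique.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: appends a unique sentinel once, after the wrapped input ends. */
class SentinelReader extends FilterReader {
    private final char sentinel;
    private final char substitute;
    private boolean sentinelDelivered = false;

    SentinelReader(Reader in, char sentinel, char substitute) {
        super(in);
        this.sentinel = sentinel;
        this.substitute = substitute;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = in.read(cbuf, off, len);
        if (n == -1) {
            if (sentinelDelivered) {
                return -1;               // real end of stream
            }
            sentinelDelivered = true;
            cbuf[off] = sentinel;        // append the sentinel exactly once
            return 1;
        }
        for (int i = off; i < off + n; i++) {
            if (cbuf[i] == sentinel) {
                cbuf[i] = substitute;    // keep the sentinel unique in the stream
            }
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        // FilterReader.read() would bypass the logic above, so route through it.
        char[] one = new char[1];
        return read(one, 0, 1) == -1 ? -1 : one[0];
    }

    @Override
    public boolean markSupported() {
        return false;                    // mark/reset would not track the sentinel state
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset not supported");
    }

    /** Reads everything from r into a String (convenience for the example). */
    static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        try {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String out = readAll(new SentinelReader(new StringReader("ab\u001Cc"), '\u001C', '?'));
        // The embedded FS became '?', and a single FS was appended at the end.
        System.out.println(out.replace('\u001C', '$')); // prints "ab?c$"
    }
}
```

The generated scanner then sees the sentinel as an ordinary final character that it can use for lookahead and then discard, as described above.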
>>
>> It seemed that a reasonable choice for an EOF character was one of the
>> ASCII control characters from \x00 through \x1F, avoiding commonly used
>> ones such as \x00 and \x07 through \x0D. Initially, ETX (\x03) and EOT
>> (\x04) appeared to be good candidates.
>>
>> Initial testing did not bear this out: in a test case, two versions of
>> JFlex (1.4.3 and 1.6.1) absorbed these characters into adjacent tokens
>> rather than recognizing them as separate tokens. Additional testing
>> convinced us that, of the reasonable control character choices, only File
>> Separator (FS, \x1C) and Group Separator (GS, \x1D) work as expected.
>>
>> Why should some control characters work and others not? My suspicion is
>> that somewhere in the JFlex code there are specific character dependencies
>> in the ASCII control character range.
>>
>> I believe this is a bug, either in the code or in the documentation
>> above, since it contradicts the idea that any code point may be used in a
>> specification.
>>
>> Am I misreading this documentation? Do others agree that this is a bug
>> to be fixed?
>>
>> I've downloaded the JFlex source and am willing to look for the cause,
>> but I have no idea where to start exploring. Does anyone have suggestions?
>>
>> Obviously \x1C as the EOF character is a pragmatic solution "because it
>> works", but that seems a bit of a kludge.
>>
>> Bill Fenlason
>>
>>
>>
>> jflex-users mailing list
>> https://lists.sourceforge.net/lists/listinfo/jflex-users
>>
>>
>>
>>
>
>
>