jflex-users Mailing List for JFlex

The fast lexer generator for Java

Brought to you by: lsf37, steve_rowe

jflex-users — from users for users, help, problems, discussions

Re: [jflex-users] mau be a bug with the negation operator

From: Gerwin K. <ge...@do...> - 2019-12-09 10:25:53

Just wanted to report on the list that the issue Pascal found and reported has been fixed in the development version, with the fix to be included in the upcoming 1.8.0 release.

The defect turned out to be in the code for removal of dead states after the computation of the negated automaton. In the scanner generation process, after an NFA is negated, there can be states from which no final state is reachable any more. These have to be removed from the automaton for the scanning engine to work correctly, and under specific circumstances that removal went wrong.
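For readers unfamiliar with the idea, dead-state removal can be sketched as a backwards reachability search from the final states. This is only an illustration of the general technique, not the JFlex implementation; the class name `DeadStates`, the integer state numbering, and the map-based transition graph are all assumptions made for the example (it needs Java 9+ for `Map.of`/`Set.of`).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of JFlex: finds the states that can still reach a final state.
final class DeadStates {

    /** Returns every state from which at least one final state is reachable. */
    static Set<Integer> liveStates(Map<Integer, Set<Integer>> successors, Set<Integer> finals) {
        // Reverse the edges so we can walk backwards from the final states.
        Map<Integer, Set<Integer>> predecessors = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : successors.entrySet()) {
            for (int target : e.getValue()) {
                predecessors.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }
        Set<Integer> live = new HashSet<>(finals);
        Deque<Integer> work = new ArrayDeque<>(finals);
        while (!work.isEmpty()) {
            int state = work.pop();
            for (int p : predecessors.getOrDefault(state, Collections.emptySet())) {
                if (live.add(p)) {
                    work.push(p); // first time we see p: it can reach a final state
                }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // State 1 only loops on itself, so no final state is reachable from it.
        Map<Integer, Set<Integer>> successors = Map.of(
            0, Set.of(1, 2),
            1, Set.of(1),
            2, Set.of(3),
            3, Set.<Integer>of());
        Set<Integer> live = liveStates(successors, Set.of(3));
        Set<Integer> dead = new HashSet<>(successors.keySet());
        dead.removeAll(live);
        System.out.println("live=" + live + " dead=" + dead);
    }
}
```

Any state outside the returned set is dead and, together with the transitions leading into it, can be dropped from the automaton.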

This bug triggers very rarely. One can determine whether a lexer spec was affected by comparing the number of DFA states before minimisation in JFlex 1.7.0 with the number in (the upcoming) JFlex 1.8.0 or the current development snapshot. If the numbers differ, the spec may have been affected by the bug; if they are equal, it was not.

Thanks again to Pascal for reporting this one, it was one of the more interesting bugs in JFlex in the past few years.

Cheers,
Gerwin

> On 24 Oct 2019, at 14:08, Gerwin Klein <ge...@do...> wrote:
> 
> Hi Pascal,
> 
> I haven’t really gotten to the bottom of it yet, but it is some interaction between the presence of a negated character class and the negation operator.
> 
> If you need a work-around, changing the spec to the equivalent
> 
> EXP = [\u{0}-`b-\u{10FFFF}] [^]* [\u{0}-`b-\u{10FFFF}]
> 
> should make it work as expected (you can tell when jflex warns that the second action can never be matched). 
> 
> Cheers,
> Gerwin 
> 
>> On 22 Oct 2019, at 02:50, Pascal HENNEQUIN <pas...@te...> wrote:
>> 
>> hello,
>>  I found an issue with the negation operator "!"
>>  With the following specification, string "baba" is not matched
>>  by either EXP or !EXP .
>> 
>> Pascal Hennequin
>> 
>> 
>> -------------------------------
>> %%
>> %standalone
>> %{
>> void ECHO(String cat) { System.out.print("["+cat+":"+yytext()+"]"); }
>> %}
>> 
>> EXP = ( [^a] [^]* [^a] )
>> ALL = {EXP} | ! {EXP}  
>> 
>> %%
>> {ALL}  { ECHO("1"); }
>> baba   { ECHO("2"); }
>> ---------------------------------

Re: [jflex-users] mau be a bug with the negation operator

From: Gerwin K. <ge...@do...> - 2019-10-24 04:00:07

> On 22 Oct 2019, at 08:05, Alan Eliasen <el...@mi...> wrote:
> To begin with, I don't understand what [^] is supposed to match. It looks like a negating character class, but with nothing to negate. This makes no sense, so obviously something else was intended. What was it?

In JFlex, [] matches nothing, and [^] is the character class that negates that, i.e. it matches any single input character. It’s a generalisation of “.”

See also the section “Semantics” on character classes at https://www.jflex.de/manual.html.

The operator ! negates entire expressions. Since Pascal is matching something of the form "r | !r", this should match literally everything (either r matches or it doesn’t), and the second line in his spec should therefore never get a chance to run (but for some reason it does for the input he sent).
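The "r | !r matches everything" argument can be checked outside JFlex with a small Java sketch. Java regular expressions have no negation operator, so the example models !EXP with `Predicate.negate()`, and it uses `[\s\S]` as a stand-in for JFlex's `[^]` (Java's `.` skips line terminators by default, much like JFlex's `.`). The class and field names are made up for illustration.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Illustration only: models EXP and !EXP as predicates over whole strings.
final class NegationDemo {
    // Java-regex analogue of EXP = [^a] [^]* [^a]
    static final Pattern EXP = Pattern.compile("[^a][\\s\\S]*[^a]");

    static final Predicate<String> MATCHES_EXP = s -> EXP.matcher(s).matches();

    // ALL = EXP | !EXP : true for every input, whatever EXP is
    static final Predicate<String> MATCHES_ALL = MATCHES_EXP.or(MATCHES_EXP.negate());

    public static void main(String[] args) {
        for (String s : new String[] {"baba", "bab", "a", "b\nb"}) {
            System.out.println("\"" + s.replace("\n", "\\n") + "\""
                + " EXP=" + MATCHES_EXP.test(s)
                + " ALL=" + MATCHES_ALL.test(s));
        }
    }
}
```

"baba" ends in 'a', so EXP rejects it, which means !EXP has to accept it and ALL should always win over the second rule.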

Cheers,
Gerwin

Re: [jflex-users] mau be a bug with the negation operator

From: Gerwin K. <ge...@do...> - 2019-10-24 03:55:41

Hi Pascal,

I haven’t really gotten to the bottom of it yet, but it is some interaction between the presence of a negated character class and the negation operator.

If you need a work-around, changing the spec to the equivalent

EXP = [\u{0}-`b-\u{10FFFF}] [^]* [\u{0}-`b-\u{10FFFF}]

should make it work as expected (you can tell when jflex warns that the second action can never be matched). 
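To see why the work-around is equivalent, the two character classes can be compared code point by code point. The sketch below is plain Java, independent of JFlex; the class and method names are assumptions for the example. It relies only on the fact that the backtick (U+0060) is the character just before 'a' and 'b' is the one just after.

```java
// Illustration only: checks that [^a] and [\u{0}-`b-\u{10FFFF}] describe the same set.
final class CharClassEquivalence {

    /** Membership test for the negated class [^a]. */
    static boolean notA(int cp) {
        return cp != 'a';
    }

    /** Membership test for the explicit ranges \u{0}-` and b-\u{10FFFF}. */
    static boolean explicitRanges(int cp) {
        return (cp >= 0x0 && cp <= '`') || (cp >= 'b' && cp <= 0x10FFFF);
    }

    /** Compares both tests over the whole Unicode code point range. */
    static boolean equivalent() {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (notA(cp) != explicitRanges(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("classes equivalent: " + equivalent());
    }
}
```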

Cheers,
Gerwin 


Re: [jflex-users] mau be a bug with the negation operator

From: Alan E. <el...@mi...> - 2019-10-21 21:05:52

To begin with, I don't understand what [^] is supposed to match. It looks like a negating character class, but with nothing to negate. This makes no sense, so obviously something else was intended. What was it?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

[jflex-users] mau be a bug with the negation operator

From: Pascal H. <pas...@te...> - 2019-10-21 15:50:14

hello,
   I found an issue with the negation operator "!"
   With the following specification, string "baba" is not matched
   by either EXP or !EXP .

Pascal Hennequin


-------------------------------
%%
%standalone
%{
void ECHO(String cat) { System.out.print("["+cat+":"+yytext()+"]"); }
%}

EXP = ( [^a] [^]* [^a] )
ALL = {EXP} | ! {EXP}  

%%
{ALL}  { ECHO("1"); }
baba   { ECHO("2"); }
---------------------------------

[jflex-users] IllegalArgumentException: character value expected

From: davide s. <sfo...@st...> - 2017-09-09 13:18:46

Hi there,

I'm having a problem generating the .java file. After DFA minimization,
an exception is thrown while the code is being written, and the class
content is truncated:

java.lang.IllegalArgumentException: character value expected
    at jflex.PackEmitter.emitUC(PackEmitter.java:108)
    at jflex.CountEmitter.emit(CountEmitter.java:102)
    at jflex.Emitter.emitDynamicInit(Emitter.java:530)
    at jflex.Emitter.emit(Emitter.java:1431)
    at jflex.Main.generate(Main.java:112)
    at jflex.Main.generate(Main.java:394)
    at jflex.Main.main(Main.java:411)

Thanks for your help

-- 
Davide Sforza

[jflex-users] Markdown?

From: Hanns H. R. <co...@sc...> - 2017-05-19 23:39:29

hi there,

has anyone implemented a .flex definition for Markdown yet?

best, .h.h.

[jflex-users] How can I write a token reg exp for an unterminated string literal including EOF?

From: Scott W. <sco...@gm...> - 2017-02-15 14:00:31

The string literal token reg exp for my language is:

STRING_LITERAL='([^'\\\n]|\\.)*'

For the consumer of this lexer (IntelliJ IDEA custom language plugin), I
also need to have a token that represents an unterminated string literal
(technically they could be the same token).  For the most part, the
following works:

UNTERMINATED_STRING_LITERAL='([^'\\\n]|\\.)*['\n]

However, when the entire document is, for example, the following:

String str = 'foo.bar<eof>

it's not recognized.  Is there some way to include the notion of
end-of-file in the token like I'm able to include the notion of
end-of-line?  I've tried using the Java Pattern \Z and \z, but those
apparently aren't valid for JFlex's regular expression syntax.

Oh, and because of how I'm using this, all line endings are already
normalized to \n, so I don't need to consider \r or \r\n here.

Thanks much in advance!
Scott
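One possible direction, offered as an untested assumption rather than an answer from the list: since a scanner picks the longest match, the unterminated token could simply omit the terminator, so that it matches up to end of line or end of input, while the terminated rule still wins whenever a closing quote is present. The Java sketch below imitates that longest-match choice with `java.util.regex`; the class name and the `classify` helper are hypothetical, and unlike the pattern above this variant does not include the trailing \n in the token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: emulates "longest match wins" between two string-literal patterns.
final class StringLiteralDemo {
    // '([^'\\\n]|\\.)*'  : the terminated literal from the message above
    static final Pattern TERMINATED = Pattern.compile("'([^'\\\\\\n]|\\\\.)*'");
    // '([^'\\\n]|\\.)*   : same body, no closing quote required (assumed variant)
    static final Pattern UNTERMINATED = Pattern.compile("'([^'\\\\\\n]|\\\\.)*");

    /** Classifies the literal that starts at the beginning of the input. */
    static String classify(String input) {
        Matcher t = TERMINATED.matcher(input);
        Matcher u = UNTERMINATED.matcher(input);
        int tEnd = t.lookingAt() ? t.end() : -1;
        int uEnd = u.lookingAt() ? u.end() : -1;
        if (tEnd >= 0 && tEnd >= uEnd) {
            return "STRING_LITERAL:" + input.substring(0, tEnd);
        }
        if (uEnd >= 0) {
            return "UNTERMINATED_STRING_LITERAL:" + input.substring(0, uEnd);
        }
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("'foo.bar"));   // input ends before a closing quote
        System.out.println(classify("'foo' + x"));  // properly terminated
    }
}
```

Whether the same pair of rules behaves this way inside a JFlex-generated IntelliJ lexer would need to be verified against the actual spec.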

Re: [jflex-users] Possible bug?

From: Gerwin K. <Ger...@ni...> - 2016-05-14 03:55:29

Looks like this is pretty much sorted out:

Yes, [:jletterdigit:] is intended to mean exactly isJavaIdentifierPart, with all its faults. The idea is to give access to the Java platform definitions, so it would not be a good idea to tweak it.

There’s nothing stopping you from defining your own character class macro, though. Maybe something along the lines of the following?

ignorable = [\u0000-\u0008\u000E-\u001B\u007F-\u009F]
letterdigit = [[:jletterdigit:] -- {ignorable}]
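
The Java-side equivalent of such a macro might look like the following sketch. It is an assumption about what "letterdigit" is meant to accept, not JFlex code: it uses `Character.isIdentifierIgnorable`, which also covers characters in the FORMAT general category, so it is slightly stricter than a macro that excludes only the three control ranges. The class and method names are invented for the example.

```java
// Illustration only: identifier-part test that rejects "ignorable" characters.
final class LetterDigit {

    /** True for Java identifier parts that are not identifier-ignorable. */
    static boolean isLetterDigit(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint)
            && !Character.isIdentifierIgnorable(codePoint);
    }

    public static void main(String[] args) {
        System.out.println("'a'    -> " + isLetterDigit('a'));
        System.out.println("'_'    -> " + isLetterDigit('_'));
        System.out.println("U+0003 -> " + isLetterDigit(0x03)); // ETX, ignorable
    }
}
```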

Cheers,
Gerwin

On 14.05.2016, at 06:01, William Fenlason <bil...@gm...> wrote:

Lee,

Yes, I agree. Certainly isIdentifierIgnorable() is preferable.

Allowing nonprinting characters or "ignorable" characters within identifiers makes no sense to me. If the characters are "ignorable", does that mean that equals() is affected? Are two identifiers, one with embedded control characters and one without (but otherwise the same) equal? If not, what does "ignorable" mean? If so, are the equals() overrides cost justified?

Currently JFlex defines [:jletterdigit:] to be identical with isJavaIdentifierPart. For my purposes it would be nice if JFlex specified that [:jletterdigit:] does NOT include ignorable characters, but I doubt that Gerwin feels the same, nor should he. I don't know if there are potential problems in JFlex with regard to identifiers containing control characters, but obviously they should be avoided. Probably they only occur in special situations like mine, where a control character is artificially inserted into the input.

Bottom line - I think it can be argued that including "ignorable", nonprinting characters in isJavaIdentifierPart() was a design error, but obviously we have to live with it.

On Fri, May 13, 2016 at 1:31 PM, Lee Carver <le...@pn...> wrote:
This appears to be by (weird) design. My guess is that a call to isIdentifierIgnorable() would be a better test than > 31.

The Oracle JavaSE-7 documents this behavior for isJavaIdentifierPart() -

<>
A character may be part of a Java identifier if any of the following are true:
...
- isIdentifierIgnorable(codePoint) returns true for the character

</>

And under isIdentifierIgnorable(char ch) we have -

<>
Determines if the specified character should be regarded as an ignorable character in a Java identifier or a Unicode identifier.
The following Unicode characters are ignorable in a Java identifier or a Unicode identifier:

ISO control characters that are not whitespace
'\u0000' through '\u0008'
'\u000E' through '\u001B'
'\u007F' through '\u009F'
all characters that have the FORMAT general category value
</>

On Thu, May 12, 2016 at 11:25 AM, William Fenlason <bil...@gm...> wrote:
PS

Perhaps a possible thing to do in JFlex is to change line 90 of LexParse.cup to

return Character.isJavaIdentifierPart(c) && c > 31;

although having to code around what is (imho) a Java flaw is distasteful.

Bill

On Thu, May 12, 2016 at 9:24 AM, Gerwin Klein <Ger...@ni...> wrote:
Sorry, I did receive it but got bogged down in other work and haven’t had a chance to look at it yet. Should have at least let you know..

I should be able to look at it this weekend.

Cheers,
Gerwin

On 12.05.2016, at 23:03, William Fenlason <bil...@gm...> wrote:

Gerwin,

Could you help me understand the status of this?

At the end of April I sent you a small test case (6 files, including grammar, test driver, etc.) which I think demonstrates this problem. Since I haven't heard back and because I sent it off list, I'm wondering if you received it, or if it somehow ended up in a spam folder? Or is the situation that you have not been able to devote any time to this?

I used a string reader to avoid any encoding issues, and added a test to ensure that the string reader was delivering the control characters as expected. My initial conclusion is that the processing of jletterdigit possibly has a flaw in which a subset of the ASCII control characters are included. I haven't tried to confirm the situation in the JFlex source yet. No doubt you would be much more efficient than I in figuring this out, but I'll give it a try as time permits.

Best,

Bill Fenlason

On Thu, Apr 28, 2016 at 7:22 AM, Gerwin Klein <Ger...@ni...> wrote:
Hi William,

this does sound like it could be a bug, yes.

Do you have a small test spec and input with expected output? I’d like to try to reproduce across different versions, maybe I can see what is going on.

A common pitfall with such characters is the encoding, both of the spec file for JFlex and the input file to the compiled scanner. If you’re using the unicode escape sequences, the former shouldn’t matter, but the latter still might.

Cheers,
Gerwin

On 26 Apr 2016, at 13:29, William Fenlason <bil...@gm...> wrote:

RL1.1 Hex Notation

To meet this requirement, an implementation shall supply a mechanism for specifying any Unicode code point (from U+0000 to U+10FFFF), using the hexadecimal code point representation.

JFlex conforms. Syntax is provided to express values across the whole range, via \uXXXX, where XXXX is a 4-digit hex value; \Uyyyyyy, where yyyyyy is a 6-digit hex value; and \u{X+( X+)*}, where X+ is a 1-6 digit hex value.

-------------------------------------------------------------------------------------------------

If I understand it correctly, the above (taken from the JFlex User Manual) implies that all hex characters from \U0000 through \U10FFFF may be used in a lexical specification. I don't think that is the case, and this is why.

As we know, <<EOF>> cannot be used for look ahead processing. It has been suggested here that one way to simulate it is to append a unique character to the end of the file, use it for look ahead, and then discard it. That approach was adopted.

We developed an extension of java.io.Reader which allows any specified character to be transparently appended to the end of the file (Eclipse document, actually), and also a substitute character to be returned in case the specified character occurs in the file.
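
The reader described above was not posted to the list, so the following is only a guess at its general shape: a `java.io.Reader` wrapper that replaces any occurrence of the sentinel in the real input with a substitute and delivers the sentinel exactly once after the wrapped input is exhausted. All names (`SentinelReader`, `readAll`) and the choice of '?' as substitute are assumptions for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: appends one sentinel character after the wrapped input ends.
final class SentinelReader extends Reader {
    private final Reader in;
    private final char sentinel;
    private final char substitute;
    private boolean sentinelDelivered = false;

    SentinelReader(Reader in, char sentinel, char substitute) {
        this.in = in;
        this.sentinel = sentinel;
        this.substitute = substitute;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (sentinelDelivered) {
            return -1; // real input and sentinel are both consumed
        }
        int n = in.read(buf, off, len);
        if (n < 0) {
            buf[off] = sentinel; // wrapped input is exhausted: emit the sentinel once
            sentinelDelivered = true;
            return 1;
        }
        for (int i = off; i < off + n; i++) {
            if (buf[i] == sentinel) {
                buf[i] = substitute; // keep the sentinel unique in the stream
            }
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Test helper: drains a reader into a string. */
    static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16];
        try {
            for (int n; (n = r.read(buf, 0, buf.length)) != -1; ) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Reader r = new SentinelReader(new StringReader("a\u001Cb"), '\u001C', '?');
        // Show the sentinel as '$' so it is visible on a terminal.
        System.out.println(readAll(r).replace('\u001C', '$'));
    }
}
```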

It seemed that a reasonable choice for an EOF character was to use one of the ASCII control characters from \x00 thru \x1F, avoiding the commonly used ones like \x00 and \x07 thru \x0D. Initially, ETX (\x03) and EOT (\x04) appeared to be good alternatives.

Initial testing did not bear this out - in a test case, two versions of JFlex (1.4.3 and 1.6.1) appended these characters to other tokens rather than recognizing them as separate tokens. Additional testing convinced us that of the reasonable control character choices, only File Separator (FS - \x1C) and Group Separator (GS - \x1D) work as expected.

Why should some control characters work, and others not work? My suspicion is that somewhere in the JFlex code there are specific character dependencies in the ASCII control character range.

I believe that this is a bug, either in the code or in the above documentation, and is contrary to the idea that any hex character may be used in a specification.

Am I mis-reading this documentation? Do others agree that this is a bug to be fixed?

I've downloaded the JFlex source and am willing to look for the cause, but I have no idea where to start exploring. Does anyone have suggestions?

Obviously \x1C as the EOF character is a pragmatic solution "because it works", but that seems a bit of a kludge..

Bill Fenlason


Re: [jflex-users] Possible bug?

From: William F. <bil...@gm...> - 2016-05-13 20:01:41

Lee,

Yes, I agree.  Certainly isIdentifierIgnorable() is preferable.

Allowing nonprinting characters or "ignorable" characters within
identifiers makes no sense to me.  If the characters are "ignorable", does
that mean that equals() is affected?  Are two identifiers, one with
embedded control characters and one without (but otherwise the same)
equal?  If not, what does "ignorable" mean?  If so, are the equals()
overrides cost justified?

Currently JFlex defines [:jletterdigit:] to be identical with
isJavaIdentifierPart.  For my purposes it would be nice if JFlex specified
that [:jletterdigit:] does NOT include ignorable characters, but I doubt
that Gerwin feels the same, nor should he.  I don't know if there are
potential problems in JFlex with regard to identifiers containing control
characters, but obviously they should be avoided.  Probably they only occur
in special situations like mine, where a control character is artificially
inserted into the input.

Bottom line - I think it can be argued that including "ignorable",
nonprinting characters in isJavaIdentifierPart() was a design error, but
obviously we have to live with it.


Re: [jflex-users] Possible bug?

From: Lee C. <le...@pn...> - 2016-05-13 17:55:16

This appears to be by (weird) design.  My guess is that a call to
isIdentifierIgnorable() would be a better test than > 31.

The Oracle JavaSE-7 documents this behavior for isJavaIdentifierPart() -

<>
A character may be part of a Java identifier if any of the following are
true:
...
- isIdentifierIgnorable(codePoint) returns true for the character

</>

And under  isIdentifierIgnorable(char ch) we have -

<>
Determines if the specified character should be regarded as an ignorable
character in a Java identifier or a Unicode identifier.
The following Unicode characters are ignorable in a Java identifier or a
Unicode identifier:

ISO control characters that are not whitespace
'\u0000' through '\u0008'
'\u000E' through '\u001B'
'\u007F' through '\u009F'
all characters that have the FORMAT general category value
</>
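
The documented ranges can be cross-checked directly against the Java runtime. The sketch below (class and method names are made up) lists the ASCII control characters that `Character.isJavaIdentifierPart` rejects; on a standard JDK these should be the whitespace controls U+0009 to U+000D and the separators U+001C to U+001F, which would be consistent with FS and GS being the ones that behaved as separate tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: surveys U+0000..U+001F against Character.isJavaIdentifierPart.
final class ControlCharSurvey {

    /** Returns the ASCII control code points that are NOT Java identifier parts. */
    static List<Integer> controlsNotInIdentifiers() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0x00; cp <= 0x1F; cp++) {
            if (!Character.isJavaIdentifierPart(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : controlsNotInIdentifiers()) {
            System.out.printf("U+%04X is not an identifier part%n", cp);
        }
    }
}
```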

On Thu, May 12, 2016 at 11:25 AM, William Fenlason <bil...@gm...>
wrote:

> PS
>
> Perhaps a possible thing to do  in JFlex is to change line 90 of
> LexParse.cup to
>
>         return Character.isJavaIdentifierPart(c) && c > 31;
>
> although having to code around what is (imho) a Java flaw is distasteful.
>
> Bill
>
>
>
> On Thu, May 12, 2016 at 9:24 AM, Gerwin Klein <Ger...@ni...>
> wrote:
>
>> Sorry, I did receive it but got bogged down in other work and haven’t had
>> a chance to look at it yet. Should have at least let you know..
>>
>> I should be able to look at it this weekend.
>>
>> Cheers,
>> Gerwin
>>
>>
>>
>> On 12.05.2016, at 23:03, William Fenlason <bil...@gm...> wrote:
>>
>> Gerwin,
>>
>> Could you help me understand the status of this?
>>
>> At the end of April I sent you a small test case (6 files, including
>> grammar, test driver, etc.) which I think demonstrates this problem.  Since
>> I haven't heard back and because I sent it off list, I'm wondering if you
>> received it, or if it somehow ended up in a spam folder?  Or is the
>> situation that you have not been able to devote any time to this?
>>
>> I used a string reader to avoid any encoding issues, and added a test to
>> insure that the string reader was delivering the control characters as
>> expected.  My initial conclusion is that the processing of jletterdigit
>> possibly has a flaw in which a subset of the ASCII control characters are
>> included.  I haven't tried to confirm the situation in the JFlex source
>> yet.  No doubt you would be much more efficient than I in figuring this
>> out, but I'll give it a try as time permits.
>>
>> Best,
>>
>> Bill Fenlason

Re: [jflex-users] Possible bug?

From: William F. <bil...@gm...> - 2016-05-12 18:26:02

PS

One possible fix in JFlex would be to change line 90 of LexParse.cup to

        return Character.isJavaIdentifierPart(c) && c > 31;

although having to code around what is (imho) a Java flaw is distasteful.

Bill
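
A note on the `c > 31` guard: it only covers the ASCII control range. Going by the `java.lang.Character` documentation, `isJavaIdentifierPart` also returns true for every character for which `isIdentifierIgnorable` is true, and that set includes DEL (0x7F). A minimal sketch of a stricter predicate follows; the class and method names are hypothetical and are not part of JFlex.

```java
public class JLetterDigitCheck {
    // Hypothetical helper, not JFlex code: accept identifier parts but
    // reject the "ignorable" characters (non-whitespace ISO controls and
    // format characters) that isJavaIdentifierPart also lets through.
    static boolean isStrictIdentifierPart(int c) {
        return Character.isJavaIdentifierPart(c)
                && !Character.isIdentifierIgnorable(c);
    }

    public static void main(String[] args) {
        // ETX, EOT, FS, DEL, then a few ordinary identifier characters.
        int[] samples = { 0x03, 0x04, 0x1C, 0x7F, 'a', '7', '$', '_' };
        for (int c : samples) {
            System.out.printf("U+%04X  javaPart=%-5b  guard(c>31)=%-5b  strict=%b%n",
                    c,
                    Character.isJavaIdentifierPart(c),
                    Character.isJavaIdentifierPart(c) && c > 31,
                    isStrictIdentifierPart(c));
        }
    }
}
```

Whether excluding all ignorable characters is the right behaviour for jletterdigit is a design question for the JFlex maintainers; the sketch only shows how the two predicates differ.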




Re: [jflex-users] Possible bug?

From: William F. <bil...@gm...> - 2016-05-12 17:16:44

Hi Gerwin,

After looking at the very readable JFlex code, I could see that the problem
was not with JFlex.

The simple case below shows the root cause of the problem:

Is there any reasonable explanation for why many of the ASCII control
characters are considered to be Java letters or digits?

My gut tells me this is not what the Java designers had in mind.  Maybe
this is a question for Oracle?

Bill Fenlason

--------------------------------------------------------------------------------------------------

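public class Main {
    public static void main(String[] args) {
        // For each of the first 64 characters, report whether Java
        // accepts it as part of an identifier.
        for (int i = 0; i < 64; i++) {
            char c = (char) i;
            boolean b = Character.isJavaIdentifierPart(c);

            System.out.println("" + i + " (x" + x(i) + ") "
                    + (i > 31 ? c : " ")      // show printable characters only
                    + (i < 10 ? " " : "")     // pad single-digit numbers
                    + " is java identifier part: " + b);
        }
    }

    // Uppercase hex representation of i, two digits per byte.
    static String x(int i) {
        String s = "0123456789ABCDEF";
        if (i < 256)
            return "" + s.charAt(i / 16) + s.charAt(i & 15);
        return x(i / 256) + x(i & 255);
    }
}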

/* --- results ----

0 (x00)    is java identifier part: true
1 (x01)    is java identifier part: true
2 (x02)    is java identifier part: true
3 (x03)    is java identifier part: true
4 (x04)    is java identifier part: true
5 (x05)    is java identifier part: true
6 (x06)    is java identifier part: true
7 (x07)    is java identifier part: true
8 (x08)    is java identifier part: true
9 (x09)    is java identifier part: false
10 (x0A)   is java identifier part: false
11 (x0B)   is java identifier part: false
12 (x0C)   is java identifier part: false
13 (x0D)   is java identifier part: false
14 (x0E)   is java identifier part: true
15 (x0F)   is java identifier part: true
16 (x10)   is java identifier part: true
17 (x11)   is java identifier part: true
18 (x12)   is java identifier part: true
19 (x13)   is java identifier part: true
20 (x14)   is java identifier part: true
21 (x15)   is java identifier part: true
22 (x16)   is java identifier part: true
23 (x17)   is java identifier part: true
24 (x18)   is java identifier part: true
25 (x19)   is java identifier part: true
26 (x1A)   is java identifier part: true
27 (x1B)   is java identifier part: true
28 (x1C)   is java identifier part: false
29 (x1D)   is java identifier part: false
30 (x1E)   is java identifier part: false
31 (x1F)   is java identifier part: false
32 (x20)   is java identifier part: false
33 (x21) ! is java identifier part: false
34 (x22) " is java identifier part: false
35 (x23) # is java identifier part: false
36 (x24) $ is java identifier part: true
37 (x25) % is java identifier part: false
38 (x26) & is java identifier part: false
39 (x27) ' is java identifier part: false
40 (x28) ( is java identifier part: false
41 (x29) ) is java identifier part: false
42 (x2A) * is java identifier part: false
43 (x2B) + is java identifier part: false
44 (x2C) , is java identifier part: false
45 (x2D) - is java identifier part: false
46 (x2E) . is java identifier part: false
47 (x2F) / is java identifier part: false
48 (x30) 0 is java identifier part: true
49 (x31) 1 is java identifier part: true
50 (x32) 2 is java identifier part: true
51 (x33) 3 is java identifier part: true
52 (x34) 4 is java identifier part: true
53 (x35) 5 is java identifier part: true
54 (x36) 6 is java identifier part: true
55 (x37) 7 is java identifier part: true
56 (x38) 8 is java identifier part: true
57 (x39) 9 is java identifier part: true
58 (x3A) : is java identifier part: false
59 (x3B) ; is java identifier part: false
60 (x3C) < is java identifier part: false
61 (x3D) = is java identifier part: false
62 (x3E) > is java identifier part: false
63 (x3F) ? is java identifier part: false

*/
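
A likely explanation, going by the `java.lang.Character` documentation: `isJavaIdentifierPart` is specified to return true for any character for which `isIdentifierIgnorable` is true, and the ignorable set includes the non-whitespace ISO control characters 0x00-0x08 and 0x0E-0x1B. The sketch below checks that the two methods agree across the ASCII control range and lists the controls that are not identifier parts. The class name `IgnorableCheck` and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class IgnorableCheck {
    // Control characters in 0x00..0x1F that are NOT Java identifier parts,
    // i.e. the ones a jletterdigit-style class would not swallow.
    static List<Integer> nonIdentifierControls() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            if (!Character.isJavaIdentifierPart(i)) {
                result.add(i);
            }
        }
        return result;
    }

    // True if, for every control character in 0x00..0x1F, being an
    // identifier part coincides with being "identifier ignorable".
    static boolean controlsMatchIgnorable() {
        for (int i = 0; i < 32; i++) {
            if (Character.isJavaIdentifierPart(i) != Character.isIdentifierIgnorable(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identifier part == ignorable for 0x00..0x1F: "
                + controlsMatchIgnorable());
        System.out.println("controls that are not identifier parts: "
                + nonIdentifierControls());
    }
}
```

Read against the results above, this suggests that RS (x1E) and US (x1F) should behave like FS and GS when used as an end-of-file marker, since all four are reported as not being identifier parts.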




Re: [jflex-users] Possible bug?

From: Gerwin K. <Ger...@ni...> - 2016-05-12 13:24:36

Sorry, I did receive it but got bogged down in other work and haven’t had a chance to look at it yet. I should have at least let you know.

I should be able to look at it this weekend.

Cheers,
Gerwin


Re: [jflex-users] Possible bug?

From: William F. <bil...@gm...> - 2016-05-12 13:03:50

Gerwin,

Could you help me understand the status of this?

At the end of April I sent you a small test case (6 files, including
grammar, test driver, etc.) which I think demonstrates this problem.  Since
I haven't heard back and because I sent it off list, I'm wondering if you
received it, or if it somehow ended up in a spam folder?  Or is the
situation that you have not been able to devote any time to this?

I used a string reader to avoid any encoding issues, and added a test to
ensure that the string reader was delivering the control characters as
expected.  My initial conclusion is that the processing of jletterdigit
possibly has a flaw in which a subset of the ASCII control characters is
included.  I haven't tried to confirm the situation in the JFlex source
yet.  No doubt you would be much more efficient than I in figuring this
out, but I'll give it a try as time permits.

Best,

Bill Fenlason

On Thu, Apr 28, 2016 at 7:22 AM, Gerwin Klein <Ger...@ni...> wrote:

> Hi William,
>
> this does sound like it could be a bug, yes.
>
> Do you have a small test spec and input with expected output? I’d like to
> try to reproduce across different versions; maybe I can see what is going
> on.
>
> A common pitfall with such characters is the encoding, both of the spec
> file for JFlex and the input file to the compiled scanner. If you’re using
> the unicode escape sequences, the former shouldn’t matter, but the latter
> still might.
>
> Cheers,
> Gerwin
>

Re: [jflex-users] Possible bug?

From: Gerwin K. <Ger...@ni...> - 2016-04-28 11:22:50

Hi William,

this does sound like it could be a bug, yes.

Do you have a small test spec and input with expected output? I’d like to try to reproduce across different versions; maybe I can see what is going on.

Cheers,
Gerwin


[jflex-users] Possible bug?

From: William F. <bil...@gm...> - 2016-04-26 03:29:54

RL1.1 Hex Notation

*To meet this requirement, an implementation shall supply a mechanism for
specifying any Unicode code point (from U+0000 to U+10FFFF), using the
hexadecimal code point representation.*

JFlex conforms. Syntax is provided to express values across the whole
range, via \uXXXX, where XXXX is a 4-digit hex value; \Uyyyyyy, where yyyyyy
is a 6-digit hex value; and \u{X+( X+)*}, where X+ is a 1-6 digit hex value.

-------------------------------------------------------------------------------------------------

If I understand it correctly, the above (taken from the JFlex User Manual)
implies that all hex characters from \U0000 through \U10FFFF may be used in
a lexical specification. I don't think that is the case, and this is why.

As we know, <<EOF>> cannot be used for lookahead processing. It has been
suggested here that one way to simulate it is to append a unique character
to the end of the file, use it for lookahead, and then discard it. That
approach was adopted.

We developed an extension of java.io.Reader which allows any specified
character to be transparently appended to the end of the file (Eclipse
document, actually), and also a substitute character to be returned in case
the specified character occurs in the file.
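
The reader described above could look roughly like the following. This is only a sketch under assumptions: the class name `SentinelReader`, its constructor parameters, and the `readAll` helper are invented here, and the poster's actual implementation (which wraps an Eclipse document) is not shown in the thread. `mark`, `reset` and `skip` are left at their `FilterReader` defaults and would need care in real code.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the poster's actual class.
public class SentinelReader extends FilterReader {
    private final char sentinel;    // appended once after the real input ends
    private final char substitute;  // returned when the input itself contains the sentinel
    private boolean sentinelSent = false;

    public SentinelReader(Reader in, char sentinel, char substitute) {
        super(in);
        this.sentinel = sentinel;
        this.substitute = substitute;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c == -1) {
            if (sentinelSent) return -1;   // sentinel already delivered: real end
            sentinelSent = true;
            return sentinel;
        }
        return c == sentinel ? substitute : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = in.read(buf, off, len);
        if (n == -1) {
            if (sentinelSent) return -1;
            sentinelSent = true;
            buf[off] = sentinel;           // deliver the sentinel as a final 1-char read
            return 1;
        }
        for (int i = off; i < off + n; i++) {
            if (buf[i] == sentinel) buf[i] = substitute;
        }
        return n;
    }

    // Demo helper: drain a reader into a String.
    static String readAll(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[16];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The input contains FS itself, so it is replaced by '?' and a
        // single FS is appended at the end.
        Reader r = new SentinelReader(new StringReader("ab\u001Ccd"), '\u001C', '?');
        String s = readAll(r);
        System.out.println(s.equals("ab?cd\u001C"));
    }
}
```

The scanner can then be constructed on the wrapped reader and treat the sentinel as an ordinary character in lookahead expressions.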

It seemed that a reasonable choice for an EOF character was to use one of
the ASCII control characters from \x00 through \x1F, avoiding the commonly
used ones like \x00 and \x07 through \x0D. Initially, ETX (\x03) and EOT
(\x04) appeared to be good alternatives.

Initial testing did not bear this out - in a test case, two versions of
JFlex (1.4.3 and 1.6.1) appended these characters to other tokens rather
than recognizing them as separate tokens. Additional testing convinced us
that of the reasonable control character choices, only File Separator (FS -
\x1C) and Group Separator (GS - \x1D) work as expected.

Why should some control characters work, and others not work? My suspicion
is that somewhere in the JFlex code there are specific character
dependencies in the ASCII control character range.

I believe that this is a bug, either in the code or in the above
documentation, and is contrary to the idea that any hex character may be
used in a specification.

Am I misreading this documentation? Do others agree that this is a bug to
be fixed?

I've downloaded the JFlex source and am willing to look for the cause, but
I have no idea where to start exploring. Does anyone have suggestions?

Obviously \x1C as the EOF character is a pragmatic solution "because it
works", but that seems a bit of a kludge.

Bill Fenlason

Re: [jflex-users] Download page question / bug?

From: Gerwin K. <Ger...@ni...> - 2016-03-28 00:19:03

Thanks for reporting that. It should now be fixed.

Cheers,
Gerwin




[jflex-users] Download page question / bug?

From: William F. <bil...@gm...> - 2016-03-27 21:48:18

On the download page (http://jflex.de/download.html), the JFlex Maven
plugin shows the name: jflex-maven-plugin-1.6.1.zip two times.

Each of the two (identical) download buttons actually downloads the same
file: jflex-maven-1.6.1.tar.gz

I would assume that the first button should read:
jflex-maven-plugin-1.6.1.tar.gz,
and the second button should download the file: jflex-maven-1.6.1.zip
(assuming it exists).

I noticed this because I was trying to download the zip file.

Bill Fenlason

Re: [jflex-users] Problems with lexical start states in JFLEX

From: Steve R. <sa...@gm...> - 2016-02-26 14:46:41

Hi Ralph,

I’m guessing that you have rules to match & ignore whitespace in the default state, but since you don’t have one of those for ISTATUS_STATE, the space after ISTATUS blocks recognition of “ACTIVE”.

Steve

> On Feb 25, 2016, at 8:08 AM, Ralph Stommel <r.s...@co...> wrote:
> 
> Dear JFLEX-Users,
>  
> I am using JFLEX together with BYACC. It has been working perfectly in all my projects so far.
> However, in order to prevent my JFLEX scanner from recognizing a generic quoted string after having recognized a token ISTATUS I have specified the following exclusive lexical start state scenario:
>  
> %%
>  
> %byaccj
> %ignorecase
> %xstate ISTATUS_STATE
>  
> …
>  
> ACTIVE = (active)|([\"](active)[\"])
> …
> QUOTED_STRING = ([\"][^\n\r]*(\"\")*[^\n\r]*[\"])
> %%
> …
> <ISTATUS_STATE>{ACTIVE} {yyparser.yylval = new ParserVal(yytext()); yybegin(YYINITIAL); return Parser.ACTIVE;}
> …
> {ISTATUS} {yyparser.yylval = new ParserVal(yytext()); yybegin(ISTATUS_STATE); return Parser.ISTATUS;}
> …
> {QUOTED_STRING} {yyparser.yylval = new ParserVal(yytext()); return Parser.QUOTED_STRING;}
> …
>  
> The string that is parsed looks as follows:
> … ISTATUS  “ACTIVE” …
> I.e. the quoted string “ACTIVE” is directly following the token ISTATUS.
> When debugging the lexer I can see that yybegin(ISTATUS_STATE) is set after recognizing the ISTATUS token. 
> But then the “ACTIVE” string is not recognized and the lexer terminates with zzScanError(ZZ_NO_MATCH) instead;
> Without the lexical state spec the ACTIVE token is recognized by the lexer.
>  
> Does anyone see where I am wrong in my usage scenario above or would anyone know how to make this work?
> Many thanks in advance for your help.
>  
> Ralph
>  
>  
>  

[jflex-users] Problems with lexical start states in JFLEX

From: Ralph S. <r.s...@co...> - 2016-02-25 13:22:02

Dear JFLEX-Users,

I am using JFLEX together with BYACC. It has been working perfectly in all my projects so far.
However, in order to prevent my JFLEX scanner from recognizing a generic quoted string after having recognized a token ISTATUS I have specified the following exclusive lexical start state scenario:

%%

%byaccj
%ignorecase
%xstate ISTATUS_STATE

...

ACTIVE = (active)|([\"](active)[\"])
...
QUOTED_STRING = ([\"][^\n\r]*(\"\")*[^\n\r]*[\"])
%%
...
<ISTATUS_STATE>{ACTIVE} {yyparser.yylval = new ParserVal(yytext()); yybegin(YYINITIAL); return Parser.ACTIVE;}
...
{ISTATUS} {yyparser.yylval = new ParserVal(yytext()); yybegin(ISTATUS_STATE); return Parser.ISTATUS;}
...
{QUOTED_STRING} {yyparser.yylval = new ParserVal(yytext()); return Parser.QUOTED_STRING;}
...

The string that is parsed looks as follows:
... ISTATUS  "ACTIVE" ...
I.e. the quoted string "ACTIVE" is directly following the token ISTATUS.
When debugging the lexer I can see that yybegin(ISTATUS_STATE) is set after recognizing the ISTATUS token.
But then the "ACTIVE" string is not recognized and the lexer terminates with zzScanError(ZZ_NO_MATCH) instead.
Without the lexical state spec the ACTIVE token is recognized by the lexer.

Does anyone see where I am wrong in my usage scenario above or would anyone know how to make this work?
Many thanks in advance for your help.

Ralph

[jflex-users] Assistance with matching

From: <de....@io...> - 2015-11-14 22:12:51

Hello.

I'm trying to break up a file into words, "$$" and "\". Specifically, a
"word" is any non-whitespace character. An input such as:

  "a b c$$ $$ \ e \\\"

... would yield the tokens:

  1. a
  2. b
  3. c
  4. $$
  5. $$
  6. \
  7. e
  8. \
  9. \
  10. \

I'm having problems coming up with a pattern or set of patterns that
will achieve this, however.

The obvious definition, such as:

Word    = \P{Whitespace}+
Space   = \p{Whitespace}+
Command = \p{Alpha}+
Slash   = \\
Dollars = "$$"

%%

<YYINITIAL> {
  { Space } { /* Ignore */ }

  { Slash } {
    throw new RuntimeException("Slash");
  }

  { Dollars } {
    throw new RuntimeException("Dollars");
  }

  { Word } {
    final TokenText.Builder b = TokenText.builder();
    b.position(this.position());
    b.name(this.yytext());
    return b.build();
  }
}

... Will obviously not work, because although " $$ " and " \ "
will be matched by the Slash and Dollars patterns, an input such as
"f$$" will be matched by the Word pattern, rather than yielding two
tokens "f" and "$$".

What is the simplest way to achieve this with jflex?

M
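
The longest-match problem described above can be reproduced with plain `java.util.regex`. This is a sketch, not a JFlex specification: the class `WordSplitDemo` and both patterns are assumptions for illustration. `GREEDY` mirrors the `Word` definition from the post and swallows "f$$" whole. `SPLIT` shows one possible shape of a fix, where a word may contain neither a backslash nor a "$" that is followed by another "$". The `(?!...)` lookahead is a `java.util.regex` feature, so the pattern would likely need to be reformulated before use in a JFlex rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; tokenizes with java.util.regex, not with JFlex.
class WordSplitDemo {
    // The "obvious" word: any run of non-whitespace characters.
    static final Pattern GREEDY = Pattern.compile("\\S+");

    // "$$", or a single backslash, or a word that stops before either of them.
    static final Pattern SPLIT = Pattern.compile(
        "\\$\\$|\\\\|(?:[^\\s\\\\$]|\\$(?!\\$))+");

    // Collects every match in order; whitespace between matches is skipped.
    static List<String> tokenize(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(GREEDY, "f$$"));
        System.out.println(tokenize(SPLIT, "f$$"));
        // The sample input from the post: a b c$$ $$ \ e \\\
        System.out.println(tokenize(SPLIT, "a b c$$ $$ \\ e \\\\\\"));
    }
}
```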

Re: [jflex-users] Assistance with matching

From: <de....@io...> - 2015-11-14 22:06:18

On 2015-11-14T21:36:03 +0000
<de....@io...> wrote:

> I'm trying to break up a file into words, "$$" and "\". Specifically, a
> "word" is any non-whitespace character. An input such as:

Sorry, that should have read: A "word" is any sequence of one or more
non-whitespace characters.

M

Re: [jflex-users] Problems using the "at the beginning of line" char

From: Gerwin K. <Ger...@ni...> - 2015-03-01 22:17:11

That’s right.

If it can be both at the beginning of the line or not, you could just define

    comment = ;[a-zA-Z0-9]+

Cheers,
Gerwin
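
The point that the anchored alternative adds nothing can be checked with `java.util.regex`, used here only as a stand-in for the scanner. The class `CommentPatternDemo` and the sample text are assumptions for illustration. Both patterns find the same comments, whether or not they start a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; compares the two formulations on sample input.
class CommentPatternDemo {
    // The original idea: "at line start" OR "anywhere".
    static final Pattern ANCHORED_OR_NOT = Pattern.compile(
        "^;[a-zA-Z0-9]+|;[a-zA-Z0-9]+", Pattern.MULTILINE);

    // The simplified definition from the reply.
    static final Pattern PLAIN = Pattern.compile(";[a-zA-Z0-9]+");

    // Collects every match in order of appearance.
    static List<String> matches(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = ";start\nmov a,b ;copy\n";
        System.out.println(matches(ANCHORED_OR_NOT, text));
        System.out.println(matches(PLAIN, text));
    }
}
```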

On 02.03.2015, at 02:39, master <ma...@lu...> wrote:

Am 01.03.2015 um 15:22 schrieb master:
Hi,
I'm new to JFlex and started with a simple scanner for assembler text files.
A comment line in this assembler starts with ';' and can appear at the beginning of a line, or somewhere after an expression.
In the macro section I defined a macro for comments as follows:


comment = ^;([a-zA-Z0-9]+) | (;[a-zA-Z0-9]+)


But running jFlex always stated:

Syntax error.
comment = ^;([a-zA-Z0-9]+) | (;[a-zA-Z0-9]+)
         ^

where it points to the '^' sign.

Why can I not use the legal '^' sign here for referencing the beginning of a line?

Best regards











Hi,

After reading the manual again, I found the reason in chapter 4.2.11, Macro definition:
 ... must not contain the ^, / or $ operators.
:(

Best regards


Re: [jflex-users] Problems using the "at the beginning of line" char

From: master <ma...@lu...> - 2015-03-01 15:41:13

Am 01.03.2015 um 15:22 schrieb master:
> Hi,
> I'm new to JFlex and started with a simple scanner for assembler text 
> files.
> A comment line in this assembler starts with ';' and can appear at 
> the beginning of a line, or somewhere after an expression.
> In the macro section I defined a macro for comments as follows:
>
> comment = ^;([a-zA-Z0-9]+) | (;[a-zA-Z0-9]+)
>
> But running jFlex always stated:
>
> Console output
> Syntax error.
> comment = ^;([a-zA-Z0-9]+) | (;[a-zA-Z0-9]+)
>          ^
>
> where it points to the '^' sign.
>
> Why can I not use the legal '^' sign here for referencing the 
> beginning of a line?
>
> Best regards
Hi,

After reading the manual again, I found the reason in chapter 4.2.11,
Macro definition:
  ... must not contain the ^, / or $ operators.
:(

Best regards

12 messages have been excluded from this view by a project administrator.
