jflex-users Mailing List for JFlex (Page 5)
The fast lexer generator for Java
Brought to you by:
lsf37,
steve_rowe
From: Elli A. <el...@su...> - 2010-04-28 00:48:25
Hi,

I am writing a lexer specification that looks for some tokens in text. I also need to get the text between the tokens. Currently, what I see in the examples is something like this:

    <YYINITIAL> . { out.append(yytext()); }

The problem with this is that it handles every character separately, and I would like to get all of them at once. Is it possible to get the unmatched characters in the same way a SAX parser returns characters, something like:

    (char[] ch, int start, int length)

In this case my code receives a character array with indexes to the start and end. Those are currently private variables (zz prefix), so I don't want to use them if there is a better alternative. A regexp such as .* will probably match the tokens as well.

Thanks for any help!
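No reply to this question appears on this page. One possible workaround, sketched here under assumptions, is to keep the one-character fallback rule but append to a buffer and hand the buffered run over only when a real token is matched (or at end of input). The class and method names below are invented for illustration; in a real spec the calls would sit inside the JFlex actions, as the comments indicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector; none of these names come from JFlex.
class TextRunCollector {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> events = new ArrayList<>();

    // Would be called from the fallback rule, e.g.  .|\n  { collector.onOther(yytext()); }
    void onOther(String text) {
        pending.append(text);
    }

    // Would be called from each token rule, before the token itself is handled.
    void onToken(String token) {
        flush();
        events.add("TOKEN:" + token);
    }

    // Emits the buffered text as a single run; also needed once at end of input.
    void flush() {
        if (pending.length() > 0) {
            events.add("TEXT:" + pending);
            pending.setLength(0);
        }
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        TextRunCollector c = new TextRunCollector();
        c.onOther("a");
        c.onOther("b");
        c.onToken("{x}");
        c.onOther("c");
        c.flush();
        System.out.println(c.events());
    }
}
```

The scanner still matches one character at a time, but the consuming code sees whole runs, which is the effect the question asks for without touching the zz-prefixed fields.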
From: Steve R. <sa...@od...> - 2010-04-21 04:04:10
Hi Romildo,

Looks great!

One small thing I noticed: in the following line, COMMENT is highlit while the other states are not; the states in the next three lines are not highlit either:

    %state COMMENT, STATELIST, MACROS, REGEXPSTART

I think these should be consistent: either all highlit, or none. But this is minor.

Thanks for working on this.

Steve
From: José R. M. <j.r...@gm...> - 2010-04-21 01:21:01
On Mon, Apr 19, 2010 at 05:08:45PM -0300, José Romildo Malaquias wrote:
> I have just submitted a patch in Pygments track system:
>
> http://dev.pocoo.org/projects/pygments/ticket/495
>
> Maybe you want to take a look at it. There are two formatted examples
> there, taken from the examples subdirectory of the JFlex distribution.
>
> The lexer is not fully correct, as in my mind there were some open
> issues regarding JFlex input file syntax. Later on I will submit a new,
> more correct patch.

I have updated the patch for pygments to highlight JFlex input files.

Attached is the resulting LexScan.flex file (the JFlex lexical specification of the JFlex input language) highlighted by pygments using the patch.

What do you think about the result? Any suggestions are welcome.

Romildo
From: Steve R. <sa...@od...> - 2010-04-19 23:21:44
José Romildo Malaquias wrote:
> On Mon, Apr 19, 2010 at 12:25:08PM -0400, Steve Rowe wrote:
> [...]
>> > 3. Can a JFlex comment appear anywhere a space is allowed?
>>
>> Not everywhere; for your specific questions, though:
>>
>> > For instance, are the following allowed?
>> >
>> > %% // end of section 1
>> >
>> > %{ // the following will be inserted into the generated class
>>
>> Yes, both of the above are allowed. After '%%' and '%{' all text is
>> ignored until end of line - no comment syntax is required. So these are
>> allowed too:
>>
>> %% end of section 1
>>
>> %{ the following will be inserted into the generated class
>
> I suppose this applies to every option of the form
>
> %[a-z]{
> %[a-z]}
>
> like %eofval{ and %eofval}. Right?

Yes. Starting at line 274 of LexScan.flex:

    <MACROS> ("%{"|"%init{"|"%initthrow{"|"%eof{"|"%eofthrow{"|"%yylexthrow{"|"%eofval{").*{NL} { string.setLength(0); yybegin(COPY); }

    <COPY> {
      "%}".*{NL}           { classCode = conc(classCode,string); yybegin(MACROS); }
      "%init}".*{NL}       { initCode = conc(initCode,string); yybegin(MACROS); }
      "%initthrow}".*{NL}  { initThrow = concExc(initThrow,string); yybegin(MACROS); }
      "%eof}".*{NL}        { eofCode = conc(eofCode,string); yybegin(MACROS); }
      "%eofthrow}".*{NL}   { eofThrow = concExc(eofThrow,string); yybegin(MACROS); }
      "%yylexthrow}".*{NL} { lexThrow = concExc(lexThrow,string); yybegin(MACROS); }
      "%eofval}".*{NL}     { eofVal = string.toString(); yybegin(MACROS); }
      .*{NL}               { string.append(yytext()); }

Steve
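For a highlighter written outside JFlex, the rules quoted above could be approximated with ordinary Java regular expressions. This is only a sketch: the class name is invented, the directive names are copied from the quoted excerpt, and anchoring the match at the start of a line is an assumption rather than something the excerpt states.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JFlex.
class CodeBlockDirectives {
    // Directive names taken from the LexScan.flex excerpt; the empty case covers "%{" and "%}".
    private static final String NAMES = "(init|initthrow|eof|eofthrow|yylexthrow|eofval)?";

    // "%{" or "%name{" followed by arbitrary text that is ignored up to end of line.
    static final Pattern OPEN = Pattern.compile("^%" + NAMES + "\\{.*$");
    // "%}" or "%name}" followed by arbitrary ignored text.
    static final Pattern CLOSE = Pattern.compile("^%" + NAMES + "\\}.*$");

    static boolean opens(String line) {
        return OPEN.matcher(line).matches();
    }

    static boolean closes(String line) {
        return CLOSE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(opens("%{ the following will be inserted into the generated class"));
        System.out.println(opens("%eofval{ // value returned at end of file"));
        System.out.println(closes("%eofval}"));
        System.out.println(opens("%state COMMENT"));
    }
}
```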
From: Steve R. <sa...@od...> - 2010-04-19 23:16:13
Hi Romildo,

José Romildo Malaquias wrote:
> On Mon, Apr 19, 2010 at 12:25:08PM -0400, Steve Rowe wrote:
> [...]
>> > 2. In macro definitions, how does one know where one regular
>> > expression finishes?
>>
>> Within the macro section of a spec, newlines terminate regular
>> expressions - from the above-linked LexScan.flex:
>>
>> 500 {NL} { if (macroDefinition) { yybegin(MACROS); }
>> return symbol(REGEXPEND); }
>
> Then there is an error in examples/java/java.flex in the distribution:
>
> /* comments */
> Comment = {TraditionalComment} | {EndOfLineComment} |
> {DocumentationComment}
>
> Here the macro definition is not supposed to be in a unique line.

I was wrong; starting at line 464 in LexScan.flex:

    464 {WSPNL}*"|"{WSP}*$ { if (macroDefinition) {
    465                        yybegin(EATWSPNL);
    466                        return symbol(BAR);
    467                      }

That is, when '|' occurs at end-of-line within a macro definition, the newline is ignored.

Steve
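The continuation behaviour described here can be imitated by a tool that reads macro definitions line by line. The following is a rough sketch with an invented class name; it treats any line whose last non-blank character is '|' as continued, which ignores corner cases such as an escaped bar (\|) at the end of a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JFlex.
class MacroLineJoiner {
    // Joins physical lines into logical macro definitions: a trailing '|'
    // means the definition continues on the next non-blank line.
    static List<String> join(List<String> lines) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines neither start nor end a definition in this sketch
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(trimmed);
            if (!trimmed.endsWith("|")) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString()); // input ended right after a trailing '|'
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> spec = Arrays.asList(
            "Comment = {TraditionalComment} | {EndOfLineComment} |",
            "          {DocumentationComment}",
            "Other = a");
        for (String definition : join(spec)) {
            System.out.println(definition);
        }
    }
}
```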
From: Steve R. <sa...@od...> - 2010-04-19 23:10:47
Hi Romildo,

José Romildo Malaquias wrote:
> Taking a quick look at LexScan.flex, it seems that any character, except
> newline, is accepted inside a character class.
>
> The manual, on section 4.3.1, excludes 21 meta characters (the Character
> non terminal in the grammar).

Inside character classes, ']', '^' and '-' are all meta characters (not accepted as regular class members). You're right about the others, though, AFAICT.

> 1) Why is [ and ] balanced inside a character class?

I think it's to allow for the possibility of having nested character classes (though I don't understand exactly how that works in the CUP parser).

> 2) What is the meaning of a macro inside a character class?

When a macro is encountered inside a character class, an error is thrown with the message "Macros in character classes are not supported."

> 3) What is the meaning of a string (sequence of characters between
> double quotes) inside a character class?

Strings inside character classes are treated as if each character in the string were directly included (without the double quotes) in the character class.

Steve
From: José R. M. <j.r...@gm...> - 2010-04-19 22:05:11
On Mon, Apr 19, 2010 at 12:25:08PM -0400, Steve Rowe wrote:
[...]
> > 3. Can a JFlex comment appear anywhere a space is allowed?
>
> Not everywhere; for your specific questions, though:
>
> > For instance, are the following allowed?
> >
> > %% // end of section 1
> >
> > %{ // the following will be inserted into the generated class
>
> Yes, both of the above are allowed. After '%%' and '%{' all text is
> ignored until end of line - no comment syntax is required. So these are
> allowed too:
>
> %% end of section 1
>
> %{ the following will be inserted into the generated class

I suppose this applies to every option of the form

    %[a-z]{
    %[a-z]}

like %eofval{ and %eofval}. Right?

Romildo
From: José R. M. <j.r...@gm...> - 2010-04-19 21:58:10
On Mon, Apr 19, 2010 at 12:25:08PM -0400, Steve Rowe wrote:
[...]
> > 2. In macro definitions, how does one know where one regular
> > expression finishes?
>
> Within the macro section of a spec, newlines terminate regular
> expressions - from the above-linked LexScan.flex:
>
> 500 {NL} { if (macroDefinition) { yybegin(MACROS); }
> return symbol(REGEXPEND); }

Then there is an error in examples/java/java.flex in the distribution:

    /* comments */
    Comment = {TraditionalComment} | {EndOfLineComment} |
              {DocumentationComment}

Here the macro definition is not supposed to be in a unique line.

Romildo
From: José R. M. <j.r...@gm...> - 2010-04-19 21:21:08
I have another question.

Taking a quick look at LexScan.flex, it seems that any character, except newline, is accepted inside a character class.

The manual, on section 4.3.1, excludes 21 meta characters (the Character non terminal in the grammar).

Some questions:

1) Why is [ and ] balanced inside a character class?

2) What is the meaning of a macro inside a character class?

3) What is the meaning of a string (sequence of characters between double quotes) inside a character class?

Romildo
From: José R. M. <j.r...@gm...> - 2010-04-19 20:06:17
Hi Steve.

Thanks for the answers.

I have just submitted a patch in Pygments track system:

    http://dev.pocoo.org/projects/pygments/ticket/495

Maybe you want to take a look at it. There are two formatted examples there, taken from the examples subdirectory of the JFlex distribution.

The lexer is not fully correct, as in my mind there were some open issues regarding JFlex input file syntax. Later on I will submit a new, more correct patch.

Romildo
From: Steve R. <sa...@od...> - 2010-04-19 16:25:19
Hi Romildo,

You can see the latest version (for as yet unreleased JFlex version 1.5) of the JFlex scanner spec here:

    http://jflex.svn.sourceforge.net/viewvc/jflex/trunk/jflex/src/main/jflex/LexScan.flex?revision=586&view=markup

I recall that ViewVC (the Subversion & CVS repository browser used by SourceForge) switched to Pygments recently for its syntax highlighting. It would be very nice if the above link resulted in a syntax-highlighted display :).

The latest version of the CUP parser for JFlex specs is here:

    http://jflex.svn.sourceforge.net/viewvc/jflex/trunk/jflex/src/main/cup/LexParse.cup?revision=585&view=markup

José Romildo Malaquias wrote:
> 1. The token '|' used as a regex operator for union, and as an action
> makes the grammar for the syntax of lexical rules (section 4.3.1 of
> the manual) ambiguous.

From the above-linked grammar spec (LexScan.flex):

    460 <REGEXP> {
    ...
    464   {WSPNL}*"|"{WSP}*$ { if (macroDefinition) {
    ...
    477   {WSPNL}*"|"        { return symbol(BAR); }
    ...

A '|'-action (line #464) must be followed by optional whitespace and EOL; other uses are interpreted as unions. In the manual at the end of section 4.3.2, the interpretation and utility of the '|'-action is spelled out more fully.

> 2. In macro definitions, how does one know where one regular
> expression finishes?

Within the macro section of a spec, newlines terminate regular expressions - from the above-linked LexScan.flex:

    500 {NL} { if (macroDefinition) { yybegin(MACROS); }
               return symbol(REGEXPEND); }

> 3. Can a JFlex comment appear anywhere a space is allowed?

Not everywhere; for your specific questions, though:

> For instance, are the following allowed?
>
> %% // end of section 1
>
> %{ // the following will be inserted into the generated class

Yes, both of the above are allowed. After '%%' and '%{' all text is ignored until end of line - no comment syntax is required. So these are allowed too:

    %% end of section 1

    %{ the following will be inserted into the generated class

> macro1 /* this is a macro */ = /* definition */ regex

Yes, both of the above comments (before and after the '=') are allowed. See line 401 (before the '=') and line 502 (after the '=') in LexScan.flex.

> < /*comment1*/ YYINITIAL, /*comment2*/ STR > re { action1 }

Comments in lexical state lists are not allowed. See the <STATES> section in LexScan.flex, starting at line 448.

> (a /*comment3*/ b | c) d { action2 }
>
> regex1 / /*comment3*/ regex2 { action3 }

Yes, these are both allowed. See line 502 in LexScan.flex.

Steve
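The distinction drawn in this message (a '|' followed only by optional whitespace and end of line is an action, any other '|' is a union) can be approximated in a line-based tool with one regular expression. This is a sketch with an invented class name; treating "whitespace" as space, tab and form feed is an assumption about what {WSP} covers, and bars inside strings, character classes or escapes are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JFlex.
class BarClassifier {
    // A '|' that is the last thing on the line apart from blanks.
    private static final Pattern TRAILING_BAR = Pattern.compile("\\|[ \\t\\f]*$");

    // True when the rule line ends in a '|'-action rather than using '|' as a union.
    static boolean endsWithBarAction(String ruleLine) {
        return TRAILING_BAR.matcher(ruleLine).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithBarAction("re1 |"));
        System.out.println(endsWithBarAction("re1 | re2 { action }"));
    }
}
```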
From: José R. M. <j.r...@gm...> - 2010-04-19 13:37:37
When writing a lexer for JFlex input files for use with Pygments (http://pygments.org), a generic syntax highlighter, some questions arose.

1. The token '|' used as a regex operator for union, and as an action makes the grammar for the syntax of lexical rules (section 4.3.1 of the manual) ambiguous. For instance, how should the following lexical rules be interpreted?

       re1 |
       re2 { action }

   a) as one lexical rule: (re1|re2) { action }
   b) as two lexical rules with the action '|' for the first one.

2. In macro definitions, how does one know where one regular expression finishes? For instance, the following snippet in the second section of the specification

       a = re1
       b = re2

   could mean one single macro definition a = (re1 b = re2) as well as two macro definitions: 'a' and 'b'

3. Can a JFlex comment appear anywhere a space is allowed? For instance, are the following allowed?

       %% // end of section 1

       %{ // the following will be inserted into the generated class
       some java code goes here
       %}

       macro1 /* this is a macro */ = /* definition */ regex

       %%

       < /*comment1*/ YYINITIAL, /*comment2*/ STR > re { action1 }

       (a /*comment3*/ b | c) d { action2 }

       regex1 / /*comment3*/ regex2 { action3 }

Romildo
From: Denis W. <ddw...@gm...> - 2009-12-29 23:35:48
Thanks, Stowe. It was really helpful.

Best Regards,
Denis Weerasiri
From: Steve R. <sa...@od...> - 2009-12-28 15:26:52
Hi Denis,

You appear to be directly using the syntax from the XML specification, and this won't work. The first issue I can see is the use of '-' as a regular language set subtraction operator - JFlex does not support this syntax. Check out the documentation, and look for the '!' operator and a description of using it to do something similar.

The Symbol class's toString() method is:

    public int sym;
    [...]
    public String toString() { return "#"+sym; }

and it sounds like you want to print out the Symbol name rather than the integer code for it, but AFAICT, there is no out-of-the-box way to do this. Maybe you could subclass Symbol and add the int->String mappings there, along with a toString() method that references the mappings?

When you create symbols, I think you should be using the constructor that carries a value, and pass in the matched text, when you're interested in the text.

I again encourage you to take a look at the IntelliJ IDEA xml grammar I sent the link for in my last email - it's Apache2 licensed, so you are free to use it for anything you want.

Steve

Denis Weerasiri wrote:
> My Flex file is attached here.
>
> My required output is something like as follows. for the test code I've
> mentioned in previous mail.
>
> stringtype : ANYCONTENTS
> id : VALUE
> h : PREFIX
> NAME : VALUE
>
> Best Regards,
> Denis
>
> On Mon, Dec 28, 2009 at 7:16 PM, Steve Rowe <sa...@od...> wrote:
>
> > Hi Denis,
> >
> > For some reason your grammar file attachment didn't come through -
> > can you resend it inline?
> >
> > It's not clear to me exactly what the problem is - can you show what
> > you *want* the output to be?
> >
> > You may find it useful to see other implementations of JFlex XML
> > lexers - here's the one used by IntelliJ IDEA's Community Edition:
> >
> > <http://git.jetbrains.org/?p=idea/community.git;a=blob_plain;f=xml/impl/src/com/intellij/lexer/_XmlLexer.flex>
> >
> > Steve
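The subclassing idea suggested in this reply might look roughly like the sketch below. To keep it compilable on its own, it declares a small stand-in Symbol class that mirrors only the members quoted in the message plus a value field; in a real project the CUP runtime class would be extended instead. NamedSymbol and the code-to-name table are invented for illustration: the codes 104 and 106 come from the output shown in the thread, but which token each one stands for is not stated there.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the CUP runtime Symbol class so the sketch compiles alone.
class Symbol {
    public int sym;
    public Object value;

    public Symbol(int sym, Object value) {
        this.sym = sym;
        this.value = value;
    }

    public String toString() {
        return "#" + sym;
    }
}

// Hypothetical subclass that prints "text : NAME" instead of "#code".
class NamedSymbol extends Symbol {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        // Assumed mapping, for illustration only.
        NAMES.put(104, "ANYCONTENTS");
        NAMES.put(106, "VALUE");
    }

    public NamedSymbol(int sym, Object value) {
        super(sym, value);
    }

    @Override
    public String toString() {
        // Fall back to the numeric form when no name is registered.
        return value + " : " + NAMES.getOrDefault(sym, "#" + sym);
    }

    public static void main(String[] args) {
        System.out.println(new NamedSymbol(106, "id"));
        System.out.println(new NamedSymbol(1, "x"));
    }
}
```

In a scanner action the matched text would be passed as the value, for example by constructing the symbol with yytext(), which is the "constructor that carries a value" the reply refers to.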
From: Denis W. <ddw...@gm...> - 2009-12-28 07:19:15
Hi all,

I wrote a .flex to tokenize an XML document. Tokens will be like elements, attributes, values etc. in the XML document. My .flex is attached here.

I used the following code to test the input. I would be happy if anyone could give me a clue on how to resolve this issue.

Best Regards,
Denis.

    String text = "<stringtype id=\"h:NAME\">\n" +
                  " </stringtype>";
    try {
        InputStream is = new ByteArrayInputStream(text.getBytes("UTF-8"));
        Scanner sc = new Scanner(is);
        try {
            for (int i = 0; i < 45; i++)
                System.out.println(sc.yytext() + ":" + sc.next_token());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (java.io.IOException e) {
        e.printStackTrace(); // To change body of catch statement use File | Settings | File Templates.
    }

The output is always like this (I want to show that tokenizing happens at character level):

    :#106
    <:#104
    s:#104
    t:#104
    r:#104
    i:#104
    n:#104
    g:#104
    t:#104
    y:#104
    p:#104
    e:#104
    i:#104
    d:#106
    =:#106
    ":#104
    h:#106
    ::#104
    N:#104
    A:#104
    M:#104
    E:#106
    ":#106
    >:#106
    <:#106
    /:#104
    s:#104
    t:#104
    r:#104
    i:#104
    n:#104
    g:#104
    t:#104
    y:#104
    p:#104
    e:#106
From: Denis W. <ddw...@gm...> - 2009-11-29 18:50:45
And another simple question. Does JFlex support the following kind of variable-length Unicode escapes?

    CHAR = \u9 | \uA | \uD | [\u20-\uD7FF] | [\uE000-\uFFFD] | [\u10000-\u10FFFF]

Cheers,
Dhananjaya
From: Peter L. B. <pb...@co...> - 2009-11-29 17:48:26
You need to escape your quote characters. Try this:

    VALUE = '\"' ([^<&\"] | [0-9])* '\"' | "\'" ([^<&\'] | [0-9])* "\'"

personally, i find the following more readable:

    VALUE = ([\"] ([^<&\"] | [0-9])* [\"]) | ([\'] ([^<&\'] | [0-9])* [\'])

Unless i'm mistaken, your inner patterns are redundant. That is, the numeric characters [0-9] are part of the set specified in [^<&\"] (or [^<&\']). The rewrite would look like:

    VALUE = ('\"' [^<&\"]* '\"') | ("\'" [^<&\']* "\'")

cheers,

peter bird
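The redundancy point can be checked outside JFlex with java.util.regex. Note that this is Java regex syntax, not JFlex syntax, and the class name is invented; the sketch only shows that a quoted run excluding '<', '&' and the closing quote already accepts digits, so a separate [0-9] alternative adds nothing.

```java
import java.util.regex.Pattern;

// Java-regex analogue of the simplified VALUE macro; not JFlex syntax.
class QuotedValue {
    // Either "..." without <, & or ", or '...' without <, & or '.
    static final Pattern VALUE = Pattern.compile("\"[^<&\"]*\"|'[^<&']*'");

    static boolean isValue(String s) {
        return VALUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValue("\"h:NAME\""));
        System.out.println(isValue("'abc123'"));
        System.out.println(isValue("\"a<b\""));
    }
}
```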
From: Denis W. <ddw...@gm...> - 2009-11-29 14:21:47
Hi folks,

I'm new to JFlex and regex. I get a syntax error when generating a scanner from my .flex file. My .flex is something like the following:

    %%
    VALUE = '"' ([^<&"] | [0-9])* '"' | "'" ([^<&'] | [0-9])* "'"
    %%
    {VALUE} { return symbol(sym.VALUE); }

I get an error message as follows:

    Reading "/home/hdd2/Acedemic/L4-S1/CS4410-CompTheory/project/Tokenizer.flex"

    Error in file "/home/hdd2/Acedemic/L4-S1/CS4410-CompTheory/project/Tokenizer.flex" (line 54):
    Syntax error.
    VALUE = '"' ([^<&"] | [0-9])* '"' | "'" ([^<&'] | [0-9])* "'"
    ^

    Error in file "/home/hdd2/Acedemic/L4-S1/CS4410-CompTheory/project/Tokenizer.flex" (line 54):
    Unterminated string at end of line.
    VALUE = '"' ([^<&"] | [0-9])* '"' | "'" ([^<&'] | [0-9])* "'"
    ^
    2 errors, 0 warnings.

    Generation aborted.

Can anyone give me a clue to solve this?

Best,
Dhananjaya.
From: Ulf D. <udi...@ya...> - 2009-07-18 05:05:43
Thanks so much, that did the trick.

Cheers,
Ulf

--- On Fri, 7/17/09, Gerwin Klein <ger...@ni...> wrote:

> From: Gerwin Klein <ger...@ni...>
> Subject: Re: [jflex-users] problem handling the character hex FF
> To: "Ulf Dittmer" <udi...@ya...>
> Cc: jfl...@li...
> Date: Friday, July 17, 2009, 7:41 PM
> Hi Ulf,
>
> you need to use %16bit or %unicode instead of %8bit.
>
> Java represents characters internally as unicode and 0xFF
> is apparently mapped to 711 in the character encoding that
> is used.
>
> Cheers,
> Gerwin
From: Gerwin K. <ger...@ni...> - 2009-07-17 23:58:34
Hi Ulf,

you need to use %16bit or %unicode instead of %8bit.

Java represents characters internally as unicode and 0xFF is apparently mapped to 711 in the character encoding that is used.

Cheers,
Gerwin
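The effect described in this reply can be reproduced without JFlex: the char value a Reader delivers for byte 0xFF depends on the charset used for decoding. The sketch below uses only standard JDK classes; the class name is invented. The thread does not say which encoding produced 711; a legacy encoding in which 0xFF decodes to U+02C7 (decimal 711), such as Mac OS Roman, would be consistent with the trace, but that is a guess.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class ByteFFDecoding {
    // Decodes a single byte with the given charset and returns the resulting char code.
    static int decode(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).charAt(0);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps every byte to the code point of the same value, so this stays within 0..255.
        System.out.println("ISO-8859-1: " + decode(0xFF, StandardCharsets.ISO_8859_1));
        // A lone 0xFF is not valid UTF-8, so the decoder substitutes U+FFFD, far above 255.
        System.out.println("UTF-8: " + decode(0xFF, StandardCharsets.UTF_8));
        // This is the charset a FileReader created from just a file name relies on.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + decode(0xFF, Charset.defaultCharset()));
    }
}
```

Any decoded value above 255 would overrun the 256-entry table mentioned in the original report, which is why switching to %unicode (or %16bit) resolves it.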
From: Ulf D. <udi...@ya...> - 2009-07-17 19:00:11
Hello-

I'm trying to handle a file that contains the character hex FF, but that results in an exception:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 711
        at JFlex_problem.yylex(JFlex_problem.java:503)
        at JFlex_problem.main(JFlex_problem.java:237)

711 is the value of the zzInput variable, which is used as an index into an array of length 256.

I thought declaring the parser as "8bit" should be sufficient to handle just about everything; am I missing something? I'm appending a minimal jflex file that exhibits the problem below. I'd be grateful on any clues on how to handle that character.

Thanks in advance,
Ulf

    import java.io.*;

    %%

    %public
    %class JFlex_problem
    %8bit
    %int
    %apiprivate

    %{
      public JFlex_problem() { }

      public static void main (String args[]) throws Exception {
        if (args == null) {
          System.err.println("usage: java JFlex_problem file");
          System.exit(1);
        }
        JFlex_problem self = new JFlex_problem();
        self.zzReader = new FileReader(args[0]);
        while ( !self.zzAtEOF ) self.yylex();
      }

    %}

    %%

    <YYINITIAL> {
      . {
        System.out.println("read char: "+(int)yytext().charAt(0));
      }
    }

    .|\n {
      /* ignore every character that's not recognizably part of a command */
      System.out.println("ignoring char: "+(int)yytext().charAt(0));
    }
From: Yuval O. <yu...@bl...> - 2009-07-15 04:41:27
Hello,

I recently created a parser to extract domain names from host names, by converting the public suffix list (http://publicsuffix.org) into a JFlex file of about 4,000 rules. There are so many rules, because each country has slightly different conventions (xxx.com vs. xxx.co.uk, etc).

Running JFlex on that file never finished, so I ended up having to partition it into multiple JFlex files. It worked out in this case, as I was able to hash on the last few characters of the domain name and pick the appropriate parser.

Is there a better general way to deal with that many rules, or is that a limitation we just have to accept?

Thanks,
Yuval