jflex-users Mailing List for JFlex (Page 4)
The fast lexer generator for Java
Brought to you by:
lsf37,
steve_rowe
From: Kevin K. <kj...@gm...> - 2011-08-07 07:33:01
I started writing a JFlex lexer spec by hand, and realized it would be much simpler and more maintainable to generate the lexer spec I desire from another sort of meta-spec.

Basically, I want to describe the second and third sections of the lexer spec (options, macros, and lexical rules) in XML -- with the additional feature that regexp definitions will have names. Then I'll write a program that takes this XML and generates several things:

- an enum containing the names of all the states
- enums containing the names of all the regexps in each state
- listener interfaces for each state, with methods corresponding to the names of the regexps
- also a state change listener interface
- a JFlex lexer spec with:
  - addXxxListener(XxxListener) methods for each type of listener
  - actions that call the listener methods associated with their regexps

The advantage of all this is that no application code would be embedded in the actions of the lexer spec. All the application code would be in the class(es) that implement the listener interfaces. This would make the action code much easier to write, debug, and refactor than if it were in the lexer spec.

I can think of a few kinks that will need to be worked out, like how to handle regexps that are to be matched in more than one state. I suppose I could make it an option to generate a single enum with all the regexp names, or separate enums for each state.

My only question before I do this... am I reinventing anyone's wheel?
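The listener arrangement described above could look roughly like the following in Java. This is only a sketch under assumptions: the names ListenerSketch, StringStateListener, GeneratedLexer, fireEscape and firePlainChars are all hypothetical, and GeneratedLexer is a hand-written stand-in, not anything JFlex emits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from JFlex or from the post.
class ListenerSketch {

    /** One method per named regexp in a (hypothetical) STRING state. */
    interface StringStateListener {
        void onEscape(String text);
        void onPlainChars(String text);
    }

    /** Stand-in for the generated scanner: it only registers and notifies listeners. */
    static class GeneratedLexer {
        private final List<StringStateListener> stringListeners = new ArrayList<>();

        void addStringStateListener(StringStateListener listener) {
            stringListeners.add(listener);
        }

        // In a generated spec, the action for the "escape" regexp would hold
        // one call like this, passing the matched text.
        void fireEscape(String text) {
            for (StringStateListener l : stringListeners) {
                l.onEscape(text);
            }
        }

        void firePlainChars(String text) {
            for (StringStateListener l : stringListeners) {
                l.onPlainChars(text);
            }
        }
    }

    /** Application code lives in the listener, not in the lexer spec. */
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        GeneratedLexer lexer = new GeneratedLexer();
        lexer.addStringStateListener(new StringStateListener() {
            @Override public void onEscape(String text) { log.add("escape:" + text); }
            @Override public void onPlainChars(String text) { log.add("chars:" + text); }
        });
        lexer.firePlainChars("abc");
        lexer.fireEscape("\\n");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the split is that the anonymous listener can be debugged and refactored like any other Java class, while each action in the spec stays a one-line call.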
From: Kevin K. <kj...@gm...> - 2011-08-02 00:23:58
Has anyone ever written a skeleton file for a write-driven lexer?

Instead of constructing my lexer with a Reader, I want it to have a write(...) method that fills its internal buffer and causes it to perform its actions on all written tokens. (Or, failing that, a yylex() method that throws some kind of buffer underflow exception, signaling the caller to refill the buffer by calling write(...) again.)
From: Steve R. <sa...@od...> - 2011-07-31 03:03:24
Hi Swaroop,

If you want to maintain state history, you have to do it yourself - the generated scanner does not maintain a state stack.

State nesting is a notational convenience - it just means that the rules apply to all containing states. Only one state applies at any given time. For example, these two specification snippets are equivalent:

  <A> {
    expr3 { action }
    <B,C> expr4 { action }
  }

The above is the same as:

  <A> expr3 { action }
  <A,B,C> expr4 { action }

Steve

On 7/30/2011 8:22 AM, Swaroop Rao wrote:
> I was reading the JFlex documentation and in sec 4.3.3, there's an
> example that looks like this:
> *****
> %states A, B
> %xstates C
> %%
> expr1 { yybegin(A); action }
> <YYINITIAL, A> expr2 { action }
> <A> {
> expr3 { action }
> <B,C> expr4 { action }
> }
> *****
>
> My question is: how can the lexical analyzer be in both states A as
> well as B or C? What does the line "<B, C> expr4" actually mean? In
> the example, it is not shown how states B and/or C are triggered, so
> it is difficult for me to visualize it.
>
> I am trying to use JFlex to solve a problem where I may encounter text
> like "[a-z]". I start in a state called 'NEUTRAL'. When the lexical
> analyzer sees the '[' character, I want to go to state A and when I
> see the '-' character, I want to switch to state B. But, when I see
> 'z', I want to switch back to state A, so that when I see the final
> ']' character, I want to go back to state 'NEUTRAL'. So, how do I
> remember that I was in state A before I switched to state B? Does FLex
> maintain a stack of states that I have seen in the past?
>
> Sorry if this question is too naive.
>
> Regards,
> Swaroop
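Keeping the state history yourself, as suggested above, can be as small as a stack next to the scanner. The following is a minimal sketch, assuming hypothetical state constants NEUTRAL, A and B and hypothetical helpers pushState and popState; yybegin(int) and yystate() are stand-ins here for the methods of the same name on a generated scanner, where the helpers would normally sit in the user-code section of the spec.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class StateStackSketch {
    // Hypothetical state constants; a generated scanner exposes its declared
    // states as int constants in a similar way.
    static final int NEUTRAL = 0;
    static final int A = 2;
    static final int B = 4;

    private int current = NEUTRAL;                       // the single active state
    private final Deque<Integer> history = new ArrayDeque<>();

    // Stand-ins for the scanner's own yybegin(int) and yystate().
    void yybegin(int state) { current = state; }
    int yystate() { return current; }

    /** Remember the active state, then switch to the next one. */
    void pushState(int next) {
        history.push(yystate());
        yybegin(next);
    }

    /** Return to whichever state was active before the last pushState. */
    void popState() {
        yybegin(history.pop());
    }

    /** Replays the "[a-z]" walk from the question and records the active state. */
    static int[] demo() {
        StateStackSketch s = new StateStackSketch();
        int[] seen = new int[4];
        s.pushState(A); seen[0] = s.yystate();   // saw '['
        s.pushState(B); seen[1] = s.yystate();   // saw '-'
        s.popState();   seen[2] = s.yystate();   // saw 'z', back to A
        s.popState();   seen[3] = s.yystate();   // saw ']', back to NEUTRAL
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Only one state is ever active; the stack just records where to return to.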
From: Swaroop R. <bac...@ya...> - 2011-07-30 12:22:27
I was reading the JFlex documentation and in sec 4.3.3, there's an example that looks like this:

*****
%states A, B
%xstates C
%%
expr1 { yybegin(A); action }
<YYINITIAL, A> expr2 { action }
<A> {
  expr3 { action }
  <B,C> expr4 { action }
}
*****

My question is: how can the lexical analyzer be in both states A as well as B or C? What does the line "<B, C> expr4" actually mean? In the example, it is not shown how states B and/or C are triggered, so it is difficult for me to visualize it.

I am trying to use JFlex to solve a problem where I may encounter text like "[a-z]". I start in a state called 'NEUTRAL'. When the lexical analyzer sees the '[' character, I want to go to state A and when I see the '-' character, I want to switch to state B. But, when I see 'z', I want to switch back to state A, so that when I see the final ']' character, I want to go back to state 'NEUTRAL'. So, how do I remember that I was in state A before I switched to state B? Does JFlex maintain a stack of states that I have seen in the past?

Sorry if this question is too naive.

Regards,
Swaroop
From: Thomas K. <ki...@in...> - 2011-05-23 13:33:17
Hi everyone,

I'm trying to use the %include declaration in the lexical rules part of my grammar files. This does not seem to work, though. I'm trying to do this inside of a rule. So for example:

  <STRING> {
    %include my/library/file.flex
    expr1 { action1 }
    expr2 { action2 }
  }

This gives me a syntax error. Is there a reason this is not possible?

Cheers,
Thomas Kinnen
From: <Ale...@in...> - 2011-05-19 15:11:44
Greetings JFlex users,

Does anyone have a working example of AST generation with CUP? Any hints or pointers would be helpful, too. I have a more detailed question posted on Stack Overflow, link below:

http://stackoverflow.com/questions/6033303/parse-tree-generation-with-java-cup

I read the JFlex and CUP documentation end-to-end, and I am aware of symbols with user-defined values and the possibility of including user code in a CUP grammar specification, e.g.:

  term ::= LPAREN expr:e RPAREN
           {: /* user code goes here */ RESULT = e; :}
         | NUMBER:n
           {: RESULT = n; :}
         ;

I just need some help tying it all together. Some working code would be most helpful.

Thanks,
Alex
From: <st1...@ai...> - 2011-04-26 12:54:03
This was what was wanted,

----------------------------------------
// The rules below have been inspired from
// http://svn.fifesoft.com/viewvc-1.0.5/bin/cgi/viewvc.cgi/RSyntaxTextArea/trunk/src/org/fife/ui/rsyntaxtextarea/modes/CTokenMaker.flex?logsort=cvs&view=markup&root=RSyntaxTextArea&pathrev=12
letter = [A-Za-z]
letter_or_underscore = ({letter}|[_])
digit = [0-9]
url_gen_delim = ([:\/\?#\[\]@])
url_sub_delim = ([\!\$&'\(\)\*\+,;=])
url_unreserved = ({letter_or_underscore}|{digit}|[\-\.\~])
url_character = ({url_gen_delim}|{url_sub_delim}|{url_unreserved}|[%])
url_characters = ({url_character}*)
url_end_character = ([\/\$]|{letter}|{digit})
dec_uri = (((https?|f(tp|ile))"://"|"www.")({url_characters}{url_end_character})?)
----------------------------------------

HTH,
Ravi

Quoting Steve Rowe <sa...@od...> on 04/26/2011
> Hi PCoder,
>
> JFlex doesn't support {n,} syntax. From the manual
> <http://jflex.de/manual.html>:
>
> a{n}
>
> ...
>
> a{n,m}
>
> ...
>
> (no "a{n,}" here)
>
> The following (untested) should work:
>
> dec_uri = [a-zA-Z]{3}[a-zA-Z]*://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>
> Your sub-expression "(/[\w- ./?%&=]*)?" appears to contain a space,
> which is very likely not what you want.
>
> Steve
>
> On 4/24/2011 10:58 AM, Purple Coder wrote:
>> Hi,
>>
>> I am trying hard to get my regular expression into my flex file (for
>> grabbing the URIs in my input). JFlex doesn;t seem to like
>> characters like {} and / in my regular expression.
>>
>> dec_uri = ([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>>
>> I am using the above line in my flex file. And flex doesn;t seem to
>> like { in the above expression. The output is:
>>
>> Error in file "rql.flex" (line 72):
>> Syntax error.
>>
>> dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>>
>> ^
>>
>>
>> Error in file "rql.flex" (line 72):
>> Unexpected character
>>
>> dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>>
>>
>> I would appreciate if someone could help me with this. I don't find
>> sufficient information in the documentation.
>>
>> Thank you,
>> PCoder
From: Steve R. <sa...@od...> - 2011-04-26 12:21:14
Hi PCoder,

JFlex doesn't support {n,} syntax. From the manual <http://jflex.de/manual.html>:

  a{n}
  ...
  a{n,m}
  ...

(no "a{n,}" here)

The following (untested) should work:

  dec_uri = [a-zA-Z]{3}[a-zA-Z]*://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

Your sub-expression "(/[\w- ./?%&=]*)?" appears to contain a space, which is very likely not what you want.

Steve

On 4/24/2011 10:58 AM, Purple Coder wrote:
> Hi,
>
> I am trying hard to get my regular expression into my flex file (for
> grabbing the URIs in my input). JFlex doesn;t seem to like characters
> like {} and / in my regular expression.
>
> dec_uri = ([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>
> I am using the above line in my flex file. And flex doesn;t seem to
> like { in the above expression. The output is:
>
> Error in file "rql.flex" (line 72):
> Syntax error.
>
> dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>
> ^
>
>
> Error in file "rql.flex" (line 72):
> Unexpected character
>
> dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
>
>
> I would appreciate if someone could help me with this. I don't find
> sufficient information in the documentation.
>
> Thank you,
> PCoder
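The rewrite of "{3,}" as "{3}" followed by "*" can be sanity-checked outside JFlex. The sketch below uses java.util.regex, which does accept the open-ended form, purely as a reference implementation; it says nothing about JFlex's own parser, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AtLeastThreeSketch {
    // java.util.regex accepts the open-ended quantifier, so it serves as the reference.
    static final Pattern OPEN_ENDED = Pattern.compile("[a-zA-Z]{3,}");

    // The rewrite from the reply: exactly three letters, then zero or more.
    static final Pattern REWRITTEN = Pattern.compile("[a-zA-Z]{3}[a-zA-Z]*");

    static boolean rewrittenMatches(String s) {
        return REWRITTEN.matcher(s).matches();
    }

    /** True when both forms give the same whole-string verdict. */
    static boolean agree(String s) {
        return OPEN_ENDED.matcher(s).matches() == rewrittenMatches(s);
    }

    public static void main(String[] args) {
        String[] samples = {"", "ab", "ftp", "http", "https", "ab1"};
        for (String s : samples) {
            System.out.println("'" + s + "' rewritten=" + rewrittenMatches(s)
                    + " agree=" + agree(s));
        }
    }
}
```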
From: Purple C. <pur...@ya...> - 2011-04-24 14:59:04
Hi,

I am trying hard to get my regular expression into my flex file (for grabbing the URIs in my input). JFlex doesn't seem to like characters like {} and / in my regular expression.

  dec_uri = ([a-zA-Z]{3,})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

I am using the above line in my flex file, and JFlex doesn't seem to like { in the above expression. The output is:

  Error in file "rql.flex" (line 72):
  Syntax error.

  dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
  ^

  Error in file "rql.flex" (line 72):
  Unexpected character

  dec_uri = ([a-zA-Z]'{'3,'}'):'/''/'([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

I would appreciate it if someone could help me with this. I can't find sufficient information in the documentation.

Thank you,
PCoder
From: Steve R. <sa...@od...> - 2011-04-22 14:33:11
Hi Ken,

On 4/22/2011 8:23 AM, Ken...@th... wrote:
> Thanks. The concern was mainly about calling yybegin(MyJFlexer.THISSTATE)
> before yylex() had started, since the docs only mention calling it
> in the middle of a lex. I wasn't sure whether that would mess up its
> bookkeeping or something.

Lexical state is held in zzLexicalState, which is set on construction to YYINITIAL. yybegin() directly sets zzLexicalState, and does not interfere with scanner operation, so there is no problem setting it outside of action code.

If you always want to start your scanner in THISSTATE, you could put your yybegin(THISSTATE) call in an %init{ ... %init} block in your scanner specification, so you don't have to explicitly call it.

Steve

> ________________________________________
> From: Steve Rowe [sa...@od...]
> Sent: Thursday, April 21, 2011 11:12 PM
> To: jfl...@li...
> Subject: Re: [jflex-users] Accessing grammar rules from Java
>
> Hi Ken,
>
> Looks fine to me. I do this (temporarily) when I'm debugging a new
> grammar, so I can follow match paths.
>
> Was there some particular concern you had? That is, what are you
> worried about going wrong?
>
> Steve
>
> On 4/21/2011 2:15 PM, Ken...@th... wrote:
>> Hi,
>>
>> I have a situation in which I have a big long grammar with lots of macros& rules of various types, and it works fine. I also want to take a few of the rules and expose them individually to a caller through a Java API. Here's my method, which does seem to work, but I haven't seen people use states like this to expose rules externally.
>>
>> ====== in the grammar file =======
>> %class MyJFlexer
>> <YYINITIAL> {
>> ...various other rules...
>> {MACRO_999} { return newToken("MACRO_999"); }
>> ...various other rules...
>> }
>> <THISSTATE> {
>> {MACRO_999} { return newToken("MACRO_999"); }
>> }
>> =============================
>>
>> ====== in Java =================
>> lex = new MyJFlexer(new StringReader("foo"));
>> lex.yybegin(MyJFlexer.THISSTATE);
>> tok = lex.yylex();
>> assertEquals("MACRO_999", tok.type);
>> =============================
>>
>> Does that look kosher?
>>
>> Thanks.
>>
>> -Ken
From: <Ken...@th...> - 2011-04-22 12:25:15
Thanks. The concern was mainly about calling yybegin(MyJFlexer.THISSTATE) before yylex() had started, since the docs only mention calling it in the middle of a lex. I wasn't sure whether that would mess up its bookkeeping or something.

-Ken

________________________________________
From: Steve Rowe [sa...@od...]
Sent: Thursday, April 21, 2011 11:12 PM
To: jfl...@li...
Subject: Re: [jflex-users] Accessing grammar rules from Java

Hi Ken,

Looks fine to me. I do this (temporarily) when I'm debugging a new grammar, so I can follow match paths.

Was there some particular concern you had? That is, what are you worried about going wrong?

Steve

On 4/21/2011 2:15 PM, Ken...@th... wrote:
> Hi,
>
> I have a situation in which I have a big long grammar with lots of macros& rules of various types, and it works fine. I also want to take a few of the rules and expose them individually to a caller through a Java API. Here's my method, which does seem to work, but I haven't seen people use states like this to expose rules externally.
>
> ====== in the grammar file =======
> %class MyJFlexer
> <YYINITIAL> {
> ...various other rules...
> {MACRO_999} { return newToken("MACRO_999"); }
> ...various other rules...
> }
> <THISSTATE> {
> {MACRO_999} { return newToken("MACRO_999"); }
> }
> =============================
>
> ====== in Java =================
> lex = new MyJFlexer(new StringReader("foo"));
> lex.yybegin(MyJFlexer.THISSTATE);
> tok = lex.yylex();
> assertEquals("MACRO_999", tok.type);
> =============================
>
> Does that look kosher?
>
> Thanks.
>
> -Ken
From: Steve R. <sa...@od...> - 2011-04-22 04:40:29
Hi Ken,

Looks fine to me. I do this (temporarily) when I'm debugging a new grammar, so I can follow match paths.

Was there some particular concern you had? That is, what are you worried about going wrong?

Steve

On 4/21/2011 2:15 PM, Ken...@th... wrote:
> Hi,
>
> I have a situation in which I have a big long grammar with lots of macros& rules of various types, and it works fine. I also want to take a few of the rules and expose them individually to a caller through a Java API. Here's my method, which does seem to work, but I haven't seen people use states like this to expose rules externally.
>
> ====== in the grammar file =======
> %class MyJFlexer
> <YYINITIAL> {
> ...various other rules...
> {MACRO_999} { return newToken("MACRO_999"); }
> ...various other rules...
> }
> <THISSTATE> {
> {MACRO_999} { return newToken("MACRO_999"); }
> }
> =============================
>
> ====== in Java =================
> lex = new MyJFlexer(new StringReader("foo"));
> lex.yybegin(MyJFlexer.THISSTATE);
> tok = lex.yylex();
> assertEquals("MACRO_999", tok.type);
> =============================
>
> Does that look kosher?
>
> Thanks.
>
> -Ken
From: <Ken...@th...> - 2011-04-21 18:16:06
Hi,

I have a situation in which I have a big long grammar with lots of macros & rules of various types, and it works fine. I also want to take a few of the rules and expose them individually to a caller through a Java API. Here's my method, which does seem to work, but I haven't seen people use states like this to expose rules externally.

====== in the grammar file =======
%class MyJFlexer
<YYINITIAL> {
  ...various other rules...
  {MACRO_999} { return newToken("MACRO_999"); }
  ...various other rules...
}
<THISSTATE> {
  {MACRO_999} { return newToken("MACRO_999"); }
}
=============================

====== in Java =================
lex = new MyJFlexer(new StringReader("foo"));
lex.yybegin(MyJFlexer.THISSTATE);
tok = lex.yylex();
assertEquals("MACRO_999", tok.type);
=============================

Does that look kosher?

Thanks.

-Ken
From: chris c. <hyp...@ho...> - 2010-08-23 13:06:38
Yes, it works. equals works perfectly! :D

> From: leo...@ma...
> To: hyp...@ho...
> Subject: RE: [jflex-users] simple "apple" method?
> Date: Mon, 23 Aug 2010 08:03:51 -0500
>
> Hello,
>
> You should not do something like string=="apple", instead try
> if(string.equals("apple")){
> //do something!
> }
>
> Try it and let us know!
>
>
> Regards...
>
> --
> Leonardo Gómez M
> Conocimiento e Investigaciones Estratégicas
> DWHA Telcel
> T. 25813700 Xt 2967
> M 55 3467 0779
> -----Original Message-----
> From: chris chia [mailto:hyp...@ho...]
> Sent: Friday, August 20, 2010 11:18 a.m.
> To: jfl...@li...
> Subject: [jflex-users] simple "apple" method?
>
> Hi all,
> I used yytext() in my JFlex in the rules section.
> It can print the words when it matches the reg exp.
>
> However when i pass this yytext() into a method in java to process, the
> result is weird.
>
> For example,
> if the output of yytext() = "apple"
> and when i pass it into
> a.check(yytext()) where check is a if(string == "apple") do this...
>
> But the strange thing is it does not do anything.
>
> Is this yytext() returning a string or a linked list of char or something...
> How can i write a check method to do something if it's a "apple".
From: Steve R. <sa...@od...> - 2010-08-22 19:00:34
String == String checks for referential equality: whether or not two String references point to the same in-memory String object. (All non-primitive Java variables hold references.)

You should have something like

  if (string.equals("apple")) do this...

<http://download.oracle.com/javase/6/docs/api/java/lang/String.html#equals%28java.lang.Object%29>

chris chia wrote:
> Hi all,
> I used yytext() in my JFlex in the rules section.
> It can print the words when it matches the reg exp.
>
> However when i pass this yytext() into a method in java to process, the
> result is weird.
>
> For example,
> if the output of yytext() = "apple"
> and when i pass it into
> a.check(yytext()) where check is a if(string == "apple") do this...
>
> But the strange thing is it does not do anything.
>
> Is this yytext() returning a string or a linked list of char or something...
> How can i write a check method to do something if it's a "apple".
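The difference between the two checks is easy to reproduce in plain Java. In this sketch the token is built from a char array, on the assumption that this resembles how a scanner produces the result of yytext() from its buffer; the class and method names are hypothetical.

```java
class StringEqualitySketch {
    /** Reference comparison: true only if s is the very same object as the literal. */
    static boolean sameReference(String s) {
        return s == "apple";
    }

    /** Content comparison: true whenever the characters are equal. */
    static boolean sameContents(String s) {
        return "apple".equals(s);
    }

    public static void main(String[] args) {
        // A new String object with the same characters as the literal "apple".
        String token = new String("apple".toCharArray());
        System.out.println("== check:     " + sameReference(token));
        System.out.println("equals check: " + sameContents(token));
    }
}
```

Writing the literal first, as in "apple".equals(s), also avoids a NullPointerException when s is null.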
From: chris c. <hyp...@ho...> - 2010-08-20 16:18:23
Hi all,

I used yytext() in the rules section of my JFlex spec. It can print the words when it matches the reg exp.

However, when I pass this yytext() into a method in Java to process, the result is weird.

For example, if the output of yytext() = "apple" and I pass it into a.check(yytext()), where check is an if(string == "apple") do this...

But the strange thing is it does not do anything.

Is this yytext() returning a string or a linked list of char or something... How can I write a check method to do something if it's an "apple"?
From: chris c. <hyp...@ho...> - 2010-08-20 12:47:44
New to JFlex, I wish to write certain user code but I am stuck.

I wish to find the number of times the word "apple" appears in a text file... and anything besides apple. How can I write it in the user code? I tried the following but it fails...

  appleWord = [apple]
  otherWord = [^apple]

Both failed to compile. Please guide.

Thanks
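One likely source of confusion here is that square brackets denote a character class: [apple] matches a single character out of a, p, l, e, not the word. In a JFlex spec a literal word is written as a quoted string, "apple". The sketch below shows the same distinction with java.util.regex, used only as an illustration outside JFlex; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppleCountSketch {
    /** Counts non-overlapping matches of regex in text. */
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "apple pie, an apple a day";
        // Character class: one match per single character a, p, l or e.
        System.out.println("[apple] matches: " + count("[apple]", text));
        // Literal word: one match per occurrence of the five-letter sequence.
        System.out.println("apple matches:   " + count("apple", text));
    }
}
```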
From: Sujen M. <su...@co...> - 2010-08-11 17:20:29
Hello All,

I use JLex as lexical analyser generator and javacup as parser generator. I have a lex file with a list of tokens, one of which is BRACKETID. Here is the regular expression for BRACKETID:

  BRACKETID = \[[^\]\n\f\r\t]*\]

This regex matches a string enclosed in square brackets; the string can contain any special character except an extra ] within the enclosing square brackets.

Now I want it to match any string enclosed in square brackets that may also contain the special character ], but only if the ] character comes in pairs. For example, the regex should not match the following:

  [Hello]]

but if the string is something like [Hello]]], it should match.

For that, I came up with the regex \[(?:[^]]|]])+](?!]) and tested it with the Java regex library; it seems to work.

Now when I tried to use this regex for the BRACKETID token in the lex file, as below

  BRACKETID = \[(?:[^]]|]])+](?!])

and generate the Java file, JLex throws an error. I think this is due to the limited set of regex characters in JLex, as it is quite an old library (from 2003).

I researched a bit and came to know about JFlex, which is a rewrite of JLex. So, does JFlex support such a regex, and how do I directly map a lex file to a flex file to generate the Java file for javacup to generate the parser?

Any suggestion will be helpful.

Thanks,
Sujen
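For reference, the behaviour described for the Java regex library can be reproduced as follows. This sketch writes the same pattern with every literal ] escaped, which is an editorial choice for readability, and it only shows what java.util.regex does; it makes no claim about what JLex or JFlex accept. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BracketIdSketch {
    // "[", then one or more of (any char except "]" | a "]]" pair),
    // then a closing "]" that is not followed by another "]".
    static final Pattern BRACKETID =
            Pattern.compile("\\[(?:[^\\]]|\\]\\])+\\](?!\\])");

    /** Whole-string match against BRACKETID. */
    static boolean matches(String s) {
        return BRACKETID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"[Hello]", "[Hello]]", "[Hello]]]", "[a]]b]"};
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```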
From: Steve R. <sa...@od...> - 2010-06-15 16:51:51
Ken Williams wrote:
> On 6/15/10 10:32 AM, "Steve Rowe" <sa...@od...> wrote:
>> '$' does not match end-of-file. It is an end-of-line ('\n') lookahead.
>> [^a-z] includes '\n', so that's why you're getting this warning.
>
> Makes sense. The strange thing is that I was already using this construct
> for another similar rule and it didn't complain about that one:
>
> // Catch U.S. States that are also valid roman numerals
> STATEFRAG="mi"|"dc"|"md"
> ...
> {STATEFRAG} / [^a-z] { return newTok("WORD"); }
> {STATEFRAG} $ { return newTok("WORD"); }

Hmm, that is strange. Seems like these rules should have the same problem.

>> I *think* you can handle this situation by using three rules and a
>> non-default lexical state:
>>
>> %state NONROMAN
>> ...
>> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
>> {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
>> {ROMANPAT} { return newTok("ROMAN"); }
>>
>> <NONROMAN,YYINITIAL> {
>> ... // non-roman matching rules go here.
>> }
>
> That would get pretty messy if I need to use the same technique for more
> than one rule in the same grammar though.
>
> Speaking of messy, as a stopgap measure I ended up solving this by peeking
> ahead in the stream (using some of the same techniques as in
> JFlex.Emitter.emitLexFunctHeader() ) to see if the current match is followed
> by a letter. Totally illegal & unmaintainable, but it does seem to work.

I'm glad you figured it out.

I'll look into adding an end-of-file lookahead assertion, like Perl's /\Z/ and /\z/.

Steve
From: Ken W. <ken...@th...> - 2010-06-15 15:56:54
On 6/15/10 10:32 AM, "Steve Rowe" <sa...@od...> wrote:
> '$' does not match end-of-file. It is an end-of-line ('\n') lookahead.
> [^a-z] includes '\n', so that's why you're getting this warning.

Makes sense. The strange thing is that I was already using this construct for another similar rule and it didn't complain about that one:

  // Catch U.S. States that are also valid roman numerals
  STATEFRAG="mi"|"dc"|"md"
  ...
  {STATEFRAG} / [^a-z] { return newTok("WORD"); }
  {STATEFRAG} $ { return newTok("WORD"); }

> I *think* you can handle this situation by using three rules and a
> non-default lexical state:
>
> %state NONROMAN
> ...
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
> {ROMANPAT} { return newTok("ROMAN"); }
>
> <NONROMAN,YYINITIAL> {
> ... // non-roman matching rules go here.
> }

That would get pretty messy if I need to use the same technique for more than one rule in the same grammar though.

Speaking of messy, as a stopgap measure I ended up solving this by peeking ahead in the stream (using some of the same techniques as in JFlex.Emitter.emitLexFunctHeader() ) to see if the current match is followed by a letter. Totally illegal & unmaintainable, but it does seem to work.

--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken...@th...
From: Steve R. <sa...@od...> - 2010-06-15 15:48:45
Steve Rowe wrote:
> I *think* you can handle this situation by using three rules and a
> non-default lexical state:
>
> %state NONROMAN
> ...
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
> {ROMANPAT} { return newTok("ROMAN"); }
>
> <NONROMAN,YYINITIAL> {
> ... // non-roman matching rules go here.
> }

Whoops, you only need two rules:

  {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
  {ROMANPAT} { return newTok("ROMAN"); }

Steve
From: Steve R. <sa...@od...> - 2010-06-15 15:46:43
Ken Williams wrote:
> On 6/14/10 5:58 PM, "Steve Rowe" <sa...@od...> wrote:
>> How about spelling out all of the alternatives such that at least one
>> character is mandatory? Something like (warning: untested):
>>
>> ROMANPAT = "m"+ ("d"? "c"{0,3} | "c" [dm])
>> ("l"? "x"{0,3} | "x" [lc])
>> ("v"? "i"{0,3} | "i" [vx])
>> | ("d" "c"{0,3} | "c"{1,3} | "c" [dm])
>> ("l"? "x"{0,3} | "x" [lc])
>> ("v"? "i"{0,3} | "i" [vx])
>> | ("l" "x"{0,3} | "x"{1,3} | "x" [lc])
>> ("v"? "i"{0,3} | "i" [vx])
>> | ("v" "i"{0,3} | "i"{1,3} | "i" [vx])
>
> That seems to *almost* work, but now I get a new warning when compiling:
>
> ------------------
> Warning:
> Rule can never be matched:
> {ROMANPAT} $ { return newTok("ROMAN"); }
> ------------------

'$' does not match end-of-file. It is an end-of-line ('\n') lookahead. [^a-z] includes '\n', so that's why you're getting this warning.

I too would like an end-of-file assertion. <<EOF>> works alone as a rule, but it can't be combined with a regex.

I *think* you can handle this situation by using three rules and a non-default lexical state:

  %state NONROMAN
  ...
  {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
  {ROMANPAT} / [a-z] { yypushback(yylength()); yybegin(NONROMAN); }
  {ROMANPAT} { return newTok("ROMAN"); }

  <NONROMAN,YYINITIAL> {
    ... // non-roman matching rules go here.
  }

> Incidentally - if I were using a "normal" regex engine like Perl's, I could
> solve the original zero-length problem more cleanly by using a lookahead
> *inside* the pattern:
>
> ROMANPAT = (?=[a-z]) m* (d?c{0,3}|c[dm]) (l?x{0,3}|x[lc]) (v?i{0,3}|i[vx])
>
> Any chance something like this could make its way into JFlex, or would it
> mess up the NFA->DFA conversion?

Sorry, I don't know.

Steve
From: Ken W. <ken...@th...> - 2010-06-15 14:32:00
On 6/14/10 5:58 PM, "Steve Rowe" <sa...@od...> wrote:
> How about spelling out all of the alternatives such that at least one
> character is mandatory? Something like (warning: untested):
>
> ROMANPAT = "m"+ ("d"? "c"{0,3} | "c" [dm])
> ("l"? "x"{0,3} | "x" [lc])
> ("v"? "i"{0,3} | "i" [vx])
> | ("d" "c"{0,3} | "c"{1,3} | "c" [dm])
> ("l"? "x"{0,3} | "x" [lc])
> ("v"? "i"{0,3} | "i" [vx])
> | ("l" "x"{0,3} | "x"{1,3} | "x" [lc])
> ("v"? "i"{0,3} | "i" [vx])
> | ("v" "i"{0,3} | "i"{1,3} | "i" [vx])

That seems to *almost* work, but now I get a new warning when compiling:

------------------
Warning:
Rule can never be matched:
{ROMANPAT} $ { return newTok("ROMAN"); }
------------------

I'm not sure why that is - what I really want here is a negative lookahead like this:

  {ROMANPAT} (?![a-z]) { return newTok("ROMAN"); }

and I'm trying to emulate it by using a positive lookahead with a negated character class, which necessitates handling the end-of-string condition too:

  {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
  {ROMANPAT} $ { return newTok("ROMAN"); }

Maybe there's a way to do the lookahead in the Java code, by some kind of peek() function?

Incidentally - if I were using a "normal" regex engine like Perl's, I could solve the original zero-length problem more cleanly by using a lookahead *inside* the pattern:

  ROMANPAT = (?=[a-z]) m* (d?c{0,3}|c[dm]) (l?x{0,3}|x[lc]) (v?i{0,3}|i[vx])

Any chance something like this could make its way into JFlex, or would it mess up the NFA->DFA conversion?

> P.S.: Wikipedia says that the formulation you've specified is modern,
> and that things like IIIII (5) and VV (10) and XIIII (14) were once used.

Luckily, I don't need to worry about those because the patterns I'm dealing with are coming from a fairly controlled modern source.

--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken...@th...
From: Steve R. <sa...@od...> - 2010-06-14 23:22:52
Hi Ken,

Ken Williams wrote:
> ---------------------------
> ROMANPAT="m"*
> ("d"?"c"{0,3}|"c"("d"|"m"))
> ("l"?"x"{0,3}|"x"("l"|"c"))
> ("v"?"i"{0,3}|"i"("v"|"x"))
> ...
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> {ROMANPAT} $ { return newTok("ROMAN"); }
> ---------------------------
> ---------------------------
> Lookahead expression must have match with at least length 1.
> {ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
> ---------------------------
>
> Does the error refer to the lookahead (what's after the slash), or the
> main regex (before the slash)? IIUC the lookahead itself matches 1
> character, so that shouldn't be an issue. The ROMANPAT regex can indeed
> match a zero-length string, so perhaps that's the problem.

Yes, the problem is the ROMANPAT macro.

> Is there some way to force this rule to have a non-zero-length match?

How about spelling out all of the alternatives such that at least one character is mandatory? Something like (warning: untested):

  ROMANPAT = "m"+ ("d"? "c"{0,3} | "c" [dm])
                  ("l"? "x"{0,3} | "x" [lc])
                  ("v"? "i"{0,3} | "i" [vx])
           | ("d" "c"{0,3} | "c"{1,3} | "c" [dm])
             ("l"? "x"{0,3} | "x" [lc])
             ("v"? "i"{0,3} | "i" [vx])
           | ("l" "x"{0,3} | "x"{1,3} | "x" [lc])
             ("v"? "i"{0,3} | "i" [vx])
           | ("v" "i"{0,3} | "i"{1,3} | "i" [vx])

Steve

P.S.: Wikipedia says that the formulation you've specified is modern, and that things like IIIII (5) and VV (10) and XIIII (14) were once used.
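The zero-length issue and the effect of the rewrite can be checked with an ordinary regex engine. The sketch below transcribes both macros into java.util.regex syntax (quotes and spaces dropped); that transcription is an assumption of this example, it checks whole-string matches only, and it does not exercise JFlex's lookahead handling. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class RomanPatternSketch {
    // Original macro: every part is optional, so the empty string matches.
    static final Pattern ORIGINAL = Pattern.compile(
            "m*(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])");

    // Rewrite from the reply: each alternative begins with a mandatory part.
    static final Pattern REWRITTEN = Pattern.compile(
            "m+(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])"
            + "|(dc{0,3}|c{1,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])"
            + "|(lx{0,3}|x{1,3}|x[lc])(v?i{0,3}|i[vx])"
            + "|(vi{0,3}|i{1,3}|i[vx])");

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean rewritten(String s) {
        return REWRITTEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "i", "xiv", "cd", "mcmxciv", "iiii"};
        for (String s : samples) {
            System.out.println("'" + s + "' original=" + original(s)
                    + " rewritten=" + rewritten(s));
        }
    }
}
```

On the non-empty samples the two patterns are expected to agree; the empty string is the case where they should differ.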
From: Ken W. <ken...@th...> - 2010-06-14 20:30:40
Hi,

I have the following in a JFlex grammar I'm writing - the intent is to match Roman Numerals that are not followed by other letters:

---------------------------
ROMANPAT="m"*
         ("d"?"c"{0,3}|"c"("d"|"m"))
         ("l"?"x"{0,3}|"x"("l"|"c"))
         ("v"?"i"{0,3}|"i"("v"|"x"))
...
{ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
{ROMANPAT} $ { return newTok("ROMAN"); }
---------------------------

When I try to compile, I get the following error:

---------------------------
Lookahead expression must have match with at least length 1.
{ROMANPAT} / [^a-z] { return newTok("ROMAN"); }
---------------------------

Does the error refer to the lookahead (what's after the slash), or the main regex (before the slash)? IIUC the lookahead itself matches 1 character, so that shouldn't be an issue. The ROMANPAT regex can indeed match a zero-length string, so perhaps that's the problem. I'm not sure how to correct that in this case though.

Is there some way to force this rule to have a non-zero-length match? Any alternate suggestion for accomplishing this is welcome too.

--
Ken Williams
Sr. Research Scientist
Thomson Reuters
Phone: 651-848-7712
ken...@th...