flex-help Mailing List for flex: the fast lexical analyser (Page 5)
flex is a tool for generating scanners
Brought to you by:
wlestes
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | | | 2 | | | | 2 | | | | | |
| 2006 | | 2 | 2 | 2 | 3 | 4 | 10 | 6 | 20 | 30 | 10 | 40 |
| 2007 | 25 | 18 | 34 | 36 | 29 | 1 | 35 | 5 | 7 | 15 | 16 | 13 |
| 2008 | 11 | 23 | 17 | 32 | 7 | 20 | 2 | 13 | 13 | 16 | 3 | 17 |
| 2009 | 10 | 10 | 13 | 3 | 25 | 11 | 1 | 17 | 19 | 9 | 20 | 22 |
| 2010 | 29 | 13 | 11 | 10 | 9 | 13 | 4 | 28 | 8 | 8 | 4 | 7 |
| 2011 | 3 | 3 | 5 | 4 | 2 | 7 | 12 | 10 | 6 | 14 | 1 | 9 |
| 2012 | 6 | 1 | 13 | 4 | 5 | 1 | 6 | 18 | 12 | 46 | 7 | 4 |
| 2013 | 2 | 3 | | 5 | 2 | 11 | | | | 11 | 16 | 1 |
| 2014 | 2 | 1 | | 11 | | 2 | 2 | | | 8 | 1 | 7 |
| 2015 | | 1 | | | 1 | | 11 | 1 | | | | 2 |
| 2016 | 1 | 4 | 6 | 2 | 15 | 19 | 10 | | 1 | 6 | 4 | |
| 2017 | | | | | | | | | | 1 | | |
| 2018 | 4 | 1 | 5 | | | 3 | | | | | | |
| 2019 | | 3 | | | | | 1 | 1 | | | | |
| 2021 | 3 | | | | | | | | | 1 | | |
| 2022 | | | 3 | | | | | | | | | |
| 2023 | | | | | | 5 | | | | | | 1 |
From: Will E. <wes...@gm...> - 2015-12-04 15:20:44
Thanks for your report. Can you submit a patch or pull request for the below?

On Friday, 4 December 2015, 4:16 pm +0100, Akim Demaille <ak...@lr...> wrote:
> Hi,
>
> I’m happy to see that the C++ code is renovated. While at it, please
> s/struct yy_buffer_state/yy_buffer_state/g, the struct keyword is absolutely
> useless, and clutters the flex-lexer.hh file.
>
> Also, really, FLEX_STD is, IMHO, very useless. Just spread std:: everywhere.
> std:: is 17 years old!
> [...]

--
Will Estes
Flex Project Maintainer
wes...@gm...
https://github.com/westes/flex
From: Akim D. <ak...@lr...> - 2015-12-04 15:17:01
Hi,

I’m happy to see that the C++ code is renovated. While at it, please s/struct yy_buffer_state/yy_buffer_state/g; the struct keyword is absolutely useless, and clutters the flex-lexer.hh file.

Also, really, FLEX_STD is, IMHO, very useless. Just spread std:: everywhere. std:: is 17 years old!

Actually, reading the generated code, one reads:

    /* The contents of this function are C++ specific, so the () macro is not used.
     * This constructor simply maintains backward compatibility.
     * DEPRECATED
     */
    yyFlexLexer::yyFlexLexer( FLEX_STD istream* arg_yyin, FLEX_STD ostream* arg_yyout ):
        yyin(arg_yyin ? arg_yyin->rdbuf() : std::cin.rdbuf()),
        yyout(arg_yyout ? arg_yyout->rdbuf() : std::cout.rdbuf())
    {
        ctor_common();
    }

    /* The contents of this function are C++ specific, so the () macro is not used.
     */
    yyFlexLexer::yyFlexLexer( std::istream& arg_yyin, std::ostream& arg_yyout ):

One has std::, the other has FLEX_STD. So compliance with std:: is already assumed.
From: <ga...@si...> - 2015-08-16 02:36:05
Hi all,

I am using bison + flex to write an SQL parser, and I have run into a performance problem: the parser's CPU cost is high, almost 3% of total CPU (checked with perf top), under a 64-concurrency sysbench oltp-simple test.

I checked the C source files generated by flex and bison and found that they use malloc and free, via the yyalloc function, to get and release memory for each SQL statement. Is it possible to give bison + flex a memory allocator that uses a local buffer?

Best regards!
Dennis
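Here is one possible approach, sketched under assumptions rather than tested against Dennis's setup. With `%option noyyalloc noyyrealloc noyyfree`, flex stops defining its allocation functions and you supply `yyalloc`, `yyrealloc` and `yyfree` yourself; in a reentrant scanner each one also takes the `yyscanner` argument. The C code below is a hypothetical per-statement arena that those functions could forward to. The names `arena_t`, `arena_alloc`, `arena_realloc`, `arena_free` and `arena_reset` are illustrative and are not part of flex or bison. The sketch covers only the scanner's allocations.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-statement arena: a fixed buffer handed out by a bump
 * pointer and reclaimed all at once. Requests that do not fit fall back
 * to malloc(). Sizes are illustrative, not tuned. */
#define ARENA_SIZE (64 * 1024)
#define ALIGN 16 /* assumed to satisfy the alignment flex's buffers need */

typedef struct {
    _Alignas(ALIGN) unsigned char mem[ARENA_SIZE];
    size_t used;
} arena_t;

static size_t round_up(size_t n) { return (n + ALIGN - 1) & ~(size_t)(ALIGN - 1); }

static int in_arena(const arena_t *a, const void *p) {
    uintptr_t lo = (uintptr_t)a->mem, q = (uintptr_t)p;
    return q >= lo && q < lo + ARENA_SIZE;
}

/* Each arena block starts with an ALIGN-byte header holding its size,
 * so arena_realloc() knows how many bytes to copy. */
void *arena_alloc(arena_t *a, size_t bytes) {
    if (bytes > ARENA_SIZE)
        return malloc(bytes);
    size_t need = ALIGN + round_up(bytes);
    if (need > ARENA_SIZE - a->used)
        return malloc(bytes);          /* arena full: use the heap */
    unsigned char *block = a->mem + a->used;
    memcpy(block, &bytes, sizeof bytes);
    a->used += need;
    return block + ALIGN;
}

void *arena_realloc(arena_t *a, void *p, size_t bytes) {
    if (p == NULL)
        return arena_alloc(a, bytes);
    if (!in_arena(a, p))
        return realloc(p, bytes);      /* heap fallback block */
    size_t old;
    memcpy(&old, (unsigned char *)p - ALIGN, sizeof old);
    if (bytes <= old)
        return p;                      /* shrinking: reuse in place */
    void *q = arena_alloc(a, bytes);
    if (q != NULL)
        memcpy(q, p, old);
    return q;
}

void arena_free(arena_t *a, void *p) {
    if (p != NULL && !in_arena(a, p))
        free(p);                       /* arena blocks wait for arena_reset() */
}

/* Call only once nothing allocated from the arena is still in use, e.g.
 * after yylex_destroy() for a reentrant scanner created per statement. */
void arena_reset(arena_t *a) { a->used = 0; }

/* With "%option reentrant noyyalloc noyyrealloc noyyfree", flex expects
 * yyalloc(bytes, yyscanner), yyrealloc(ptr, bytes, yyscanner) and
 * yyfree(ptr, yyscanner) to be defined elsewhere; each could forward to
 * the functions above using a per-thread arena (wiring not shown). */
```

The design assumes that a scanner is created and destroyed for every statement. Under that assumption, `arena_reset()` can never release memory that flex still references, such as its buffer stack. A scanner that stays alive across statements would need a different scheme. Whether Bison's own allocations matter for this workload would need to be measured separately.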
From: William B. <wil...@gm...> - 2015-07-30 00:48:00
Hi flex folks,

I'm working on an interpreter that allows macro substitution. The language allows macros to be written between a backtick and a single quote, and replaced inline with their expansions (like C's #define). So if the macro "foo" is defined to contain the text "bar", writing sum(`foo') should be expanded to sum(bar). The macros are supposed to do pure text replacement, without any semantics. They can also be nested, and a token should be able to consist partially of text originally in the source code and partially of text from a macro expansion.

I've implemented this behavior with a start condition stack and a parallel stack of expanded text. While within a set of nested macros, I keep track of all the state without using flex's facilities, but once we get back to the INITIAL state I need to put the final expanded text back into the input stream for reading under the usual rules. I considered using a buffer stack, or at least one level of additional buffer, but then tokens wouldn't be able to cross buffer boundaries. Instead, I'm trying to unput() the expanded text back into yytext. yytext is declared as an array of the default YYLMAX size rather than a pointer.

This actually works fine when the input is from stdin. When input comes in via yy_scan_string, I get a "flex scanner push-back overflow" error after a few characters. Curiously, while the example below doesn't show it, the number of characters seems to depend on how much text is in the string before the macro appears.

The documentation as I read it says that this shouldn't happen, and I'm wondering whether it's because I'm compiling the scanner as C++. I can't easily port my stack-handling code to C, so it's hard to check. Is this a known issue with C++ compilation? Or am I misreading the docs? Any information or assistance you can offer is very welcome.

Thanks
Will

=========================

Details and reproducible example:

I'm using flex 2.5.35 and g++ 4.8.4 (the Ubuntu version, 4.8.4-2ubuntu1~14.04). My actions are C++ code, but I'm compiling the C scanner with g++ rather than using the C++ class interface. I've attached a minimal example that reproduces this behavior. When it's compiled and run, I get this:

    wbrannon@ip-10-0-0-87:~$ flex -o lex.yy.cpp ado.fl
    wbrannon@ip-10-0-0-87:~$ g++ -std=c++11 -g -O3 -fPIC -Wall -pedantic -o test lex.yy.cpp
    wbrannon@ip-10-0-0-87:~$ ./test
    17 16 15 14 13 flex scanner push-back overflow
    wbrannon@ip-10-0-0-87:~$
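The following is a likely explanation, based on how flex's generated C scanner manages its buffers; it is worth confirming against the 2.5.35 skeleton. `yy_scan_string()` copies the string into a new buffer sized for the string plus two end-of-buffer bytes. As a result, `unput()` has no spare room and can only reuse the space left by characters that have already been consumed. That would explain both the overflow and why it depends on how much text precedes the macro. A buffer filled from stdin is usually much larger than the data read so far, which leaves slack for `unput()`.

One way to avoid push-back altogether is to expand macros in a separate pass and hand the flat result to `yy_scan_string()`. A token can still mix source text and expansion text that way. The minimal sketch below assumes a fixed, hypothetical macro table (`macros`, `expand_macros` and the other names are illustrative). It handles nesting with recursion rather than an explicit stack.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro table; an interpreter would consult its own symbols. */
typedef struct { const char *name; const char *body; } macro_t;

static const macro_t macros[] = {
    { "foo",  "bar" },
    { "pre",  "su" },
    { "nest", "`foo'_x" },  /* a body may itself contain macros */
    { "xbar", "baz" },      /* reachable as `x`foo'' (name built by expansion) */
};

static const char *lookup(const char *name, size_t len) {
    for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strlen(macros[i].name) == len && strncmp(macros[i].name, name, len) == 0)
            return macros[i].body;
    return NULL;
}

typedef struct { char *buf; size_t cap, len; int ok; } out_t;

static void put(out_t *o, char c) {
    if (o->len + 1 < o->cap) o->buf[o->len++] = c;  /* keep room for '\0' */
    else o->ok = 0;                                 /* output truncated */
}

/* Expands text[0..n) into o. Returns 0 on an unbalanced `...', an unknown
 * macro, or nesting deeper than 32 (e.g. a self-referencing macro). */
static int expand(const char *text, size_t n, out_t *o, int depth) {
    if (depth > 32) return 0;
    size_t i = 0;
    while (i < n) {
        if (text[i] != '`') { put(o, text[i++]); continue; }
        size_t j = i + 1;       /* find the quote that closes this backtick */
        int level = 1;
        while (j < n && level > 0) {
            if (text[j] == '`') level++;
            else if (text[j] == '\'') level--;
            j++;
        }
        if (level != 0) return 0;
        char name[64];          /* the name itself may contain macros */
        out_t nm = { name, sizeof name, 0, 1 };
        if (!expand(text + i + 1, j - i - 2, &nm, depth + 1) || !nm.ok) return 0;
        const char *body = lookup(name, nm.len);
        if (body == NULL || !expand(body, strlen(body), o, depth + 1)) return 0;
        i = j;                  /* resume after the closing quote */
    }
    return 1;
}

/* Writes the fully expanded, NUL-terminated text of src into out.
 * Example: expand_macros("sum(`foo')", buf, sizeof buf) leaves "sum(bar)"
 * in buf; the caller would then pass buf to yy_scan_string(). */
int expand_macros(const char *src, char *out, size_t cap) {
    out_t o = { out, cap, 0, 1 };
    int ok = expand(src, strlen(src), &o, 0) && o.ok;
    if (cap > 0) out[o.len] = '\0';
    return ok;
}
```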
From: 최익성 <pn...@na...> - 2015-07-08 01:04:23
Dear Arthur Schwarz, Tim Schumacher, Will Estes,

Thank you very much for your valuable comments and advice.

-----Original Message-----
From: "Will Estes"<wes...@gm...>
To: "Arthur Schwarz"<asc...@at...>;
Cc: "'최익성'"<pn...@na...>; <fle...@li...>;
Sent: 2015-07-08 (Wed) 01:45:27
Subject: Re: [Flex-help] I really thank you for your advice. Arthur Schwarz.

No, flex (or any lexer) is not what you want, since the task you describe is not lexing. It sounds like you don't want a parser either.

On Tuesday, 7 July 2015, 8:38 am -0700, Arthur Schwarz <asc...@at...> wrote:
> This issue is not whether flex can handle the search, the issue is whether
> a lexical analyzer is powerful enough or whether, e.g., a parser (bison) is
> needed, and whether there are more specific tools which yield the same or
> better answers faster.
> [...]
From: Will E. <wes...@gm...> - 2015-07-07 16:45:39
No, flex (or any lexer) is not what you want, since the task you describe is not lexing. It sounds like you don't want a parser either.

On Tuesday, 7 July 2015, 8:38 am -0700, Arthur Schwarz <asc...@at...> wrote:
> This issue is not whether flex can handle the search, the issue is whether
> a lexical analyzer is powerful enough or whether, e.g., a parser (bison) is
> needed, and whether there are more specific tools which yield the same or
> better answers faster.
> [...]
From: Tim S. <sch...@gm...> - 2015-07-07 15:54:29
You can try using REJECT. It slows things down but will attempt matching already matched text against other patterns.

On 8:39AM, Tue, Jul 7, 2015 Arthur Schwarz <asc...@at...> wrote:
> This issue is not whether flex can handle the search, the issue is whether
> a lexical analyzer is powerful enough or whether, e.g., a parser (bison) is
> needed, and whether there are more specific tools which yield the same or
> better answers faster.
> [...]
From: Arthur S. <asc...@at...> - 2015-07-07 15:39:14
The issue is not whether flex can handle the search; the issue is whether a lexical analyzer is powerful enough or whether, e.g., a parser (bison) is needed, and whether there are more specific tools which yield the same or better answers faster.

Try looking at:

Flexible Pattern Matching in Strings
Gonzalo Navarro & Mathieu Raffinot
Cambridge University Press
ISBN 0 521 81307 7

_____

From: 최익성 [mailto:pn...@na...]
Sent: Tuesday, July 07, 2015 8:23 AM
To: Arthur Schwarz
Subject: RE: [Flex-help] I really thank you for your advice. Arthur Schwarz.

Dear Arthur Schwarz.

Thank you very much for your advice and explanation.

You are right.

In the applications such as pattern matching in network processor, security devices, gene matching.

Multiple pattern matching is required.

I just have a question that flex can be used to these applications.

Thank you very much.
From: Arthur S. <asc...@at...> - 2015-07-07 15:17:28
> I mean that for input streams "abcdefg", I want both pattern /abcd/ and /bcd/ be active.
> I know that lex/flex generates a token for input streams.
> I want multiple tokens be generated simultaneously(concurrently) for input streams.

1: Token return is a developer concern. That is, flex allows recognition of a lexeme, and the developer provides a return value which is the representative token.

2: Once /abcd/ is recognized, your choices are:
   a. return two tokens via something like
          token[0] = value0;
          token[1] = value1;
          return token;
   b. back up 3 characters to reread "bcd" and then return token = "abcd". The caller must then call the lexer again to get a token for "bcd".

3: The notion of 'active' does not apply. The lexer works by reading an input stream until a pattern is matched, then entering the user code for handling the pattern. The user code decides what to do with the pattern once seen. At the point that a pattern is being processed, we can say that the pattern is 'active'; however, this is not a common way of saying that the pattern has been accepted.

4: Someone else remarked that once /abcd/ has been accepted, the substring "bcd" is known by default. As seen in 2a and 2b, the user code capitalizes on this knowledge by generating the appropriate action: in 2a two tokens are returned; in 2b one token is returned for "abcd" and, after another call by the parser, another token is returned for "bcd".

5: Since "bcd" is a substring of "abcd", there is no real need for two patterns, /abcd/ and /bcd/.

The issue of simultaneity is mystifying. If there is a single caller (the parser), then that caller usually expects a single response, the return value(s). Concurrency doesn't arise. If there are multiple tasks, the lexer supports activating them depending on whether it sees "abcd" or "bcd", and the desire is to activate two tasks (one for "abcd" and one for "bcd"), then the user code does this using whatever primitives are available. But there is no way in C++ to get true simultaneity. The actions are serial, and even in a multitasking environment the actions are serial, with pseudo-parallelism accomplished 'concurrently'.

If this is not the answer that suits your problem, please explain your problem.

art
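Arthur's option 2b (back up and rescan) maps onto flex's `yyless(n)`. That call returns all but the first n characters of the current match to the input, where they are rescanned. A rule such as `abcd { yyless(1); return ABCD; }` reports "abcd" and leaves "bcd" to be matched on the next call; `ABCD` here is a hypothetical token code. `yyless(2)` does the same for the "abcdef"/"cdef" case.

The C code below does not use flex. It is a toy model of a scanner with literal-only rules, and it shows the token sequence that results. `rule_t` and `scan_literals` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* A toy model of a flex scanner whose rules are plain literals. keep == 0
 * consumes the whole match; keep == n > 0 mimics calling yyless(n) in the
 * action: only the first n characters are consumed and the rest are
 * rescanned. */
typedef struct { const char *lit; size_t keep; } rule_t;

/* Writes the index of each matched rule to out (at most max entries) and
 * returns how many were written. Longest match wins; ties go to the rule
 * listed first; unmatched characters are skipped (flex's default rule
 * would ECHO them). */
static int scan_literals(const char *in, const rule_t *rules, int nrules,
                         int *out, int max) {
    size_t pos = 0, n = strlen(in);
    int count = 0;
    while (pos < n && count < max) {
        int best = -1;
        size_t best_len = 0;
        for (int r = 0; r < nrules; r++) {
            size_t len = strlen(rules[r].lit);
            if (len > best_len && strncmp(in + pos, rules[r].lit, len) == 0) {
                best = r;
                best_len = len;
            }
        }
        if (best < 0) { pos++; continue; }
        out[count++] = best;
        size_t keep = rules[best].keep;
        pos += (keep > 0 && keep < best_len) ? keep : best_len;
    }
    return count;
}
```

With the rules {"abcd", "bcd"} and no backup, the input "abcdefg" yields only the "abcd" token. Giving the "abcd" rule `keep = 1` yields "abcd" followed by "bcd".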
From: Will E. <wes...@gm...> - 2015-07-07 15:13:06
On Tuesday, 7 July 2015, 11:48 pm +0900, 최익성 <pn...@na...> wrote:
> I know that lex/flex generates a token for input streams.
>
> I want multiple tokens be generated simultaneously(concurrently) for input streams.

This is not compatible with flex's purpose. In general, the number of possible matches in an input stream is, well, let's just say "large" and leave it at that.
From: 최익성 <pn...@na...> - 2015-07-07 14:49:10
Dear Arthur Schwarz,

Thank you very much for your advice and the example file.

What I mean is that, for the input stream "abcdefg", I want both patterns /abcd/ and /bcd/ to be active. I know that lex/flex generates a token for input streams. I want multiple tokens to be generated simultaneously (concurrently) for input streams.

Thanks to the many experts here, I now understand my problem better.

Thank you very much.

-----Original Message-----
From: "Arthur Schwarz"<asc...@at...>
To: "'최익성'"<pn...@na...>;
Cc:
Sent: 2015-07-07 (Tue) 22:55:12
Subject: RE: [Flex-help] Is it possible to search concurrent strings in flex ?

-----Original Message-----
From: 최익성 [mailto:pn...@na...]
Sent: Monday, July 06, 2015 2:05 AM
To: fle...@li...
Subject: [Flex-help] Is it possible to search concurrent strings in flex ?

> Dear flex experts.
>
> I have a question about lexical analysis.
>
> Is it possible to find multiple string "abcd" and "bcd" concurrently in flex ?

I don't know what you mean by concurrently. If you mean to have two patterns, one detecting "abcd" and the other "bcd", in the same flex program, the answer is yes:

    /abcd/
    /bcd/

will work. If the input pattern is "abcd" then /abcd/ will be active and /bcd/ will not. If the input pattern is "bcd" then /bcd/ will be active and /abcd/ will not.

> Is it possible to find multiple string "abcd" and "cdef" concurrently in flex ?

Again, I don't know what 'concurrently' means in this context. But the patterns:

    /abcd/
    /cdef/

will select the string "abcd" or "cdef", whichever is input. If you mean the ability to select /abcd/ and /cdef/ given "abcdef", then this requires a backup. That is, given the input pattern "abcdef", /abcd/ will choose the substring "abcd". If on the next iteration of the flex loop you want to detect "cdef" to simulate receiving "abcdcdef", then you have to return 2 characters to the input stream, or you need to start a separate lex loop giving /ef/ as the match pattern.

> If flex does not find them, is there any open tool which can find strings concurrently ?

Again, there is some question as to what you mean by "concurrently" in this context. I am including a flex 'program' which you can use as you will. I believe that it is complex enough that all of your questions are covered (however poorly).

Good luck
art

> Thank you very much.
From: Michele B. <mic...@gm...> - 2015-07-07 07:30:01
2015-07-07 8:03 GMT+02:00 Chris verBurg <che...@gm...>:
> Maybe I'm not interpreting your question correctly, but since you know
> "abcd" and "bcd", wouldn't a match of pattern /abcd/ imply you got both
> /abcd/ and /bcd/? Likewise, a match on pattern /abcdef/ tells you you got
> both /abcd/ and /cdef/, right?

As far as I know, flex always returns the longest match, and if several rules match the same length, it selects the rule that appears first in the lex file. Either way, it always returns only one match.

--
Mick
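Michele's description of how flex picks a rule (the longest match wins, then the earliest rule) can be modelled in a few lines of C. The sketch below is not how flex works internally, since flex compiles all rules into a single DFA. Instead, it tries each rule separately with POSIX `regcomp`/`regexec`, anchored at the current position, and applies the same tie-break. `pick_rule` is a hypothetical name, and POSIX extended regular expressions only approximate flex's pattern syntax.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the index of the rule flex would choose at the start of text:
 * the longest match wins and, among equally long matches, the rule listed
 * first wins. Returns -1 if nothing matches; *len gets the match length. */
static int pick_rule(const char *const *patterns, int n, const char *text,
                     size_t *len) {
    int best = -1;
    *len = 0;
    for (int i = 0; i < n; i++) {
        char anchored[256];
        regex_t re;
        regmatch_t m;
        /* anchor the rule so it must match at the current input position */
        snprintf(anchored, sizeof anchored, "^(%s)", patterns[i]);
        if (regcomp(&re, anchored, REG_EXTENDED) != 0)
            continue;
        if (regexec(&re, text, 1, &m, 0) == 0) {
            size_t l = (size_t)(m.rm_eo - m.rm_so);
            if (l > *len) {        /* strictly longer, so earlier rules keep ties */
                best = i;
                *len = l;
            }
        }
        regfree(&re);
    }
    return best;
}
```

Take the rules `if` and `[a-z]+`. On the input "if (x)", the keyword rule is selected, because both rules match two characters and it is listed first. On "iffy", `[a-z]+` wins as the longer match. This is why keyword rules are conventionally listed before an identifier rule. Given "abcdefg" with the rules `abcd` and `bcd`, only `abcd` is reported.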
From: Chris v. <che...@gm...> - 2015-07-07 06:04:25
Maybe I'm not interpreting your question correctly, but since you know "abcd" and "bcd", wouldn't a match of pattern /abcd/ imply you got both /abcd/ and /bcd/? Likewise, a match on pattern /abcdef/ tells you you got both /abcd/ and /cdef/, right?

-Chris

On Mon, Jul 6, 2015 at 2:04 AM, 최익성 <pn...@na...> wrote:
> Dear flex experts.
>
> I have a question about lexical analysis.
>
> Is it possible to find multiple string "abcd" and "bcd" concurrently in
> flex ?
>
> Is it possible to find multiple string "abcd" and "cdef" concurrently in
> flex ?
>
> If flex does not find them, is there any open tool which can find strings
> concurrently ?
>
> Thank you very much.
From: 최익성 <pn...@na...> - 2015-07-06 09:04:49
Dear flex experts,

I have a question about lexical analysis.

Is it possible to find multiple strings, such as "abcd" and "bcd", concurrently with flex?

Is it possible to find multiple strings, such as "abcd" and "cdef", concurrently with flex?

If flex cannot find them, is there any open-source tool which can find such strings concurrently?

Thank you very much.
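Suppose the goal is to report every occurrence of every pattern, overlaps included: "abcd" at offset 0 and "bcd" at offset 1 in "abcdefg". That is multiple-pattern string search rather than lexing. The naive C sketch below simply tests each pattern at each position. The standard efficient algorithm for this task is Aho-Corasick, which finds the same matches in a single pass over the text. `hit_t` and `find_all` are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <string.h>

typedef struct { int pattern; size_t pos; } hit_t;

/* Reports every occurrence of every pattern, overlaps included, in order
 * of position. Naive O(text length * total pattern length); Aho-Corasick
 * does the same job in one pass. Returns the total number of hits, which
 * may exceed max (only the first max are stored). */
static size_t find_all(const char *text, const char *const *pats, int npats,
                       hit_t *hits, size_t max) {
    size_t count = 0, n = strlen(text);
    for (size_t pos = 0; pos < n; pos++)
        for (int p = 0; p < npats; p++) {
            size_t len = strlen(pats[p]);
            if (len > 0 && len <= n - pos && memcmp(text + pos, pats[p], len) == 0) {
                if (count < max) hits[count] = (hit_t){ p, pos };
                count++;   /* keep counting so callers can detect truncation */
            }
        }
    return count;
}
```

Unlike a flex scanner, this reports both the "abcd" and "cdef" occurrences in "abcdef", at offsets 0 and 2.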
From: David L. B. <dl...@um...> - 2015-02-09 21:49:19
Hi,

I've taken the "Advanced Calculator" example from the O'Reilly flex & bison book and made some minor modifications so that setting the debug option does not cause a syntax error, and I've added C-style '()' and '{}' to the IF THEN ELSE and WHILE DO statements. When compiled, the calculator works exactly as I expect it to.

If, instead of calling yyparse() from main, I call a function I've defined in the third section of the scanner.l file called "call_readline()":

    void call_readline()
    {
        YY_BUFFER_STATE bp;
        char *f = readline(">>> ");
        add_history(f);
        bp = yy_scan_string(f);
        free(f);
        yy_switch_to_buffer(bp);
        yyparse(); /* eat the input */
        yy_delete_buffer(bp);
    }

which I took from a posting by John Levine, the parser returns an "error: syntax error" message. I have added several print statements to the scanner and to the parser. Here is a typical session:

    >>> 1 + 1
    TOKEN: 1
    PARSE: NUMBER
    TOKEN: +
    TOKEN: 1
    PARSE: NUMBER
    PARSE: exp + exp
    1: error: syntax error
    >>> 2 * 2
    TOKEN: 2
    PARSE: NUMBER
    TOKEN: *
    TOKEN: 2
    PARSE: NUMBER
    PARSE: exp * exp
    1: error: syntax error
    >>> ^C

I have no idea how to debug this. If I call yyparse() from main, the scanner and parser both work. If I change the input from stdin to readline, the scanner seems to work, but what is being forwarded to the parser seems to be different for identical input. Can someone provide guidance?

Cheers,
David
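One plausible cause, assuming the grammar is still the book's calculator, where each statement must end with an EOL token produced from a newline: `readline()` returns the line without its trailing newline. The parser therefore reaches end of input right after reducing `exp + exp` and reports a syntax error, which matches the trace. Reading stdin directly works because the newline is still present. A minimal sketch of a fix is to append the newline before scanning.

Two related points from the flex manual also apply to the posted code:
- `yy_scan_string()` scans a copy of the string and already switches to the new buffer, so freeing `f` afterwards is safe and the `yy_switch_to_buffer()` call is redundant.
- `readline()` returns NULL at end of input, which the posted code does not check.

`with_newline` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'd copy of line with '\n' appended (caller frees),
 * or NULL if line is NULL (readline() returns NULL at end of input) or if
 * allocation fails.
 *
 * Possible use inside call_readline():
 *     char *f = readline(">>> ");
 *     if (f == NULL) return;
 *     add_history(f);
 *     char *line = with_newline(f);
 *     free(f);
 *     if (line == NULL) return;
 *     YY_BUFFER_STATE bp = yy_scan_string(line);  // copies and switches
 *     free(line);
 *     yyparse();
 *     yy_delete_buffer(bp);
 */
static char *with_newline(const char *line) {
    if (line == NULL)
        return NULL;
    size_t len = strlen(line);
    char *s = malloc(len + 2);   /* room for '\n' and '\0' */
    if (s == NULL)
        return NULL;
    memcpy(s, line, len);
    s[len] = '\n';
    s[len + 1] = '\0';
    return s;
}
```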
From: Allen B. <Al...@ep...> - 2014-12-14 21:46:12
I found the issue to be with the way I was providing the input file to flex. I was redirecting file content to stdin, which would interpret two consecutive NULLs as the end of the stream, thus cutting flex off from the input data. The issue was on Windows, and I haven't tested it on any *nix-based systems.

Thanks for the sanity check.

Allen Blaylock

________________________________________
From: Chris verBurg [che...@gm...]
Sent: Saturday, December 13, 2014 4:48 AM
To: Allen Blaylock
Cc: fle...@li...
Subject: Re: [Flex-help] Handle NULL-NULL in input stream.

Hmm, I don't see flex skipping consecutive NULLs. Here's what I got:
[...]
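A portable way to stop stdio from translating the input is to open the file yourself in binary mode and give that stream to the scanner, instead of redirecting the file into stdin. This matters on Windows, where a text-mode stream translates line endings and, on Microsoft's C runtime, treats a Ctrl-Z byte as end of file.

The sketch assumes the input path is available, for example as a command-line argument. `open_scanner_input` and `count_bytes` are hypothetical helpers, and `count_bytes` only exists to show that NUL and carriage-return bytes arrive intact. On Microsoft's C runtime, `_setmode(_fileno(stdin), _O_BINARY)` is the usual way to switch stdin itself to binary mode.

```c
#include <stdio.h>

/* Opens path for the scanner in binary mode, so stdio performs no newline
 * or end-of-file translation (this matters on Windows; on POSIX systems
 * "rb" and "r" behave the same). Returns NULL on failure.
 *
 * Possible use with a reentrant scanner, as in Allen's other example:
 *     FILE *fp = open_scanner_input(argv[1]);
 *     if (fp == NULL) { perror(argv[1]); return 1; }
 *     yyset_in(fp, scanner);
 * or, for a non-reentrant scanner: yyin = fp;
 */
static FILE *open_scanner_input(const char *path) {
    return fopen(path, "rb");
}

/* Counts the bytes a scanner would see, noting NULs and carriage returns. */
static long count_bytes(FILE *fp, long *nuls, long *crs) {
    long total = 0;
    int c;
    *nuls = *crs = 0;
    while ((c = getc(fp)) != EOF) {
        total++;
        if (c == '\0') (*nuls)++;
        if (c == '\r') (*crs)++;
    }
    return total;
}
```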
From: Allen B. <Al...@ep...> - 2014-12-14 21:44:58
I found the issue to be with the way I was providing the input file to flex. I was redirecting file content to stdin, which would change the newline characters in the input. The issue was on Windows, and I haven't tested it on any *nix-based systems.

Allen Blaylock

________________________________________
From: Chris verBurg [che...@gm...]
Sent: Saturday, December 13, 2014 4:41 AM
To: Allen Blaylock
Cc: fle...@li...
Subject: Re: [Flex-help] Including Newline and Carriage Return in yytext

I tried that out on my machine (OSX, flex 2.5.35) and it works fine:
[...]
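Apart from the line-ending issue, the loop in Allen's example action (quoted in Chris's reply) has two pitfalls:
- `strlen(yytext)` stops at the first NUL byte, which matters for the binary input in his other thread.
- `(unsigned int)yytext[i]` sign-extends bytes 0x80 to 0xff on platforms where `char` is signed, so they print as `ffffff80` and so on.

The sketch below is a safer dump that iterates over `yyleng` bytes and casts through `unsigned char`. `hex_bytes` is a hypothetical helper that formats into a buffer rather than printing.

```c
#include <stdio.h>
#include <string.h>

/* Formats len bytes of text as "<xx>" groups into out (capacity cap) and
 * returns how many bytes were formatted. Iterating to len (flex's yyleng)
 * instead of strlen() keeps embedded NULs, and the unsigned char cast
 * prints 0x80..0xff as two hex digits instead of a sign-extended value.
 *
 * Possible use in the rule action:
 *     char buf[4096];
 *     hex_bytes(yytext, (size_t)yyleng, buf, sizeof buf);
 *     puts(buf);
 */
static size_t hex_bytes(const char *text, size_t len, char *out, size_t cap) {
    size_t used = 0, i;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        if (cap - used < 5)      /* room for "<xx>" plus the terminating NUL */
            break;
        used += (size_t)snprintf(out + used, cap - used, "<%02x>",
                                 (unsigned int)(unsigned char)text[i]);
    }
    return i;
}
```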
From: Chris v. <che...@gm...> - 2014-12-13 11:49:25
Hmm, I don't see flex skipping consecutive NULLs. Here's what I got:

    % perl -e 'open($fh, "> t.in"); print {$fh} "asdf\0\0\0asdf"; close($fh);'
    % cat t.l
    %option noyywrap
    %%
    . { printf("'%c' = %i\n", yytext[0], yytext[0]); }
    %%
    int main(int argc, char ** argv) { yylex(); return 0; }
    % flex -o t.c t.l
    % cc t.c
    % ./a.out < t.in
    'a' = 97
    's' = 115
    'd' = 100
    'f' = 102
    '' = 0
    '' = 0
    '' = 0
    'a' = 97
    's' = 115
    'd' = 100
    'f' = 102

-Chris

On Fri, Dec 12, 2014 at 6:34 AM, Allen Blaylock <Al...@ep...> wrote:
>
> Is there a way to get the parser to handle more than one consecutive NULL
> in the input stream?
>
> Currently the parser generated by flex recognizes the consecutive NULLs as
> an EOF which I do not want,
> I would like to be able to tell the parser when the end of file is reached
> explicitly through one of my rules calling yyterminate().
>
> For instance say I had a blob of binary data in my input file that may
> contain a sequence of NULL characters.
> Luckily I know the length of the binary data because the preceding token
> contains the following binary length.
> With this information is it possible to tell the parser, move forward by N
> bytes and not scan them?
>
> Allen Blaylock
From: Chris v. <che...@gm...> - 2014-12-13 11:41:46
|
I tried that out on my machine (OSX, flex 2.5.35) and it works fine:

% perl -e 'open($fh, q{>}, "t.in"); print {$fh} "MY STRING=asdf\r\n"; close($fh);'
% ./a.out < t.in
We got the string <asdf
>
Total match length is <16> long
The characters in yytext are:
	<4d>
	<59>
	<20>
	<53>
	<54>
	<52>
	<49>
	<4e>
	<47>
	<3d>
	<61>
	<73>
	<64>
	<66>
	<d>
	<a>
%

My guess is that your input isn't what you think it is. Perhaps the \r and \n are swapped, or there is no \r at all? If you created the file on Linux and imported it to Windows, for example, it would likely be missing the extra \r. I think I heard that even later versions of Windows have stopped using the \r char.

I did change two things in the code, though neither should be relevant:
1. added "%option noyywrap"
2. changed the printf of yyleng from %d to %zu.

-Chris

On Fri, Dec 12, 2014 at 6:29 AM, Allen Blaylock <Al...@ep...> wrote:
>
> Here is an example parser:
>
> %option never-interactive
> %option reentrant
>
> %{
> char my_buf[1024];
> %}
>
> %%
>
> MY[ ]STRING=.*\r?\n {
>     int i;
>     printf("We got the string <%s>\n",&(yytext[10]));
>     printf("Total match length is <%d> long\n",yyleng);
>     printf("The characters in yytext are:\n");
>     for(i = 0; i < strlen(yytext); ++i)
>     {
>         printf("\t<%x>\n",(unsigned int)yytext[i]);
>     }
>     memcpy(my_buf,yytext,yyleng);
> }
>
> . {
>     // Do nothing, just consume the text
> }
>
> %%
>
> int main(int argc, char ** argv)
> {
>     yyscan_t scanner;
>     yylex_init( &scanner );
>     yyset_in(stdin, scanner);
>     yylex(scanner);
>     yylex_destroy(scanner);
>     return 0;
> }
>
> I then take this file and use it against the following text in a file:
>
> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ornare orci id
> neque imperdiet tristique. Nulla faucibus, lacus ut aliquet mollis, mi tortor
> fermentum tellus, suscipit MY STRING=THE QUICK BROWN FOX
> bibendum lorem enim vitae massa. Nulla ultrices mi
> vel lorem tempor tincidunt. Sed urna tortor, commodo at maximus eget,
> pretium ac dolor. Nam auctor
>
> ## END OF FILE
>
> It is important to note that the line endings for the above text should be
> (carriage return)(line feed).
>
> What I would like to see in yytext is "MY STRING=THE QUICK BROWN
> FOX(\0x0D\0x0A)", but what I get is "MY STRING=THE QUICK BROWN FOX(\0x0A)",
> so there is no carriage return.
> It is important for my parser to be able to parse certain strings byte for
> byte as they appear in the input file.
>
> Allen Blaylock
>
> -----Original Message-----
> From: Harjot kaur [mailto:har...@gm...]
> Sent: Thursday, December 11, 2014 10:29 PM
> To: Allen Blaylock
> Subject: Re: [Flex-help] Including Newline and Carriage Return in yytext
>
> On Fri, Dec 12, 2014 at 2:51 AM, Allen Blaylock <Al...@ep...>
> wrote:
> > I am struggling to get the newline and carriage return characters to
> appear in the yytext "buffer."
> >
> I think it shows your token is not declared in yytext or yylval.
> > For instance let's say I was trying to match with the rule:
> >
> > MY[ ]STRING=.*\r?\n {
> > memcpy(my_buf,yytext,yyleng); }
> Are you using this function in Flex file or Bison file?
> >
> > Which you can see copies the buffer exactly to my_buf. Unfortunately I
> only get the portion of the string matched by MY[ ]STRING=.* but not the
> optional carriage return or newline character.
>
> Is it solved or not? Please show your code.
>
> --
> Harjot Kaur
> Blog: harjotpandher93.wordpress.com
> github: https://github.com/jotpandher
> "If we love our work, the work itself becomes our passion." |
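One way to test Chris's guess is to count the raw line endings in the input file. A minimal sketch, with count_line_endings as a hypothetical helper; it assumes the stream is opened in binary mode, because on Windows a stream in text mode (stdin included, by default) has CRLF translated to LF by the C runtime, which would also make the \r disappear before flex ever sees it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count CRLF pairs and bare LFs in a stream.
   Open files with "rb" so the C library performs no newline translation. */
void count_line_endings(FILE *in, size_t *crlf, size_t *bare_lf)
{
    int c, prev = EOF;

    *crlf = 0;
    *bare_lf = 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            if (prev == '\r')
                ++*crlf;      /* "\r\n": what the rule expects */
            else
                ++*bare_lf;   /* "\n" with no preceding carriage return */
        }
        prev = c;
    }
}
```

For example, opening t.in with fopen("t.in", "rb") and finding crlf == 0 would confirm there is no \r for the rule to match. With the Microsoft C runtime, stdin can be switched to binary mode with _setmode(_fileno(stdin), _O_BINARY).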
From: Allen B. <Al...@ep...> - 2014-12-12 14:35:05
|
Is there a way to get the parser to handle more than one consecutive NULL in the input stream?

Currently the parser generated by flex recognizes the consecutive NULLs as an EOF, which I do not want. I would like to be able to tell the parser when the end of file is reached explicitly, through one of my rules calling yyterminate().

For instance, say I had a blob of binary data in my input file that may contain a sequence of NULL characters. Luckily I know the length of the binary data, because the preceding token contains the length of the binary data that follows. With this information, is it possible to tell the parser to move forward by N bytes and not scan them?

Allen Blaylock |
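The flex manual describes an input() routine that actions can call to read the next character directly, bypassing the patterns (in a reentrant scanner it takes the yyscanner argument). Once the preceding token has yielded the length N, one option is to call it N times and keep the bytes as they come, NULs included. The sketch below models that loop in plain C so it runs on its own: the byte-source callback stands in for input(), and read_blob, next_byte_fn, mem_src and mem_next are hypothetical names, not flex API. It assumes the source returns EOF at end of input.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical byte source standing in for the scanner's input():
   returns the next byte (0..255) or EOF when the input is exhausted. */
typedef int (*next_byte_fn)(void *ctx);

/* Copy exactly `count` raw bytes into `out`, NUL bytes included, without
   any pattern matching. Returns the number of bytes copied; a value below
   `count` means the input ended early. */
size_t read_blob(next_byte_fn next, void *ctx, unsigned char *out, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        int c = next(ctx);
        if (c == EOF)
            break;
        out[i] = (unsigned char)c;
    }
    return i;
}

/* Example source over an in-memory buffer, for running the sketch. */
struct mem_src {
    const unsigned char *p;
    size_t len;
    size_t pos;
};

int mem_next(void *ctx)
{
    struct mem_src *s = ctx;
    return s->pos < s->len ? s->p[s->pos++] : EOF;
}
```

After the blob is consumed this way, the scanner would resume pattern matching at the byte following it.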
From: Allen B. <Al...@ep...> - 2014-12-12 14:29:50
|
Here is an example parser:

%option never-interactive
%option reentrant

%{
char my_buf[1024];
%}

%%

MY[ ]STRING=.*\r?\n {
    int i;
    printf("We got the string <%s>\n",&(yytext[10]));
    printf("Total match length is <%d> long\n",yyleng);
    printf("The characters in yytext are:\n");
    for(i = 0; i < strlen(yytext); ++i)
    {
        printf("\t<%x>\n",(unsigned int)yytext[i]);
    }
    memcpy(my_buf,yytext,yyleng);
}

. {
    // Do nothing, just consume the text
}

%%

int main(int argc, char ** argv)
{
    yyscan_t scanner;
    yylex_init( &scanner );
    yyset_in(stdin, scanner);
    yylex(scanner);
    yylex_destroy(scanner);
    return 0;
}

I then take this file and use it against the following text in a file:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin ornare orci id
neque imperdiet tristique. Nulla faucibus, lacus ut aliquet mollis, mi tortor
fermentum tellus, suscipit MY STRING=THE QUICK BROWN FOX
bibendum lorem enim vitae massa. Nulla ultrices mi
vel lorem tempor tincidunt. Sed urna tortor, commodo at maximus eget,
pretium ac dolor. Nam auctor

## END OF FILE

It is important to note that the line endings for the above text should be (carriage return)(line feed).

What I would like to see in yytext is "MY STRING=THE QUICK BROWN FOX(\0x0D\0x0A)", but what I get is "MY STRING=THE QUICK BROWN FOX(\0x0A)", so there is no carriage return. It is important for my parser to be able to parse certain strings byte for byte as they appear in the input file.

Allen Blaylock

-----Original Message-----
From: Harjot kaur [mailto:har...@gm...]
Sent: Thursday, December 11, 2014 10:29 PM
To: Allen Blaylock
Subject: Re: [Flex-help] Including Newline and Carriage Return in yytext

On Fri, Dec 12, 2014 at 2:51 AM, Allen Blaylock <Al...@ep...> wrote:
> I am struggling to get the newline and carriage return characters to appear in the yytext "buffer."
>
I think it shows your token is not declared in yytext or yylval.
>
> For instance let's say I was trying to match with the rule:
>
> MY[ ]STRING=.*\r?\n {
> memcpy(my_buf,yytext,yyleng); }
Are you using this function in Flex file or Bison file?
>
> Which you can see copies the buffer exactly to my_buf. Unfortunately I only get the portion of the string matched by MY[ ]STRING=.* but not the optional carriage return or newline character.

Is it solved or not? Please show your code.

--
Harjot Kaur
Blog: harjotpandher93.wordpress.com
github: https://github.com/jotpandher
"If we love our work, the work itself becomes our passion." |
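Separately from the \r question, the memcpy(my_buf,yytext,yyleng) in this example writes past the end of my_buf whenever a matched line is longer than 1024 bytes, and it does not NUL-terminate the copy. A minimal bounded alternative, with copy_token as a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dstsize - 1 bytes of a token
   (e.g. yytext/yyleng) into dst and NUL-terminate it. Returns the number
   of bytes copied; a result smaller than len means the token was truncated.
   Embedded NULs and "\r\n" are copied verbatim, so keep the returned length
   rather than relying on strlen(dst). */
size_t copy_token(char *dst, size_t dstsize, const char *src, size_t len)
{
    size_t n;

    if (dstsize == 0)
        return 0;
    n = len < dstsize - 1 ? len : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

In the rule action this would read copy_token(my_buf, sizeof my_buf, yytext, yyleng);.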
From: Allen B. <Al...@ep...> - 2014-12-11 21:21:45
|
I am struggling to get the newline and carriage return characters to appear in the yytext "buffer." For instance, let's say I was trying to match with the rule:

MY[ ]STRING=.*\r?\n {
    memcpy(my_buf,yytext,yyleng); }

As you can see, this copies the buffer exactly into my_buf. Unfortunately, I only get the portion of the string matched by MY[ ]STRING=.* but not the optional carriage return or newline character. How does one change this behavior? |
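With this rule the \r should end up in yytext either way: in flex, `.` matches any character except newline, so `.*` already consumes a trailing \r and `\r?` then matches the empty string. The sketch below is a hand-written model of that single rule, not code generated by flex (match_my_string is a hypothetical name). For the input MY STRING=asdf\r\n it reports a 16-byte match, consistent with Chris's test elsewhere in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hand-written model of the flex rule  MY[ ]STRING=.*\r?\n  applied at the
   start of s (not flex output). '.' accepts every byte except '\n', so the
   ".*" part runs to the first newline and swallows a trailing '\r'.
   Returns the total match length (what yyleng would be), or 0 if the rule
   does not match. *value_len receives the length of the text after
   "MY STRING=", up to but excluding the '\n'. */
size_t match_my_string(const char *s, size_t n, size_t *value_len)
{
    static const char prefix[] = "MY STRING=";
    const size_t plen = sizeof prefix - 1;
    size_t i;

    if (n < plen || memcmp(s, prefix, plen) != 0)
        return 0;
    for (i = plen; i < n && s[i] != '\n'; ++i)
        ;                 /* '.' also accepts '\r' */
    if (i == n)
        return 0;         /* no newline: the rule cannot match */
    *value_len = i - plen;
    return i + 1;         /* include the '\n' */
}
```

So if yytext ends in a bare \n, the \r was most likely never in the bytes the scanner read.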
From: Eric S. R. <es...@th...> - 2014-10-14 21:45:52
|
John P. Hartmann <jph...@gm...>:
> You should test for EOF only if fread returns 0; if res is positive,
> you should go with that. Next time you read, you'll see EOF.

As I noted privately before I subscribed successfully: you are correct, but this glitch is irrelevant to the actual problem, which is that the lexer doesn't seem to be detecting EOF properly no matter when I return it. Accordingly, the misbehavior did not change when I fixed this glitch.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a> |
From: John P. H. <jph...@gm...> - 2014-10-14 12:55:55
|
You should test for EOF only if fread returns 0; if res is positive, you should go with that. Next time you read, you'll see EOF.

On 14/10/14 14:33, Will Estes wrote:
> static ssize_t custom_input(char *buf, size_t result_max, yyscan_t yyscanner)
>> {
>>     ssize_t res = fread(buf, 1, result_max, yyget_in(yyscanner));
>>
>>     if (feof(yyget_in(yyscanner)) || ferror(yyget_in(yyscanner)))
>>         res = YY_NULL;
>>     if (yydebug)
>>         fprintf(stderr, "custom_input(..., %ld, ...) -> %zd\n", result_max, res);
>>     return res;
>> } |
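John's suggestion can be sketched as a read helper that hands back whatever fread() delivered and reports end of input (0, the value of flex's YY_NULL) only when nothing was read. read_chunk is a hypothetical name, and the YY_INPUT wiring in the comment is an untested assumption about how it could be hooked into a reentrant scanner.

```c
#include <stddef.h>
#include <stdio.h>

/* Read up to max_size bytes. Returns the byte count (> 0) whenever any data
   arrived, even if the same fread() call also hit end of file; returns 0
   (YY_NULL) only when nothing is left, and -1 on a read error.
   The next call after a short final read returns 0, which is where the
   scanner sees EOF.

   Possible wiring in the .l preamble (untested assumption):
     #define YY_INPUT(buf, result, max_size) \
         { long r_ = read_chunk(yyget_in(yyscanner), buf, max_size); \
           if (r_ < 0) YY_FATAL_ERROR("input in flex scanner failed"); \
           result = r_; }
*/
long read_chunk(FILE *in, char *buf, size_t max_size)
{
    size_t n = fread(buf, 1, max_size, in);

    if (n > 0)
        return (long)n;          /* deliver the data first */
    return ferror(in) ? -1 : 0;  /* 0 signals end of input */
}
```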
From: Will E. <wes...@gm...> - 2014-10-14 12:33:32
|
Yeah, that's definitely odd. CC'ing flex-help to get it logged there.

First thing, though: do you notice this with later versions of flex? I expect the answer is "Yes", but let's find out if you've tried that yet.

On Tuesday, 14 October 2014, 8:26 am -0400, "Eric S. Raymond" <es...@th...> wrote:

> My apologies for sending this directly to you, but my attempt to
> subscribe to flex-help seems to have blackholed.
>
> There seems to be something either buggy or very badly documented (I'm hoping
> the latter) about the way flex 2.5.35 generates scanners with the following
> options:
>
> %option reentrant bison-bridge
> %option warn nodefault
> %option pointer
> %option noyywrap noyyget_extra noyyget_leng noyyset_lineno
> %option noyyget_out noyyset_out noyyget_lval noyyset_lval
> %option noyyget_lloc noyyset_lloc noyyget_debug noyyset_debug
>
> My situation is this. I maintain cvs-fast-export, which uses a
> Bison/Flex parser (using Bison 3.0.2) to digest CVS master files.
> High speed is extremely important in this application, as the data
> sets (legacy CVS repositories) are often extremely large; the parser
> is reentrant so master parsing can be multithreaded for higher
> performance. The code is available at
>
> https://gitorious.org/cvs-fast-export/cvs-fast-export.git
>
> and is readily tested with
>
> cvs-fast-export -v tests/basic.repo/basic/README,v
>
> from the repo directory.
>
> I would like to use options like fast, read, batch and
> never-interactive, but am blocked from doing so by behaviors I don't
> understand.
>
> The scanner code contains the following hack, inserted in the preamble at
> some time in the past and recently modified to use yyget_in(yyscanner)
> when I made the scanner re-entrant:
>
> #define YY_INPUT(buf,result,max_size) { \
>     int c = getc(yyget_in(yyscanner)); \
>     result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
> }
>
> I would like to remove this and use flex's native input handling,
> because this limits the scanner to character-at-a-time input. But when
> I do so the parser hangs forever. Running strace reveals that it is
> repeatedly reading no input,
>
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
> read(3, "", 4096) = 0
>
> having apparently failed to recognize EOF. Setting %option batch
> never-interactive does not change this.
>
> My setup code looks like this:
>
> yyscan_t scanner;
> FILE *in;
> cvs_file *cvs;
>
> in = fopen(name, "r");
>
> yylex_init(&scanner);
> yyset_in(in, scanner);
> yyparse(scanner, cvs);
> yylex_destroy(scanner);
>
> fclose(in);
>
> which, if I understand the documentation correctly, ought to be sufficient
> to set up the default input machinery. My first question is: what is
> wrong here? Why is the scanner failing to recognize EOF when using the stock
> YY_INPUT?
>
> I tried replacing the custom YY_INPUT with this logically equivalent function:
>
> #define YY_INPUT(buf,result,max_size) result = custom_input(buf, max_size, yyscanner);
>
> static ssize_t custom_input(char *buf, size_t result_max, yyscan_t yyscanner)
> {
>     int c = getc(yyget_in(yyscanner));
>     ssize_t res = (c == EOF) ? YY_NULL : (buf[0] = c, 1);
>     if (yydebug)
>         fprintf(stderr, "custom_input(..., %ld, ...) -> %zd\n", result_max, res);
>     return res;
> }
>
> That works but still limits throughput because I/O is being done a
> character at a time. When I replace the custom input function with
> this:
>
> #define YY_INPUT(buf,result,max_size) result = custom_input(buf, max_size, yyscanner);
>
> static ssize_t custom_input(char *buf, size_t result_max, yyscan_t yyscanner)
> {
>     ssize_t res = fread(buf, 1, result_max, yyget_in(yyscanner));
>
>     if (feof(yyget_in(yyscanner)) || ferror(yyget_in(yyscanner)))
>         res = YY_NULL;
>     if (yydebug)
>         fprintf(stderr, "custom_input(..., %ld, ...) -> %zd\n", result_max, res);
>     return res;
> }
>
> the scanner sucks in the entire input file on the first read, raises
> end of input immediately, and *doesn't parse tokens out of the input
> buffer*.
>
> esr@snark:~/WWW/cvs-fast-export$ cvs-fast-export -v tests/basic.repo/basic/README,v >/dev/null
> Starting parse
> Entering state 0
> Reading a token: custom_input(..., 8192, ...) -> 0
> Now at end of input.
> Reducing stack by rule 3 (line 105):
> -> $$ = nterm headers ()
> Stack now 0
> Entering state 9
> Reducing stack by rule 25 (line 161):
> -> $$ = nterm revisions ()
> Stack now 0 9
> Entering state 21
> Now at end of input.
> parse error syntax error at
>
> My second question is: why doesn't the generated scanner parse remaining
> tokens out of the input buffer when it reaches EOF?
>
> The documentation is unhelpful on these points and in general vague about
> how input is acquired and buffered. I have read it very closely but am
> unable to determine whether I am seeing the expected behavior.
> --
> <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
>
> All governments are more or less combinations against the
> people. . .and as rulers have no more virtue than the ruled. . .
> the power of government can only be kept within its constituted
> bounds by the display of a power equal to itself, the collected
> sentiment of the people.
> -- Benjamin Franklin Bache, in a Philadelphia Aurora editorial 1794 |
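To see concretely where the `custom_input(..., 8192, ...) -> 0` line in the trace comes from, the sketch below keeps only the EOF test of the fread-based custom_input quoted above, dropping the yyscanner and yydebug plumbing (esr_style_read is a hypothetical name). fread() sets the end-of-file indicator as soon as it reaches the end of a file shorter than the request, so the bytes it has just delivered are reported as YY_NULL (0) and the scanner never sees them. This does not explain the hang with the stock YY_INPUT, but it does account for that `-> 0`.

```c
#include <stddef.h>
#include <stdio.h>

/* Same EOF test as the quoted fread-based custom_input: any read that also
   reaches end of file is reported as 0 (YY_NULL), even though the bytes
   were already copied into buf. */
size_t esr_style_read(FILE *in, char *buf, size_t max_size)
{
    size_t res = fread(buf, 1, max_size, in);

    if (feof(in) || ferror(in))
        res = 0;   /* the data just read is discarded here */
    return res;
}
```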