waxeye-users Mailing List for Waxeye Parser Generator
Brought to you by: orlandodarhill
Archive by month (message counts in parentheses):

2008: Feb (1), Mar (1), Aug (6), Nov (2)
2009: Apr (4)
2011: Jul (2), Sep (4)
2012: Oct (3), Dec (2)
2013: Jul (3), Aug (2)
2014: Feb (7), Mar (4)
2016: Jul (5)
2017: Aug (3), Nov (3)
From: Orlando H. <orl...@gm...> - 2017-11-28 12:42:55
Hi Huib,

Sorry for the slow reply. Support for Unicode has improved, but isn't complete. It looks like Gleb has given a more detailed answer on GitHub: https://github.com/orlandohill/waxeye/issues/82

Regards,
Orlando

On Nov 27, 2017 16:20, "Huib Verweij via Waxeye-users" <wax...@li...> wrote:
> Hi, in 2014 there was a thread asking about Unicode support. At the time the
> support for Unicode was not great. Does Waxeye have Unicode support now? We
> run into difficulties with UTF-8 files that contain Unicode characters. It
> works on Windows, not on Linux, which prevents us from Dockerizing our
> product. Thanks for your time. Regards, Huib.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Waxeye-users mailing list
Wax...@li...
https://lists.sourceforge.net/lists/listinfo/waxeye-users
From: Huib V. <Hui...@ko...> - 2017-11-28 09:02:14
Hi Darmie,

thanks for your positive reply. I compiled the latest GitHub version but still get errors. So I tried modifying the document according to https://github.com/orlandohill/waxeye/issues/31, and that reports errors too. Is issue #31 describing the implemented syntax for specifying Unicode characters?

The part of my document where I get errors (using the syntax described in #31) is:

    getal <- +[0-9] *( ?[.,] +[0-9] )
    woord <- +[A-Za-z0-9\x{C0}-\x{D6}\x{D9}-\x{FF}]
    sp <- +[\t\n\r \x{00}-\x{20}\x{A0}]
    lt <- +([(),.:;"\-_+*] | apostrophe)
    apostrophe <- [\x{2018}`'\x{2032}\x{B4}\x{2019}]  # \x{2032} is "prime"

and the error I get is:

    string-append: contract violation
      expected: string?
      given: #<path:/lx/tmp/jetty-0.0.0.0-80-cocoon.war-_-any-1677620253370402457.dir/webapp/linkextractor/links/grammars/document.waxeye>
      argument position: 2nd
      other arguments...: "syntax error in grammar " "\n"
        "33:21 expected: [hex, char] received: x\nwoord <- +[A-Za-z0-9\\x{C0}-\\x{D6}\...
      context...:
        /lx/waxeye/src/waxeye/load.rkt:55:0: resolve-modular
        /usr/share/racket/collects/racket/list.rkt:563:2: append-map
        /lx/waxeye/src/waxeye/load.rkt:24:0: load-grammar
        /lx/waxeye/src/waxeye/main.rkt:33:0: main
        #%mzc:waxeye: [running body] loop

Kind regards,
Huib Verweij
Kennis- en Exploitatiecentrum Officiële Overheidspublicaties, Ministerie van Binnenlandse Zaken en Koninkrijksrelaties

On 27 Nov 2017, at 16:32, Darmie Akinlaja <dre...@gm...> wrote:
> Yes, the recent master should have Unicode support.
>
> On Nov 27, 2017 16:20, "Huib Verweij via Waxeye-users" <wax...@li...> wrote:
>> [...]
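The character classes in Huib's grammar mix code points at or below U+00FF with code points above it, and only the former fit the 8-bit hexadecimal escapes Waxeye historically supported. The check below makes that split concrete; it is plain Python for illustration only, not Waxeye syntax.

```python
# Characters from the grammar's "apostrophe" class above. Only code points
# at or below 0xFF can be expressed with an 8-bit hexadecimal escape; the
# rest require real Unicode support in the grammar syntax.
apostrophe_class = ['\u2018', '`', "'", '\u2032', '\u00B4', '\u2019']

for ch in apostrophe_class:
    cp = ord(ch)
    print(f"U+{cp:04X} fits in 8 bits: {cp <= 0xFF}")
```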
From: Huib V. <Hui...@ko...> - 2017-11-27 15:20:21
Hi,

in 2014 there was a thread asking about Unicode support. At the time the support for Unicode was not great. Does Waxeye have Unicode support now? We run into difficulties with UTF-8 files that contain Unicode characters. It works on Windows, not on Linux, which prevents us from Dockerizing our product.

Thanks for your time.

Regards,
Huib.
From: Darmie A. <dre...@gm...> - 2017-08-22 17:00:33
Oh yes. I sent this before opening the GitHub issue. Thanks for reaching out.

On Tue, 22 Aug 2017, 17:58 Orlando Hill <orl...@gm...> wrote:
> Hi Darmie,
>
> Were you able to solve the problem with the suggestions in the GitHub issue?
>
> Kind regards,
> Orlando
>
> On Aug 15, 2017 19:01, "Darmie Akinlaja" <dre...@gm...> wrote:
>> [...]
From: Orlando H. <orl...@gm...> - 2017-08-22 16:58:54
Hi Darmie,

Were you able to solve the problem with the suggestions in the GitHub issue?

Kind regards,
Orlando

On Aug 15, 2017 19:01, "Darmie Akinlaja" <dre...@gm...> wrote:
> [...]
From: Darmie A. <dre...@gm...> - 2017-08-15 16:01:46
Hello,

I have read the Waxeye documentation, but I need to better understand Waxeye grammar definitions for a full language. I want to parse a language quite similar to CoffeeScript but strictly typed, and I need clarification on how to go about defining the grammar. I have attached a sample language and a grammar I wrote to this email.

Looking forward to learning from you how to properly define Waxeye grammars; I would really appreciate your help.

Thank you.
From: Ani A <ani...@gm...> - 2016-07-06 09:07:23
On Wed, Jul 6, 2016 at 2:11 PM, Orlando Hill <orl...@gm...> wrote:
> Perhaps something like this?
>
>     prog <- module copen stmts cclose
>     module <- :'module' ws id
>     id <- +[0-9a-zA-Z] ws
>     stmts <- *stmt
>     copen <: '{' ws
>     cclose <: '}' ws
>     ws <: *[ \t\n\r]
>
> You could later extend prog to have an optional sub-module.
>
>     prog <- module ?submodule copen stmts cclose
>     submodule <- :'submodule' ws id
>
> The parsers currently perform a lot of stack allocations per character of
> input. This limits the size of strings that can be safely parsed, except in
> the Scheme/Racket version.
>
> Regards,
> Orlando

Oh OK, I will try that. Thank you.

--
Regards,
Ani
From: Orlando H. <orl...@gm...> - 2016-07-06 08:41:45
Perhaps something like this?

    prog <- module copen stmts cclose
    module <- :'module' ws id
    id <- +[0-9a-zA-Z] ws
    stmts <- *stmt
    copen <: '{' ws
    cclose <: '}' ws
    ws <: *[ \t\n\r]

You could later extend prog to have an optional sub-module.

    prog <- module ?submodule copen stmts cclose
    submodule <- :'submodule' ws id

The parsers currently perform a lot of stack allocations per character of input. This limits the size of strings that can be safely parsed, except in the Scheme/Racket version.

Regards,
Orlando

On Jul 6, 2016 11:01 AM, "Ani A" <ani...@gm...> wrote:
> [...]
From: Ani A <ani...@gm...> - 2016-07-06 08:01:05
On Wed, Jul 6, 2016 at 6:33 AM, Orlando Hill <orl...@gm...> wrote:
> [...]

Thanks for the quick response, Orlando. I will give ast_children_as_string() a try.

I wanted to have the 'module' string in the AST, because I might add another production to have a 'sub-module' there, so it will be useful to have the string in the generated AST.

I am just beginning to prototype a parser for one of the schema-definition languages I use. When you say _problems_ with strings, does that mean a problem with input_as_string(), or can the generated code itself be slow (and have huge stack allocations)? Would it be overcome with the use of input_from_file()?

--
Regards,
Ani
From: Orlando H. <orl...@gm...> - 2016-07-06 01:03:34
Hello Ani,

I would recommend putting your 'module' keyword in a separate non-terminal definition, just as you did with 'id'. You can use the C helper function ast_children_as_string to convert an AST node containing only characters into a single string (char*).

If having the 'module' keyword in your AST won't give any useful information, you may want to use the voiding non-terminal or the voiding operator, e.g.:

    module <: 'module'

or

    prog <- :'module' ws id ws '{' stmts '}'

I considered adding non-terminal and operator types for producing a string rather than a list of characters, but didn't add them at the time, to keep the feature set smaller.

Yes, the + and * operators are designed to be used regardless of the target language.

Unfortunately, I think you'll run into problems parsing large strings with the C version, so it's probably only useful if you're prototyping.

I have intended to restart development on Waxeye a number of times, but there have been delays.

Regards,
Orlando

On Jul 4, 2016 1:22 PM, "Ani A" <ani...@gm...> wrote:
> [...]
From: Ani A <ani...@gm...> - 2016-07-04 10:22:37
Hello,

I just started using Waxeye, and I really like the idea of auto-generating an AST from the PEG grammar :)

I read the manual and am trying some simple grammars to get the hang of it. I tried something like:

    prog <- 'module' ws id ws '{' stmts '}'
    id <- +[0-9a-zA-Z]
    stmts ...
    ws ...

I see that a sample parse dump looks like this:

    prog
    |  m
    |  o
    |  d
    |  u
    |  l
    |  e
    ->  id
    |  a
    |  b
    |  c

Is there any way to prune the tree so that I get the matched terminal 'module' in a single node? (Or, in other words, to know from the AST that 'module' was matched.)

Also, I am using the C generator, so do 'closure' (*) and 'plus-closure' (+) make any sense in the C generator, or are they useful only when generating for a higher-level language?

P.S. Using Waxeye v0.8.0 on Ubuntu Linux.

Thanks.
--
Ani
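The ast_children_as_string helper mentioned earlier in this thread can be pictured with a short Python sketch. The node representation below is invented for illustration (the real Waxeye AST types differ): character leaves are plain strings, and nested non-terminals are skipped by the join.

```python
# Sketch of collapsing an AST node's character children into one string,
# analogous to what the C helper ast_children_as_string is described as
# doing. The node shape here is hypothetical, for illustration only.

def children_as_string(children):
    """Concatenate a node's single-character children into a string."""
    return ''.join(ch for ch in children if isinstance(ch, str))

# The 'prog' node from the parse dump above: the characters of the matched
# keyword 'module', followed by a child non-terminal ('id'), which is a
# tuple here and therefore excluded from the join.
prog_children = ['m', 'o', 'd', 'u', 'l', 'e', ('id', ['a', 'b', 'c'])]
print(children_as_string(prog_children))  # -> module
```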
From: Orlando H. <orl...@gm...> - 2014-03-19 06:39:30
Hi Marcin,

Certainly. I'll get back to you in a few days' time.

Regards,
Orlando

On Wed, Mar 19, 2014 at 1:01 AM, Marcin Wojnarski <mwo...@ns...> wrote:
> [...]
From: Marcin W. <mwo...@ns...> - 2014-03-18 12:01:42
Hi Orlando,

Thanks for the reply. Yes, for me Python is the priority. I'd be grateful if you release the new version as soon as possible. Can you please drop me an email when this happens?

Best,
Marcin

On 03/16/2014 02:47 PM, Orlando Hill wrote:
> [...]
From: Orlando H. <orl...@gm...> - 2014-03-16 13:47:09
Hi Marcin,

Thanks for the detailed email. You're right, it's a serious problem.

I have designed a new parsing runtime that fixes the performance problems:
https://github.com/orlandohill/waxeye/blob/master/src/sml/waxeye.sml

My hope is that, in a couple of days' time, I'll finally be able to take the time to finish the new version. I can update the Python runtime first, if that's the one you're using most.

Regards,
Orlando

On Sun, Mar 16, 2014 at 6:10 AM, Marcin Wojnarski <mwo...@ns...> wrote:
> [...]
From: Marcin W. <mwo...@ns...> - 2014-03-15 17:26:06
Dear Orlando,

Firstly, thanks for Waxeye. I use it in several Python projects, e.g., web scraping in the Nifty library (https://github.com/mwojnars/nifty/), and I like it very much.

However, I've just come across a serious issue with generated Python parsers. They can't parse long sequences of characters if they fall under a single non-terminal; when such an input sequence is given, they crash with a "RuntimeError: maximum recursion depth exceeded in cmp" exception.

For example, with this extremely simple one-line grammar:

    document <- *.

if you try to parse a text longer than ~500 characters, the parser will crash. If the grammar rule is a bit more complex, like:

    document <- *(!'<' !'&' .)

the parser will crash already after ~130 characters (on my machine; exact numbers may depend on Python system settings etc.).

I can't find any sensible workaround for this. For example, the following grammar also causes crashes:

    char <- .
    document <- *char

I might try limiting the length of sequences parsed by a single non-terminal, to split all input into chunks of length, say, up to 100 characters, but there is no way to do this, because one can't restrict the maximum number of repetitions parsed in *() or +() rules.

This issue makes using Waxeye extremely risky and unreliable. You can stumble upon this bug in literally EVERY type of language. And the exact behavior depends on specific input data, so you can live for a long time with the strong conviction that you have a great parser, until a day comes when somebody feeds it an atypical input and the parser crashes.

Besides, this bug indicates an inefficiency that's present even when the parser runs correctly. Namely, how can it be that rules like *(X) cause the parser to make a nested function call on every consecutive match of the (X) expression? There's something wrong with the implementation; I don't believe that PEG by itself might enforce such nested calls.

I'd be grateful for your help in fixing this. I like Waxeye and dislike the idea of switching to another tool.

Thanks,
Marcin
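The failure mode Marcin describes can be reproduced with plain Python (this is a minimal sketch, not Waxeye's actual generated code): a repetition matcher that recurses once per consumed character blows the stack on long input, while the iterative equivalent with the same semantics runs in constant stack depth. Note that Python 3 raises RecursionError, a subclass of the RuntimeError seen in the report above.

```python
import sys

# Recursive match for the rule  document <- *.  : one nested call per
# character, the style of implementation Marcin describes.
def match_star_recursive(text, pos=0):
    if pos == len(text):
        return pos
    return match_star_recursive(text, pos + 1)

# Iterative equivalent: same result, constant stack depth.
def match_star_iterative(text):
    pos = 0
    while pos < len(text):
        pos += 1
    return pos

sys.setrecursionlimit(200)      # make the failure reproducible at 500 chars
long_input = 'x' * 500

try:
    match_star_recursive(long_input)
    print("recursive: ok")
except RecursionError:
    print("recursive: maximum recursion depth exceeded")

print("iterative:", match_star_iterative(long_input))  # -> iterative: 500
```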
From: Nico V. <nv...@gm...> - 2014-02-19 08:04:14
Hello Orlando,

> It sounds like your grammar might be regular. If that is the case, it
> could be that a tool like Ragel is a better choice over PEGs.

Thank you for the suggestion. I am looking into Ragel, at least for the part of my grammar that I have problems with. For other parts I can still use Waxeye.

Kind regards,
Nico

---
This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com
From: Orlando H. <orl...@gm...> - 2014-02-18 11:51:33
Hi Nico,

It sounds like your grammar might be regular. If that is the case, it could be that a tool like Ragel is a better choice over PEGs.
http://www.complang.org/ragel/

I'm not sure how it is with large grammars, but I remember hearing that it was one of the reasons that Mongrel, the Ruby web server, performed so well.

Kind regards,
Orlando

On Tue, Feb 18, 2014 at 7:45 PM, Nico Verwer <nv...@gm...> wrote:
> [...]
From: Nico V. <nv...@gm...> - 2014-02-18 06:45:25
Hello Orlando,

Thanks for your quick response.

> How big is the grammar you have? Perhaps it's the kind that Waxeye
> should be able to handle without changing Racket's default settings.

It consists of more than 30000 non-terminals, each producing one or more alternative titles (of laws and other regulations), plus one non-terminal which is the alternation (|) of all of the 30000 non-terminals.

I do not really expect Waxeye to be able to generate a parser for this. At the moment I am considering doing a trie-based lookup of the titles before parsing with my Waxeye grammar.

Kind regards,
Nico
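The trie-based lookup Nico considers can be sketched in a few lines of Python. Everything below is illustrative (the titles are invented examples, not the real dataset): insert all known titles into a trie, then at a given input position return the longest title that matches as a prefix, which is exactly the longest-match behavior a 30000-way alternation would otherwise have to provide.

```python
# Minimal trie with longest-prefix-match lookup, as a pre-pass before the
# grammar proper. Titles here are hypothetical examples.

def build_trie(titles):
    root = {}
    for title in titles:
        node = root
        for ch in title:
            node = node.setdefault(ch, {})
        node['$end'] = title            # mark a complete title
    return root

def longest_match(trie, text, start=0):
    """Return the longest title that is a prefix of text[start:], or None."""
    node, best = trie, None
    for i in range(start, len(text)):
        if '$end' in node:
            best = node['$end']
        node = node.get(text[i])
        if node is None:
            return best
    return node.get('$end', best)

trie = build_trie(["Wegenverkeerswet", "Wegenverkeerswet 1994"])
print(longest_match(trie, "Wegenverkeerswet 1994, artikel 5"))
# -> Wegenverkeerswet 1994
```

Lookup is linear in the length of the matched title, independent of how many titles are stored, which is what makes it attractive over a giant alternation.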
From: Orlando H. <orl...@gm...> - 2014-02-17 20:17:23
Hi Nico,

I've briefly looked in the Racket documentation for compiling executables and running from the command line, and in the configure script options for compiling Racket from source. I didn't see anything that mentioned the VM's memory. That's not to say it's impossible, though.

How big is the grammar you have? Perhaps it's the kind that Waxeye should be able to handle without changing Racket's default settings.

Kind regards,
Orlando

On Tue, Feb 18, 2014 at 4:05 AM, Nico Verwer <nv...@gm...> wrote:
> [...]
From: Nico V. <nv...@gm...> - 2014-02-17 15:05:57
|
Hello waxeye-users,

I have an insanely large grammar (generated), which causes the following error when I try to compile it (to Java):

Racket virtual machine has run out of memory; aborting

Is it possible to somehow increase the memory for the Racket vm and generate a new waxeye.exe? I do not (yet) have Racket installed, so I can't set it there.

Best regards,
Nico Verwer

---
This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com |
From: Orlando H. <orl...@gm...> - 2014-02-16 07:56:05
|
Hi Adema,

Waxeye doesn't have very good support for Unicode at the moment. It is possible to specify 8-bit hexadecimal code points, e.g. \<40>

Better Unicode support is something I'll have to improve in a future version. I might change to a more standard syntax while I'm at it.

Sorry that I can't be of more help on this for now.

Regards,
Orlando

On Thu, Feb 13, 2014 at 5:04 AM, Elie Roudninski <xa...@gm...> wrote:
> Hi,
>
> I want to know if there is some support for Unicode?
> I looked into the waxeye grammar file and I did not see anything related to
> Unicode.
> For example, is it possible to match chars such as \u2028 or \u2029?
>
> I'm actually trying to do a JavaScript/ECMAScript parser and I will be
> glad to release it as soon as I have a working version.
>
> Regards,
>
> adema
|
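As a stopgap, a code point outside the 8-bit range can sometimes be matched by spelling out its UTF-8 byte sequence with the \<..> escapes Orlando mentions, assuming the parser sees the input as raw 8-bit code units rather than decoded characters (U+2028 LINE SEPARATOR encodes in UTF-8 as E2 80 A8). A purely hypothetical sketch, extrapolating the \<40> syntax to arbitrary hex values:

```
# Hypothetical sketch: match U+2028 (LINE SEPARATOR) as its UTF-8 bytes,
# assuming \<XX> escapes match single 8-bit code units of the raw input.
lineSep <- \<E2> \<80> \<A8>
```

If the runtime decodes the input to characters first, this approach would not work and the bytes would have to be matched some other way.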
From: Elie R. <xa...@gm...> - 2014-02-12 16:04:54
|
Hi,

I want to know if there is some support for Unicode? I looked into the waxeye grammar file and I did not see anything related to Unicode. For example, is it possible to match chars such as \u2028 or \u2029?

I'm actually trying to do a JavaScript/ECMAScript parser and I will be glad to release it as soon as I have a working version.

Regards,

adema
|
From: Orlando H. <orl...@gm...> - 2013-08-26 14:49:23
|
Hi Rares,

Sorry about that. You're right, it's not an issue for the Scheme implementation.

I will certainly be fixing this, at some stage. The quickest way would be for me to finish the redesign, even if I only released a new runtime for one language, to begin with. I assume that Java is the most important one for you. I'll have to think about whether I have the time, just now.

Best Regards,
Orlando

On Mon, Aug 26, 2013 at 11:45 PM, Rares Ispas <ra...@ra...> wrote:
> Hello Orlando,
>
> I have made some tests with WaxEye/Java and I have encountered a problem:
> stack overflow when parsing moderately large input. The grammar was trivial:
>
> # A basic grammar
> start <- *prop
> prop <- "Mary runs." ws
> ws <: *[ \t\n\r]
>
> The input file was:
> Mary runs.
> Mary runs.
> ... (a few thousand lines)
>
> The result was:
> Exception in thread "main" java.lang.StackOverflowError
> at org.waxeye.parser.Parser$InnerParser.visitCharTransition(Parser.java:467)
> at org.waxeye.parser.CharTransition.acceptVisitor(CharTransition.java:76)
> at org.waxeye.parser.Parser$InnerParser.matchEdge(Parser.java:380)
> at org.waxeye.parser.Parser$InnerParser.matchEdges(Parser.java:350)
> at org.waxeye.parser.Parser$InnerParser.matchState(Parser.java:318)
> at org.waxeye.parser.Parser$InnerParser.matchAutomaton(Parser.java:220)
> at org.waxeye.parser.Parser$InnerParser.visitAutomatonTransition(Parser.java:461)
> at org.waxeye.parser.AutomatonTransition.acceptVisitor(AutomatonTransition.java:46)
> at org.waxeye.parser.Parser$InnerParser.matchEdge(Parser.java:380)
> ... (many thousands of lines)
>
> The same grammar would work using "bin/waxeye -i simple.waxeye <
> simple/simple.test", which probably is due to Scheme's tail-call
> elimination.
>
> Any chance that you could replace recursion with a loop in the
> Java/JavaScript implementation? In our environment we do not have much
> control over the stack size, so using -Xss is not an option.
>
> --
> Best Regards,
> Rares
|
From: Rares I. <ra...@ra...> - 2013-08-26 11:48:50
|
Hello Orlando,

I have made some tests with WaxEye/Java and I have encountered a problem: stack overflow when parsing moderately large input. The grammar was trivial:

# A basic grammar
start <- *prop
prop <- "Mary runs." ws
ws <: *[ \t\n\r]

The input file was:
Mary runs.
Mary runs.
... (a few thousand lines)

The result was:
Exception in thread "main" java.lang.StackOverflowError
at org.waxeye.parser.Parser$InnerParser.visitCharTransition(Parser.java:467)
at org.waxeye.parser.CharTransition.acceptVisitor(CharTransition.java:76)
at org.waxeye.parser.Parser$InnerParser.matchEdge(Parser.java:380)
at org.waxeye.parser.Parser$InnerParser.matchEdges(Parser.java:350)
at org.waxeye.parser.Parser$InnerParser.matchState(Parser.java:318)
at org.waxeye.parser.Parser$InnerParser.matchAutomaton(Parser.java:220)
at org.waxeye.parser.Parser$InnerParser.visitAutomatonTransition(Parser.java:461)
at org.waxeye.parser.AutomatonTransition.acceptVisitor(AutomatonTransition.java:46)
at org.waxeye.parser.Parser$InnerParser.matchEdge(Parser.java:380)
... (many thousands of lines)

The same grammar would work using "bin/waxeye -i simple.waxeye < simple/simple.test", which probably is due to Scheme's tail-call elimination.

Any chance that you could replace recursion with a loop in the Java/JavaScript implementation? In our environment we do not have much control over the stack size, so using -Xss is not an option.

--
Best Regards,
Rares
|
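The recursion-to-loop rewrite Rares asks for can be illustrated on a toy repetition matcher. This is a minimal hypothetical sketch, not Waxeye's Parser: a rule like `start <- *prop` is matched with a plain loop, so the Java call stack stays constant no matter how many "Mary runs." lines the input contains, whereas a runtime that recurses once per repetition overflows exactly as in the stack trace above.

```java
// Toy illustration of replacing recursion with a loop when matching
// zero-or-more occurrences of a literal, as in `start <- *prop`.
// A frame-per-repetition runtime overflows on large inputs; this
// version uses O(1) Java stack regardless of input length.
public class IterativeStar {
    // Returns the number of characters consumed from `input`
    // by greedily matching `literal` zero or more times.
    public static int matchStar(String input, String literal) {
        int pos = 0;
        while (input.startsWith(literal, pos)) { // one loop iteration per match
            pos += literal.length();
        }
        return pos;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append("Mary runs.\n");
        // 100,000 repetitions: a recursive matcher would overflow here.
        System.out.println(matchStar(sb.toString(), "Mary runs.\n"));
    }
}
```

Nested rules need more than a simple loop (an explicit stack of pending work), but the repetition case above is what blows up first on line-per-line input.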
From: Rares I. <ra...@ra...> - 2013-07-22 12:41:59
|
Hello Orlando,

Thank you for your kind response. I did not provide an answer until now because I had not read the papers you recommended. Bryan's original paper does not mention any kind of pushdown automata.

My purpose is to implement some kind of in-editor code completion, for example for SQL. For a position inside a string, I would need to know the non-terminals and terminals which are valid at that point, even if the string is incomplete. For example, with the cursor at position zero, we could have "select, insert, create table, etc.". With the cursor after "create " we would get "table, schema, index, etc.". It is not clear to me how to extract this information from the Automaton and States list.

Thanks,
Rares

On Thu, 11 Jul 2013 08:10:23 +0300, Orlando Hill <orl...@gm...> wrote:
> Hi Rares,
>
> Bryan Ford's 'Parsing Expression Grammars: A Recognition-Based Syntactic
> Foundation' is the most relevant.
>
> Both of Ford's Packrat papers discuss memoization. Waxeye's current
> parsing machine uses a hash table rather than an array for memoization,
> and only memoizes results from non-terminal expressions.
>
> The current parsing machine is basically a pushdown automaton
> (http://en.wikipedia.org/wiki/Pushdown_automaton) with ordered choice
> taken into account when building the finite-state automata for each
> non-terminal expression. I moved to this representation to make code
> generation for multiple languages easier.
>
> I don't think I ever fixed the bug you mentioned, as it was my intention
> to redesign the parsing machine.
>
> A year ago, I redesigned the parsing machine for better performance and
> ease of understanding.
>
> This new version works directly with a tree of parsing expressions. For
> functional languages (e.g. Scheme), a direct-style evaluator is all
> that is needed. For other languages (C, Java, JavaScript, Python, Ruby),
> a direct-style evaluator should be transformed into a form that can be
> written as a while loop and a switch statement.
>
> The current code is here:
> https://github.com/orlandohill/waxeye/blob/master/src/sml/waxeye.sml
>
> To better understand this version, I would recommend the following
> papers:
>
> Three Steps for the CPS Transformation (1991) by Olivier Danvy
> Section 3 of Defunctionalization at Work (2001) by Olivier Danvy, Lasse
> R. Nielsen
> Section 2.1 of A Functional Correspondence between Evaluators and
> Abstract Machines (2003) by Mads Sig Ager, Dariusz Biernacki, Olivier
> Danvy, Jan Midtgaard
>
> There are a few things that need to be done before I can release a new
> version of Waxeye:
>
> * Finish error tracking.
> * Generate regression tests for each supported language.
> * Port the new design to each supported language.
>
> What are you aiming to do?
> Can you explain what you mean by statement completion?
>
> Let me know if you need any further help.
>
> Best Regards,
> Orlando
>
> On Wed, Jul 10, 2013 at 7:52 PM, Rares Ispas <ra...@ra...> wrote:
>> Hello,
>>
>> Can you please reference the exact paper(s) used in implementing your
>> variant of the parsing algorithm? I would need to do some tweaks to
>> it (i.e. add statement completion) and for this I have to understand it
>> very well. Your only reference is to PEG's home page, but there are a
>> lot of papers and variants there, which one should I use for reference?
>>
>> You also mentioned in 2011 that you made a mistake in the backtracking
>> implementation, have you solved it since?
>>
>> --
>> Best Regards,
>> Rares
|
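Orlando's description of turning a direct-style evaluator into "a while loop and a switch statement" can be sketched for a tiny PEG subset (literals, sequence, ordered choice). This is a hypothetical illustration of the CPS-then-defunctionalize recipe from the cited Danvy papers, not Waxeye's actual runtime; all names (TinyPeg, Expr, Frame) are invented. Each continuation of the recursive evaluator becomes a Frame on an explicit stack, and the loop alternates between evaluating an expression and applying a result to the top frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyPeg {
    static final int LIT = 0, SEQ = 1, ALT = 2;

    // Expression tree node: LIT carries text, SEQ/ALT carry two children.
    static class Expr {
        final int tag; final String text; final Expr a, b;
        Expr(int t, String s, Expr a, Expr b) { tag = t; text = s; this.a = a; this.b = b; }
        static Expr lit(String s) { return new Expr(LIT, s, null, null); }
        static Expr seq(Expr a, Expr b) { return new Expr(SEQ, null, a, b); }
        static Expr alt(Expr a, Expr b) { return new Expr(ALT, null, a, b); }
    }

    // Defunctionalized continuation frame: "given this result, do what next?"
    static final int K_SEQ = 0, K_ALT = 1;
    static class Frame {
        final int tag; final Expr next; final int savedPos;
        Frame(int t, Expr n, int p) { tag = t; next = n; savedPos = p; }
    }

    static final int FAIL = -1;

    // Returns the end position of the match starting at `pos`, or FAIL.
    static int match(Expr start, String input, int pos) {
        Deque<Frame> stack = new ArrayDeque<>();
        Expr expr = start; // non-null: EVAL mode; null: APPLY `pos` to the stack
        while (true) {
            if (expr != null) {          // EVAL: switch on the expression tag
                switch (expr.tag) {
                case LIT:
                    pos = input.startsWith(expr.text, pos) ? pos + expr.text.length() : FAIL;
                    expr = null;         // result ready; switch to APPLY mode
                    break;
                case SEQ:
                    stack.push(new Frame(K_SEQ, expr.b, pos));
                    expr = expr.a;
                    break;
                case ALT:
                    stack.push(new Frame(K_ALT, expr.b, pos)); // remember pos to backtrack
                    expr = expr.a;
                    break;
                }
            } else {                     // APPLY: switch on the frame tag
                if (stack.isEmpty()) return pos;
                Frame f = stack.pop();
                switch (f.tag) {
                case K_SEQ:
                    if (pos != FAIL) expr = f.next; // first part matched: run second
                    break;                          // else propagate FAIL upward
                case K_ALT:
                    if (pos == FAIL) { pos = f.savedPos; expr = f.next; } // try alternative
                    break;              // else first alternative won (PEG ordered choice)
                }
            }
        }
    }

    public static void main(String[] args) {
        // sketch of: greeting <- ("hi" | "hello") " world"
        Expr g = Expr.seq(Expr.alt(Expr.lit("hi"), Expr.lit("hello")), Expr.lit(" world"));
        System.out.println(match(g, "hello world", 0)); // 11
        System.out.println(match(g, "hi world", 0));    // 8
        System.out.println(match(g, "hey world", 0));   // -1
    }
}
```

Because the machine's stack is an ordinary heap data structure, input size no longer consumes Java call-stack frames, which is also what makes this shape attractive for the stack-overflow problem reported earlier in the thread.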