pyparsing-users Mailing List for Python parsing module (Page 3)
Brought to you by:
ptmcg
From: Paul M. <pt...@au...> - 2015-12-17 10:17:40
Hm, interesting approach, to make the function an operator. For your list of args, the operator is not a delimited list, it is just the ',' symbol itself. Try adding this to your infixNotation call, before fnop:

    (',', 2, opAssoc.LEFT, lambda t: tuple(t[::2]) if len(t)>1 else t[0]),

Totally untested. Good luck!

-- Paul

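For comparison, here is a minimal sketch of a different technique from the comma-as-operator idea in this reply: a function call is treated as an operand whose arguments are a delimitedList of full sub-expressions. The names FUNCS, BINOPS, eval_binary and evaluate are made up for this illustration and are not taken from the pastebin code, and variable lookup is left out.

```python
import math
import operator

from pyparsing import (Forward, Group, Optional, Regex, Suppress, Word,
                       alphanums, alphas, delimitedList, infixNotation,
                       oneOf, opAssoc)

# Hypothetical lookup tables, for this sketch only.
FUNCS = {"sin": math.sin, "pow": math.pow, "atan2": math.atan2}
BINOPS = {"+": operator.add, "-": operator.sub,
          "*": operator.mul, "/": operator.truediv}

expr = Forward()

number = Regex(r"\d+(\.\d*)?").setParseAction(lambda t: float(t[0]))

# name '(' arg, arg, ... ')' where every arg is itself a full expression
func_call = (Word(alphas, alphanums) + Suppress("(")
             + Group(Optional(delimitedList(expr))) + Suppress(")"))
func_call.setParseAction(lambda t: FUNCS[t[0]](*t[1]))


def eval_binary(tokens):
    # infixNotation passes one group: [operand, op, operand, op, ...]
    seq = list(tokens[0])
    value = seq[0]
    for op, rhs in zip(seq[1::2], seq[2::2]):
        value = BINOPS[op](value, rhs)
    return value


# Operators listed first bind tightest.
expr <<= infixNotation(func_call | number, [
    (oneOf("* /"), 2, opAssoc.LEFT, eval_binary),
    (oneOf("+ -"), 2, opAssoc.LEFT, eval_binary),
])


def evaluate(text):
    return expr.parseString(text, parseAll=True)[0]
```

With this shape, evaluate("pow(2, 3) + 1") goes through the same infixNotation levels as single-argument calls such as sin, because each argument is parsed as a complete expression before the function is applied.
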
From: Andrew N. <and...@gm...> - 2015-12-17 04:52:11
Dear pyparsing users,

I'm attempting to make a fully functional scientific calculator by expanding on some example code I found on the pyparsing wiki. The code I have so far is at:

http://pastebin.com/aAPri29k

I have managed to get the calculator to do things along the lines of:

    >>> vars_={'A': 0, 'B': 1.1, 'C': 2.2, 'D': 3.3, 'E': 4.4, 'F': 5.5, 'G': 6.6, 'H':7.7, 'I':8.8, 'J':9.9, "abc": 20}
    >>> arith = Arith( vars_ )
    >>> arith.eval("1+2*sin(B)*2")
    4.564829440245742

So far so good. However, I would now like to use functions which have more than one parameter, e.g. math.pow or math.atan2. I've been trying all day to add this kind of functionality, but failing miserably. I've mainly been experimenting with delimitedList:

    listop = delimitedList(operand)

and adding this listop to the infixNotation call. Can anyone inform me if there is a simple solution? I'm a total novice at parsing.

regards,
Andrew.
_____________________________________
Dr. Andrew Nelson
_____________________________________

From: Martijn V. <m.v...@lu...> - 2015-11-26 12:10:50
Dear Paul,

Thanks, the memory issue seems to be resolved.

I do have another problem with 2.0.6 and up which I think is due to the same change. On parsing something that's not accepted by the grammar, I get a parse exception, for example:

    ParseException: Expected "IVS" (at char 7), (line:1, col:8)

With the latest SVN I get the same thing, but the exception message contains a huge expected string (more than 600,000 characters), for example:

    ParseException: Expected {{{{{{Suppress:({["GI"] ^ ["GI:"] ^ ["gi"] ^ ["gi:"]}) W:(0123...)} ^ {~{"LRG_"} Combine:({W:(abcd...) W:(0123...)}) [{Suppress:(".") W:(0123...)}]} ^ Combine:({"UD_" W:(abcd...) {{"_" W:(0123...)}}...})} [{Suppress:("(") Group:({W:(abcd...) [{{Suppress:("_v") W:(0123...)} ^ {Suppress:("_i") W:(0123...)}}]}) Suppress:(")")}]} ^ {Combine:({"LRG_" W:(0123...)}) [{{Suppress:("t") W:(0123...)} ^ {Suppress:("p") W:(0123...)}}]}} Suppress:(":") [{W:(cgmn...) Suppress:(".")}] {Empty {Gro ...

etcetera, you get the idea. I'd say that this is undesirable.

This is with the same HGVS grammar I linked earlier. Just try parsing the empty string for example.

https://github.com/mutalyzer/mutalyzer/blob/master/mutalyzer/grammar.py

best,
Martijn

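Independently of the fix discussed in this thread, one general way to keep "Expected ..." messages short is to give compound expressions a name with setName. The following is a minimal sketch on a toy grammar (not the Mutalyzer HGVS grammar); the helper failure_message is hypothetical and exists only for this illustration.

```python
from pyparsing import ParseException, Word, alphas, nums

# Toy grammar for illustration; not the Mutalyzer HGVS grammar.
unnamed = Word(nums) | Word(alphas)
named = (Word(nums) | Word(alphas)).setName("value")


def failure_message(parser, text):
    """Return the ParseException text for `text`, or None if it parses."""
    try:
        parser.parseString(text)
    except ParseException as pe:
        return str(pe)
    return None
```

Calling failure_message(named, "!") reports the failure in terms of the name "value", while the unnamed expression is described by its full structure, which is what grows so large on a deeply nested grammar.
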
From: Will M. <wil...@gm...> - 2015-11-26 10:57:47
Hi Paul,

Works for me. Thanks!

Will

--
Will McGugan
http://www.willmcgugan.com

From: Paul M. <pt...@au...> - 2015-11-25 20:00:39
For those who reported having memory and Unicode issues, if possible, please download the latest committed version of pyparsing from the SourceForge SVN repo, and see if this resolves your issues.

For the memory problem, it is probably not necessary to actually parse any text, simply invoke the streamline() method on your top-level grammar instance:

    parser.streamline()

For the Unicode problems, you should stop seeing the UnicodeEncodeError exceptions when creating your parser.

Thanks for the feedback, everyone - if these changes work out, I'll follow up with an actual 2.0.7 release in the next day or so.

-- Paul

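A minimal sketch of such a check, assuming a toy recursive grammar in place of a real application grammar; the helper expected_message_length is hypothetical and only measures how long the resulting error text is.

```python
from pyparsing import (Forward, Literal, ParseException, Word, ZeroOrMore,
                       nums)

# Toy recursive grammar standing in for a real application grammar.
expr = Forward()
atom = Word(nums) | (Literal("(") + expr + Literal(")"))
expr <<= atom + ZeroOrMore(Literal("+") + atom)

# streamline() collapses nested combinations in place and returns the parser.
parser = expr.streamline()


def expected_message_length(text):
    """Length of the ParseException text for `text`, or None on success."""
    try:
        parser.parseString(text)
    except ParseException as pe:
        return len(str(pe))
    return None
```

If streamline() returns promptly and expected_message_length("") stays small, neither the memory growth nor the oversized error message is showing up for that grammar.
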
From: Martijn V. <m.v...@lu...> - 2015-11-24 17:13:10
Dear Paul and others,

(Sorry for replying out of thread, I only just subscribed.)

I can report the same problem with Pyparsing 2.0.6 on this grammar:

https://github.com/mutalyzer/mutalyzer/blob/master/mutalyzer/grammar.py

Memory usage continues to increase.

best,
Martijn

From: Andrea C. <an...@cd...> - 2015-11-20 16:22:46
|
On Fri, Nov 20, 2015 at 4:16 AM, Paul McGuire <pt...@au...> wrote: > > Andrea, I'll try to take a look at the PyContracts code that you posted and > see if any glaring areas jump out. will, I hope these descriptions will give > you some clues where to start looking in your grammar. I can also post some > before-after snippets that you can patch into your versions of pyparsing and > rerun your tests. Thanks Paul! 1) As for the PyContracts project, where I get syntax errors, this is my travis project: https://travis-ci.org/AndreaCensi/contracts This currently shows the same code working for 2.7,3.2,3.3,pypy, but failing for 3.4 and 3.5. Currently the version that is installed by pip is pyparsing-2.0.6-py2.py3. I'm not an expert at Travis. I wish there was a way to run different builds with multiple versions of pyparsing. To look into the grammar, start here: https://github.com/AndreaCensi/contracts/blob/master/src/contracts/syntax.py Other parts are in other files. It is fairly complex - I was using almost all the features of PyParsing. In the past, what was failing in >=3.4 were tests related to the unary operator "-". It didn't recognize things like ">=-1". Now the problem is errors like this: 'array(=4|>=2,<=0)' => pyparsing.ParseException: Expected {{FollowedBy:({{Forward: {{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} {{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}} "+"} Forward: {{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward: {{FollowedBy:({{... "*"} ...}) Group:({... 
{{"*" ...}}...})} | ...} {{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}) Group:({Forward: {{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} {{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}} {{"+" Forward: {{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} {{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}}...})} | Forward: {{FollowedBy:({{Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} "-"} Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}) Group:({Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...} {{"-" Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}}...})} | Forward: {{FollowedBy:({{... "*"} ...}) Group:({... {{"*" ...}}...})} | ...}}} (at char 10), (line:1, col:11) 2) As for the other project, the one with out of memory errors, I do have unicode expressions in literals. Could the changes in unicode expressions lead to out of memory errors? ~ Just read your last message about the cosmetic bug. When you push out the fix, I will let you know if the problems above disappear. |
From: Paul M. <pt...@au...> - 2015-11-20 14:41:32
The problem is definitely in the "cosmetic-only" change to the returned error message for MatchFirst and Or (which also manifests as a Unicode error), and does not even require calling parseString, just streamline(). Thanks for the test case Will, I can repro the problem with it, but am trying to distill it down to a smaller case to add to my unit tests, and to work with in fixing the bug.

For now, impatient users can comment out line 2354 in pyparsing.py:

    self.errmsg = "Expected " + str(self)

-- Paul

From: Paul M. <pt...@au...> - 2015-11-20 09:16:29
There were two logic changes made in 2.0.6:

- a bug in Or (operator ^) was fixed which handles the case where the longest match fails because of a parse action raising an exception, but an alternative shorter match succeeds; previously this would erroneously fail to match, but now successfully returns the shorter alternative match
- a bug in Each (operator &) was fixed that would erroneously return multiple matches of Optional expressions

There was one additional change that introduced a bug that only affects users with unicode in their expressions.

If your grammar has complex expressions (especially recursive expressions) using ^ or & operators, these new bugfixes may be the problem.

Andrea, I'll try to take a look at the PyContracts code that you posted and see if any glaring areas jump out. Will, I hope these descriptions will give you some clues where to start looking in your grammar. I can also post some before-after snippets that you can patch into your versions of pyparsing and rerun your tests.

-- Paul

-----Original Message-----
From: Andrea Censi [mailto:an...@cd...]
Sent: Thursday, November 19, 2015 7:03 PM
To: Will McGugan <wil...@gm...>
Cc: Pyp...@li...
Subject: Re: [Pyparsing] Memory issues with 2.0.6

> starts eating memory until the process is killed by the OS.

Me too! I have been wondering why all of a sudden my Travis unit tests were failing, with the processes being killed.

<snip>

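The Or change can be illustrated with a small sketch, assuming a pyparsing release that includes this fix. The expressions any_number and two_digits and the parse action small_only are invented for the example.

```python
from pyparsing import ParseException, Word, nums


def small_only(s, loc, toks):
    # Reject the match from inside the parse action.
    if int(toks[0]) > 99:
        raise ParseException(s, loc, "number too large")


any_number = Word(nums).setParseAction(small_only)
two_digits = Word(nums, exact=2)

# ^ builds an Or, which tries the longest match first.
number = any_number ^ two_digits


def parse(text):
    return number.parseString(text).asList()
```

For the input "12345", any_number is the longest match but its parse action raises, so the Or falls back to the shorter two_digits alternative instead of failing outright.
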
From: Andrea C. <an...@cd...> - 2015-11-20 01:03:57
|
> starts eating memory until the process is killed by the OS. Me too! I have been wondering why all of a sudden my Travis unit tests were failing, with the processes being killed. ~ [I also have a case in which a grammar worked (defined as: "parses string s") in Python 2.7, but doesn't (syntax error with the same string s) in >=3.3, and the specific errors changed with the latest update of PyParsing. http://github.com/AndreaCensi/contracts So I guess something substantial changed in the latest release. ] On Thu, Nov 19, 2015 at 7:42 PM, Will McGugan <wil...@gm...> wrote: > Hi, > > My app has a fairly complex grammar to parse expressions. Up to 2.0.5 it > was working well. But in 2.0.6 it gets stuck parsing and starts eating > memory until the process is killed by the OS. > > I've pinned pyparsing to 2.0.5 for now. I haven't done any debugging yet, > but I was wondering what had changed since 2.0.5 that could trigger this > kind of behaviour? Any known bugs? > > Thanks in advance, > > Will McGugan > ------------------------------------------------------------------------------ > _______________________________________________ > Pyparsing-users mailing list > Pyp...@li... > https://lists.sourceforge.net/lists/listinfo/pyparsing-users -- Andrea Censi | http://andrea.caltech.edu | "Not all those who wander are lost." research scientist @ LIDS / Massachusetts Institute of Technology |
From: Will M. <wil...@gm...> - 2015-11-20 00:43:00
|
Hi, My app has a fairly complex grammar to parse expressions. Up to 2.0.5 it was working well. But in 2.0.6 it gets stuck parsing and starts eating memory until the process is killed by the OS. I've pinned pyparsing to 2.0.5 for now. I haven't done any debugging yet, but I was wondering what had changed since 2.0.5 that could trigger this kind of behaviour? Any known bugs? Thanks in advance, Will McGugan |
From: Paul M. <pt...@au...> - 2015-11-10 01:05:10
Nice report, Max, especially boiling the bug repro down to a single statement - very helpful! Just finishing the check-in of a fix to SourceForge SVN, will be included in the next release.

-- Paul

From: Max R. <max...@gm...> - 2015-11-09 02:14:58
It seems that Optional.setResultsName joined by Each does not behave consistently with other ParserElements. For example:

    >>> (Optional('foo')('one') & pp.Optional('bar')('two')).parseString('foo bar')
    (['foo', 'bar'], {})
    >>> (Optional('bar')('two') & Optional('foo')('one')).parseString('foo bar')
    (['foo', 'bar'], {'two': [('bar', 1)]})

A workaround is to name the Literals themselves instead of the Optionals:

    >>> (Optional(Literal('bar')('two')) & Optional(Literal('foo')('one'))).parseString('foo bar')
    (['foo', 'bar'], {'two': [('bar', 1)], 'one': [('foo', 0)]})
    >>> (Optional(Literal('foo')('one')) & Optional(Literal('bar')('two'))).parseString('foo bar')
    (['foo', 'bar'], {'two': [('bar', 1)], 'one': [('foo', 0)]})

The problem does not manifest when joining Optional objects with And:

    >>> (Optional('foo')('one') + pp.Optional('bar')('two')).parseString('foo bar')
    (['foo', 'bar'], {'two': [('bar', 1)], 'one': [('foo', 0)]})

Thanks,
Max

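The workaround from this report can be written out as a small self-contained sketch. The helper named_results is hypothetical; it uses asDict() only to show the named results as a plain dict rather than the internal representation printed in the report.

```python
from pyparsing import Literal, Optional

# Results names sit on the Literals, not on the enclosing Optionals.
one = Optional(Literal('foo')('one'))
two = Optional(Literal('bar')('two'))

# & builds an Each, which accepts its parts in any order.
parser = one & two


def named_results(text):
    """Return the named results for `text` as a plain dict."""
    return parser.parseString(text).asDict()
```

Because the names are attached before the expressions are wrapped in Optional, the order in which the Each matches its parts should not affect which names are reported.
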
From: Paul M. <pt...@au...> - 2015-10-31 15:11:45
First of all, it is not necessary (and probably not even helpful) to define every element of your grammar as an instance variable, preceded by 'self.'. In nearly all cases, when I define a grammar within a parsing class, I'll do something like this:

    class BobParser(object):
        def __init__(self):
            expr1 = Literal("Bob's your")
            expr2 = oneOf("uncle aunt brother sister father mother")
            self.parser = expr1 + expr2('relation')

        def who_is_it(self, string):
            return self.parser.parseString(string).relation

All the sub-expressions can just be local variables, and save the "self." business for just the top-level parser.

This really looks like you just started at the beginning of your input text and started writing pyparsing expressions to it. Many times there's nothing wrong with that, but in your case, there is a lot of complexity and structure to your input. And more importantly, many internal patterns that are repeated - these can be defined once and then reused by name over and over. I often encourage people to write a BNF for complex data like this. Or at *least* look at the input at an overall level for which bits are common and can be reused.

There are many small pieces that can be defined and reused in larger parts, and these will help simplify your grammar. For instance, there are many places where you use this:

    Combine(self.plain_number + 'x' + self.plain_number, adjacent=False).setResultsName('something')

which really is challenging to the eye to see what is the parser and what is the meta-information (adjacency, results name). And you repeat this whenever you need a "0 x 0" or "640 x 480", so it makes things very messy. If you define for yourself this reusable expression:

    number_x_number = Combine(plain_number + 'x' + plain_number, adjacent = False)

Then you can use it in multiple places as:

    number_x_number("clean_aperture")
    number_x_number("Dimensions")

etc.

I also strongly recommend using the short-cut version of expr.setResultsName('xyz'), as expr('xyz'). This notation can really clean up your grammar definition, and make it easier to see the overall parser without being distracted by all the function calls.

Finally, you make heavy use of "Combine(something + something + something, adjacent=False)". Please consider using Group instead. It's clearer to follow, it implicitly allows whitespace (so adjacent=False is not necessary), and it allows you to define results names within the group, making for a useful substructure (as in sample_time below):

    number_x_number = Group(plain_number + 'x' + plain_number)
    number_sl_number = Group(plain_number + '/' + plain_number)
    word_sl_word = Group(word + '/' + word)
    timestamp = Regex(r'\d\d:\d\d:\d\d\.\d\d\d')
    sample_time = Group(number_sl_number('sample') + timestamp('time'))

So just going through and stripping out all the "self." stuff and using some of these repeated sub-expressions might make things easier to follow, and your overall intent and structure will be clearer. Here is one area of your parser with these changes:

    audio_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    audio_track_matrix = 'Track Matrix:' + restOfLine('track_matrix')
    audio_track_dimensions = audio_dimensions + audio_track_matrix
    subtitle_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    subtitle_track_matrix = 'Track Matrix:' + restOfLine('track_matrix')
    subtitle_track_dimensions = subtitle_dimensions + subtitle_track_matrix
    video_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    video_clean_aperture = 'CleanAperture: ' + number_x_number('CleanAperture')
    video_production_aperture = 'ProductionAperture:' + number_x_number('ProductionAperture')
    video_encoded_pixels = 'EncodedPixels:' + number_x_number('EncodedPixels')

Now you can answer some other questions for yourself, like "why do I repeat the 'Dimensions:' expression?", and further simplify your grammar.

Good luck,
-- Paul

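To make the Group suggestion concrete, here is a minimal runnable sketch built from the expressions named in this reply. The definition of plain_number as Word(nums) and the two top-level expressions dimensions and duration are assumptions for the example, not the original poster's definitions.

```python
from pyparsing import Group, Regex, Word, nums

# Assumed definition; the original plain_number is not shown in the thread.
plain_number = Word(nums)

# Reusable building blocks, defined once.
number_x_number = Group(plain_number + 'x' + plain_number)
number_sl_number = Group(plain_number + '/' + plain_number)
timestamp = Regex(r'\d\d:\d\d:\d\d\.\d\d\d')
sample_time = Group(number_sl_number('sample') + timestamp('time'))

# Larger expressions reuse the blocks and attach results names with expr('name').
dimensions = 'Dimensions:' + number_x_number('Dimensions')
duration = 'Duration:' + sample_time('duration')

dims = dimensions.parseString("Dimensions: 640 x 480")
dur = duration.parseString("Duration: 7444/600 00:00:12.407")
```

The grouped results keep their substructure, so dur.duration.time and dur.duration.sample can be read separately instead of being fused into one combined string.
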
From: Robin S. <rob...@gm...> - 2015-10-31 00:28:13
I have some text I am trying to parse, specifically the track data. I have grammar defined and it works, however I am having the following problem:

If the first track found is an audio track, I can retrieve both the audio and video tracks. However, if the first track is a video track, when I try to retrieve the tracks, I get the video track twice and no audio track.

Here is my sample data:

===============================================================================
File: file:///Users/test/Desktop/Media/Playback_Video_Format_Files/ipad_2_frontface_portrait.mov
5,674,352 bytes
File Type Info: MajorBrand 'qt ' MinorVersion 0x00000000
1 compatible brands: 'qt '
Movie Data ('mdat'): 1 atom found, 5,665,002 data bytes
com.apple.coremedia.formatreader.quicktime-iso
Creation Date (date): 2011-06-14T16:06:30-0700
Movie Timescale: 600 Duration: 7444/600 00:00:12.407
Preferred rate: 1.00 Preferred Volume: 1.00
Movie Matrix: 1.0 0.0 0.0 / 0.0 1.0 0.0 / 0.0 0.0 1.0
Movie is self-contained. Not fast-start.
QT user data atom available.
QT metadata atom available.
2 tracks present.

Track ID 1 vide (Video) Enabled Self-contained
Format vide/avc1 dimensions: video 640 x 480, presentation: 640 x 480 (pixelAspect+clean), cleanAperture: 640 x 480 @ 0,0 (originTopLeft)
Media Timescale: 600 Duration: 7447/600 00:00:12.412 MinSampleDuration: 19/600 AdvanceDecodeDelta: 0/600 00:00:00.000
Num data bytes: 5555600 Est. data rate: 3.581 Mbps Nominal framerate: 29.972 fps 372 samples
Included in auto selection.
Language code <und>
Dimensions: 640 x 480 CleanAperture: 640 x 480 ProductionAperture: 640 x 480 EncodedPixels: 640 x 480
Track Matrix: 0.0 1.0 0.0 / -1.0 0.0 0.0 / 480.0 0.0 1.0
1 edit: Media start 0/600 00:00:00.000 dur 7444/600 00:00:12.407 Track start 0/600 00:00:00.000 dur 7444/600 00:00:12.407
QT metadata atom available.

Track ID 2 soun (Audio) Enabled Self-contained
Format soun/aac 44100 Hz aac FormatFlags: 0x00000000 Bytes/Pkt: 0 Frames/Pkt: 1024 Bytes/Frame: 0 Chan/Frame: 1 Bits/Chan: 0 Reserved: 0x00000000
ChannelLayout: Mono
Media Timescale: 44100 Duration: 549888/44100 00:00:12.469 MinSampleDuration: 1024/44100 AdvanceDecodeDelta: 0/44100 00:00:00.000
Num data bytes: 99464 Est. data rate: 63.815 kbps Nominal framerate: 43.066 fps 537 samples
Track volume: 1
Included in auto selection.
Language code <und>
Dimensions: 0 x 0
Track Matrix: 1.0 0.0 0.0 / 0.0 1.0 0.0 / 0.0 0.0 1.0
1 edit: Media start 0/44100 00:00:00.000 dur 7444/600 00:00:12.407 Track start 0/600 00:00:00.000 dur 7444/600 00:00:12.407
"""

Here is a snippet of my grammar:

    # Start Track Info Block
    # Track Info
    self.crap = Suppress(Optional('[file]'))
    self.track_id = 'Track ID' + self.plain_number + self.word + '(' + self.word.setResultsName('type') + restOfLine

    # Track Format
    self.audio_track_format = 'Format' + Combine(self.word + '/' + self.word).setResultsName('track_format') + \
        Combine(self.plain_number + self.word, adjacent=False).setResultsName('frequency') + \
        self.word.setResultsName('codec') + restOfLine
    self.subtitle_track_format = 'Format' + Combine(self.word + '/' + self.word).setResultsName(
        'track_format') + restOfLine
    self.vtrack_format = 'Format' + Combine(self.word + '/' + self.word).setResultsName('track_format')
    self.vtrack_dimensions = 'dimensions: video' + Combine(self.plain_number + 'x' + self.plain_number,
                                                           adjacent=False).setResultsName('video')
    self.vtrack_presentation = Suppress(', ') + 'presentation:' + \
        Combine(self.plain_number + 'x' + self.plain_number + '(' + self.word + '+' + self.word + ')' +
                Suppress(','), adjacent=False) \
        .setResultsName('presentation')
    self.vtrack_cleanaperture = 'cleanAperture:' + Combine(self.plain_number + 'x' + self.plain_number +
                                                           restOfLine, adjacent=False) \
        .setResultsName('cleanAperture')
    self.video_track_format = self.vtrack_format + self.vtrack_dimensions + \
        self.vtrack_presentation + self.vtrack_cleanaperture

    # Audio Channel Layout
    self.channel_layout = 'ChannelLayout:' + restOfLine.setResultsName('channel_layout')

    # Audio Track Volume
    self.audio_track_volume = 'Track volume:' + self.plain_number.setParseAction(lambda t: int(t[0])) \
        .setResultsName('audio_volume')

    # Frame reordering
    self.frame = Optional(Combine('Frame') + restOfLine)

    # Included
    self.included = Combine('Included' + restOfLine)

    # Media Timescale
    self.timescale = Combine(self.word + 'Timescale:' + self.plain_number, adjacent=False)
    self.duration = Combine('Duration:' + self.div_num + self.time, adjacent=False)
    self.sample_duration = Combine('MinSampleDuration:' + self.div_num.setResultsName('sample_duration'),
                                   adjacent=False)
    self.decode_delta = Combine('AdvanceDecodeDelta:' + Combine(self.div_num + restOfLine)
                                .setResultsName('decode_delta'), adjacent=False)
    self.media_timescale = self.timescale + self.duration + self.sample_duration + self.decode_delta

    # Track Data
    self.data_bytes = 'Num data bytes:' + self.plain_number.setResultsName('data_bytes')
    self.data_rate = 'Est. data rate:' + Combine(self.decimal_number + self.word, adjacent=False) \
        .setResultsName('estimated_data_rate')
    self.frame_rate = 'Nominal framerate:' + Combine(self.decimal_number + self.word, adjacent=False) \
        .setResultsName('fps')
    self.samples = self.plain_number.setResultsName('samples')
    self.track_data = self.data_bytes + self.data_rate + self.frame_rate + self.samples + restOfLine

    # Track Dimensions
    self.audio_dimensions = 'Dimensions:' + Combine(self.plain_number + 'x' + self.plain_number, adjacent=False) \
        .setResultsName('Dimensions')
    self.audio_track_matrix = 'Track Matrix:' + restOfLine.setResultsName('track_matrix')
    self.audio_track_dimensions = self.audio_dimensions + self.audio_track_matrix
    self.subtitle_dimensions = 'Dimensions:' + Combine(self.plain_number + 'x' + self.plain_number, adjacent=False) \
        .setResultsName('Dimensions')
    self.subtitle_track_matrix = 'Track Matrix:' + restOfLine.setResultsName('track_matrix')
    self.subtitle_track_dimensions = self.subtitle_dimensions + self.subtitle_track_matrix
    self.video_dimensions = 'Dimensions:' + Combine(self.plain_number + 'x' + self.plain_number, adjacent=False) \
        .setResultsName('Dimensions')
    self.video_clean_aperture = 'CleanAperture: ' + Combine(self.plain_number + 'x' + self.plain_number,
                                                            adjacent=False).setResultsName('CleanAperture')
    self.video_production_aperture = 'ProductionAperture:' + Combine(self.plain_number + 'x' + self.plain_number,
                                                                     adjacent=False).setResultsName(
        'ProductionAperture')
    self.video_encoded_pixels = 'EncodedPixels:' + Combine(self.plain_number + 'x' + self.plain_number,
                                                           adjacent=False).setResultsName('EncodedPixels')
    self.video_track_matrix = 'Track Matrix:' + restOfLine.setResultsName('TrackMatrix')
    self.video_track_dimensions = self.video_dimensions + self.video_clean_aperture + \
        self.video_production_aperture + self.video_encoded_pixels + self.video_track_matrix

    # Edits
    self.num_edits = self.plain_number.setResultsName('number_of_edits').setParseAction(lambda t:
int(t[0])) + \ Optional(Literal('edit:') | Literal('edits:')) self.media_start = 'Media start' + Optional(Group(self.div_num + self.time) | 'INVALID TIME') \ .setResultsName('MediaStart') self.media_duration = 'dur' + Group(self.div_num + self.time).setResultsName('MediaDuration') self.track_start = 'Track start' + Group(self.div_num + self.time).setResultsName('TrackStart') self.track_duration = 'dur' + Group(self.div_num + self.time).setResultsName('TrackDuration') self.media_edit = Group(self.media_start + self.media_duration + self.track_start + self.track_duration + restOfLine + LineEnd().suppress()) self.edits = self.num_edits + OneOrMore(self.media_edit).setResultsName('edits') # Define Track Info Block # Audio Block self.audio_track_info = Group(self.crap + self.track_id + self.audio_track_format + self.channel_layout + self.media_timescale + self.track_data + self.audio_track_volume + self.included + self.audio_track_dimensions + OneOrMore(self.edits) + Optional(self.qt_user_data)) # Subtitle Block self.subtitle_track_info = Group(self.crap + self.track_id + self.subtitle_track_format + \ self.media_timescale + self.track_data + self.included + \ self.subtitle_track_dimensions + OneOrMore(self.edits) + Optional(self.qt_user_data)) # Video Block self.video_track_info = Group(self.crap + self.track_id + self.video_track_format + self.media_timescale + self.track_data + self.frame + self.included + self.video_track_dimensions + OneOrMore(self.edits) + Optional(self.qt_user_data)) self.tracks = ZeroOrMore(self.audio_track_info | self.video_track_info | self.subtitle_track_info) \ .setResultsName('tracks') |
From: Diez B. R. <de...@we...> - 2015-09-09 15:42:58
|
Yup. Very low frequency, but alive and kicking.

Diez

> On 09 Sep 2015, at 17:24, Stephan Sahm <Ste...@gm...> wrote:
>
> dear all: is this mailinglist still active?
>
> best,
> Stephan
|
From: Stephan S. <Ste...@gm...> - 2015-09-09 15:25:07
|
Dear all, is this mailing list still active? Best, Stephan |
From: Stephan S. <Ste...@gm...> - 2015-09-03 18:54:35
|
Dear all,

(I just asked the question "How to use ~ / NotAny() combined with LineStart() ?". I think the problem there might be related to the BUG I just tracked down.)

I experienced several problems using LineStart(). I have now tracked down a minimal failing example which shows (at least for me) that this is in fact a BUG, probably in how WhitespaceChars are updated.

Example code:

```
from pyparsing import *

string = "\nstart\nend"
SOL = LineStart()
start = SOL + Literal("start")
end = SOL + Literal("end")

parser = start.setDebug() + end  # same for OneOrMore, ZeroOrMore and certainly more...
print(parser.parseString(string))
# Match {LineStart "start"} at loc 0(1,1)
# Matched {LineStart "start"} -> ['start']
# ['start', 'end']

parser2 = start.setDebug() + end.setDebug()
print(parser2.parseString(string))
# Match {LineStart "start"} at loc 0(1,1)
# Matched {LineStart "start"} -> ['start']
# Match {LineStart "end"} at loc 6(2,1)
# Exception raised:Expected start of line (at char 6), (line:2, col:1)
```

So the only addition in parser2 is end.setDebug() (while start.setDebug() did nothing bad!). Similar behaviour occurs with OneOrMore, ZeroOrMore and certainly more.

(Of course, one would expect start.setDebug() to break the system as well; however, as asked in my previous question "How to use ~ / NotAny() combined with LineStart() ?", the start seems to have its own special rules.)

After reading some online Q/A, it seems that the correct WhitespaceChars are really essential for LineStart to work.

Any help is appreciated! I really like pyparsing's idea; however, not being able to work with lines is a major drawback which I had not expected. Hopefully the bug can be fixed soon.

Thanks in advance,
best,
Stephan
|
From: Stephan S. <Ste...@gm...> - 2015-09-03 14:53:19
|
Dear all,

I am new to pyparsing and have the following aim: *I want to match every line which does NOT start with "ABC".*

So far I have formed an expression which matches an arbitrary line:

    from pyparsing import *
    line = LineStart().leaveWhitespace() + restOfLine
    key = Literal("ABC")

Here are my first attempts:

    parser0 = ~key + line
    parser1 = ~(LineStart() + key) + line

However, they respond as follows:

    string = """
     ABC
    ABC
    foo ABC
    ABC
    foo"""

    print(parser0.searchString(string))
    [['foo ABC'], ['foo']]

    print(parser1.searchString(string))
    [[' ABC'], ['foo ABC'], ['ABC'], ['foo']]

parser0 also ignores " ABC", which should be matched. parser1 handles that case correctly, but it does not ignore the second "ABC".

What do I have to do to make such a basic task work?

Any help is highly appreciated. Thanks in advance,
best,
Stephan
|
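One way to sidestep the interaction between LineStart() and whitespace skipping raised in this question is to split the text into lines first and apply a small per-line parser. The following is only a sketch under that assumption, not an answer given on the list; the sample text and the names `not_abc`, `line_parser` and `kept` are made up for illustration. The point it tries to show is that the negative lookahead must not skip leading whitespace, otherwise " ABC" would be treated as if it started with "ABC".

```python
from pyparsing import Literal, ParseException, restOfLine

# Negative lookahead that does NOT skip leading whitespace, so " ABC"
# (with a leading space) is not mistaken for a line starting with "ABC".
not_abc = ~Literal("ABC").leaveWhitespace()
line_parser = not_abc + restOfLine

# Hypothetical sample input, one entry per line.
sample = " ABC\nABC\nfoo ABC\nABC def\nfoo"

kept = []
for text_line in sample.splitlines():
    try:
        # The parser is applied to one line at a time, so no LineStart() is needed.
        kept.append(line_parser.parseString(text_line, parseAll=True)[0])
    except ParseException:
        pass  # this line starts with "ABC": skip it

print(kept)
```

Splitting on lines trades a single grammar for a loop, which may or may not suit a larger parser; it is shown here because it avoids depending on how LineStart() behaves in a particular pyparsing version.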
From: mayamatakeshi <may...@gm...> - 2015-09-02 08:11:32
|
Thanks Paul. I have just corrected my script. R, Takeshi. |
From: Paul M. <pt...@au...> - 2015-09-02 07:58:44
|
Sorry, I'm always getting these backwards if I don't look them up!

AND takes precedence over OR. I had this backwards. The bool_expr part of the program should be:

    boolExpr = Forward()
    operand = boolcomparison | LPAR + boolExpr + RPAR
    and_term = Group(operand + ZeroOrMore(AND + operand))
    or_term = Group(and_term + ZeroOrMore(OR + and_term))
    boolExpr << or_term

I'll add in NOT processing for you (NOT is highest precedence):

    NOT,AND,OR = map(CaselessKeyword, "NOT AND OR".split())

And change:

    operand = boolcomparison | LPAR + boolExpr + RPAR

to:

    operand = Optional(NOT) + (boolcomparison | LPAR + boolExpr + RPAR)

Sorry for the confusion!

-- Paul
|
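For readers who want to try the corrected grammar, here is a self-contained sketch that combines the fragments from this message with the `key`, `val` and `boolcomparison` definitions from the earlier post in the thread. The test string is the one used earlier in the thread; `result` is a name introduced here, and `<<=` is used in place of `<<` purely as a stylistic assumption.

```python
from pyparsing import (CaselessKeyword, Forward, Group, Optional, Suppress,
                       Word, ZeroOrMore, alphas, oneOf)

NOT, AND, OR = map(CaselessKeyword, "NOT AND OR".split())
LPAR, RPAR = map(Suppress, "()")

key = oneOf("env name")
val = Word(alphas)
boolcomparison = Group(key + oneOf("= < > != >= <=") + val)

# Lowest-precedence operator (OR) is the outermost rule, so AND binds
# tighter than OR, and NOT binds tighter than both.
boolExpr = Forward()
operand = Optional(NOT) + (boolcomparison | LPAR + boolExpr + RPAR)
and_term = Group(operand + ZeroOrMore(AND + operand))
or_term = Group(and_term + ZeroOrMore(OR + and_term))
boolExpr <<= or_term

result = boolExpr.parseString("name=Bob and env=test or env=prototype",
                              parseAll=True)
print(result.asList())
```

With this ordering the AND pair ends up in its own inner group, and the OR joins that group with the remaining comparison.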
From: mayamatakeshi <may...@gm...> - 2015-09-02 04:48:20
|
Paul, it was exactly what I wanted to achieve! Thanks a lot. R, Takeshi |
From: Paul M. <pt...@au...> - 2015-09-02 04:22:02
|
There is a standard approach to parsing this kind of expression, called "infix notation", which takes into account the precedence of operations so that the final evaluation is performed properly. In your simplified case, you have two operations, AND and OR - by mathematical convention, OR has higher precedence than AND. So if I wanted to evaluate:

    color = RED and size = XXL or cost > $50

This would be equivalent to:

    color = RED and (size = XXL or cost > $50)

And so we would want our parser to return a nested structure that helps us evaluate these expressions properly.

The standard recursive approach to defining an infix notation parser follows the pattern (where all operators are left-associative):

    level1term = level2term [ level1op level2term ]...
    level2term = level3term [ level2op level3term ]...
    level3term = level4term [ level3op level4term ]...
    ...
    levelNterm = simplest_operand | '(' level1term ')'

For 4-function arithmetic, where multiplication and division take precedence over addition and subtraction, this looks like (using the arithmetic names 'term' and 'factor' that we all learned in high school algebra):

    expr = term [ ('+' or '-') term ]...
    term = factor [ ('*' or '/') factor ]...
    factor = numeric_value | variable_name | '(' expr ')'

and with these 3 lines, you can parse "4*a + 5*b + c - d/10", and get a resulting structure where all the operators respect their proper precedences.

(I think it was the elegance of this algorithm which first sparked my interest in parsers, over 30 years ago.)

There is a nice wikipedia page that covers this too: https://en.wikipedia.org/wiki/Operator-precedence_parser.

So here is your application implemented following this pattern, using pyparsing:

    from pyparsing import *
    """
    BNF:

    bool_expr = and_term
    and_term = or_term [AND or_term]...
    or_term = operand [OR operand]...
    operand = boolcomparison | '(' bool_expr ')'
    boolcomparison = ('env' | 'name') '=' rvalue
    rvalue = word composed of alpha chars
    """

    AND,OR = map(CaselessKeyword, "AND OR".split())
    LPAR,RPAR = map(Suppress,'()')
    key = oneOf(['env', 'name'])
    val = Word(alphas)
    keyEqVal = key + '=' + val
    keyEqVal = key + oneOf('= < > != >= <=') + val
    boolcomparison = Group(keyEqVal)

    # implement BNF, bottom-up
    boolExpr = Forward()
    operand = boolcomparison | LPAR + boolExpr + RPAR
    or_term = Group(operand + ZeroOrMore(OR + operand))
    and_term = Group(or_term + ZeroOrMore(AND + or_term))
    boolExpr << and_term

    # enclosed = Forward()
    # nestedParens = nestedExpr('(', ')', content=enclosed)
    # boolExpr = enclosed + oneOf(["and", "or"]) + enclosed
    # enclosed << (boolExpr | nestedParens | keyEqVal)

    data = 'env=prod'
    # OR should take precedence over AND
    data = 'name=Bob and env=test or env=prototype'
    print(boolExpr.parseString(data))

The result is:

    [[[['name', '=', 'Bob']], 'AND', [['env', '=', 'test'], 'OR', ['env', '=', 'prototype']]]]

Note that the grouping will cause the OR to be evaluated before the AND.

The common occurrence of these kinds of expressions led me to write the 'operatorPrecedence' parser helper, now renamed to 'infixNotation'. Converting this sample to use 'infixNotation' is left as an exercise.

HTH,
-- Paul
|
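As one possible take on the exercise mentioned at the end of this message, the sketch below uses pyparsing's `infixNotation` helper. It assumes the precedence given in the follow-up correction in this thread (NOT binds tightest, then AND, then OR) rather than the order stated in this message; `comparison`, `bool_expr` and `result` are names chosen for the illustration.

```python
from pyparsing import (CaselessKeyword, Group, Word, alphas, infixNotation,
                       oneOf, opAssoc)

NOT, AND, OR = map(CaselessKeyword, "NOT AND OR".split())

# Base operand: a single comparison such as env=prod
comparison = Group(oneOf("env name") + oneOf("= < > != >= <=") + Word(alphas))

# Operators are listed from highest to lowest precedence; parentheses
# are handled by infixNotation itself.
bool_expr = infixNotation(comparison, [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
])

result = bool_expr.parseString("name=Bob and env=test or env=prototype",
                               parseAll=True)
print(result.asList())
```

Compared with the hand-written Forward-based grammar, the helper only adds a nesting level where an operator actually appears, so the resulting lists are somewhat shallower.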
From: mayamatakeshi <may...@gm...> - 2015-09-02 03:06:25
|
Hello, I am getting "maximum recursion depth exceeded" with the below:

#####################################
from pyparsing import *

key = oneOf(['env', 'name'])
val = Word(alphas)
keyEqVal = key + '=' + val
enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed)
boolExpr = enclosed + oneOf(["and", "or"]) + enclosed
enclosed << (boolExpr | nestedParens | keyEqVal)

data = 'e=prod'
print(enclosed.parseString(data))
#####################################

I know it is due to my definition of boolExpr, but I could not figure out how to correct it.

R,
Takeshi
|
From: Paul M. <pt...@au...> - 2015-07-28 12:55:53
|
Since you are using LabelNameLineParser as a more complete parser of the results from LabelNameLine, you can't add the lambda as a condition (which only evaluates whether the condition holds, but does *not* modify the parsed results); you have to add it as a parse action. If you use the LabelNameLineParser in a parse action, then the results of the more complex parser will be returned. addCondition just raises or doesn't raise ParseException.

Change this line:

    LabelNameLine = Line.copy().addCondition(lambda self, loc, toks: LabelNameLineParser.parseString(toks[0], True))

To:

    LabelNameLine = Line.copy().addParseAction(lambda self, loc, toks: LabelNameLineParser.parseString(toks[0], True))

With this change, I get these results:

    ['chap-filt']
    ['filt-eq-char']
    ['fmslat-filt']
    ['fmslat-two']
    ['fmslat-one']
    ['up-straight']
    ['thm1:prim-exists']
    ['free-alt-star']
    ['free-alt-eq']
    ['free-alt-impl']
    ['two-diags']
    ...

-- Paul

-----Original Message-----
From: Victor Porton [mailto:po...@na...]
Sent: Monday, July 27, 2015 4:31 PM
To: pyp...@li...
Subject: [Pyparsing] [Fwd: 1. .suppress() does not work; 2. a new API call suggestion]

Probably it is a bug in pyparsing, but most probably it is my misunderstanding.

Note that it uses a modified pyparsing with the new method .addCondition() (attached).

When I run my script (attached):

    $ ./DuplicateRefs.py chap-filt.lyx

it produces output like:

    ['name "chap-filt"']

In my opinion, it should instead produce ['chap-filt'], because I use .suppress() in my code.

Sorry that I package the data in a separate file, not in a string, but the real example file is long.

What is wrong? How do I make it produce only the label name (like 'chap-filt'), not ['name "chap-filt"']?

Additional issue: because of a peculiarity of the syntax of the .lyx file (attached) I analyze, I first split it into tokens and then parse the tokens themselves (with another parser). See for example:

    LabelNameLineParser = \
        pyparsing.Keyword("name").suppress() + pyparsing.White(" ").suppress() + \
        pyparsing.Literal('"').suppress() + pyparsing.CharsNotIn('"') + pyparsing.Literal('"').suppress()
    LabelNameLine = Line.copy().addCondition(lambda self, loc, toks: LabelNameLineParser.parseString(toks[0], True))

Maybe we should introduce a shorter API for tasks like this? (I am unsure whether this situation occurs often enough to deserve a special API.) What is your opinion? What if I write a patch which does this? Will you use it?

--
Victor Porton - http://portonvictor.org
|
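The difference between the two calls discussed in this reply can be seen in a small stand-alone sketch. It is not the poster's script: `line` is a hypothetical stand-in for the `Line` expression (which is not shown in the thread), the inner parser drops the explicit `White(" ")` and relies on default whitespace skipping, and `reparse` is a helper name introduced here.

```python
from pyparsing import CharsNotIn, Keyword, Literal, restOfLine

# Inner parser: pulls the label out of a line such as: name "chap-filt"
label_name_parser = (
    Keyword("name").suppress()
    + Literal('"').suppress()
    + CharsNotIn('"')
    + Literal('"').suppress()
)


def reparse(s, loc, toks):
    # Re-parse the already matched line with the more detailed parser.
    return label_name_parser.parseString(toks[0], parseAll=True)


# Hypothetical stand-in for `Line`: everything up to the end of the line.
line = restOfLine.copy()

# As a condition, the return value is only tested for truth; tokens are kept.
as_condition = line.copy().addCondition(reparse)
# As a parse action, the return value replaces the matched tokens.
as_action = line.copy().addParseAction(reparse)

text = 'name "chap-filt"'
condition_result = as_condition.parseString(text).asList()
action_result = as_action.parseString(text).asList()
print(condition_result)  # ['name "chap-filt"']
print(action_result)     # ['chap-filt']
```

In both variants a line that the inner parser rejects raises ParseException; only the parse-action variant changes what a successful match returns.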