Thread: [Pyparsing] parsing with operatorPrecedence takes too much time
Brought to you by:
ptmcg
From: Ralf S. <sc...@gm...> - 2008-04-02 16:33:11
|
Hi all, I'm attaching a short test program that tries to parse mathematical expressions using operatorPrecedence. It takes nearly one second to parse a simple expression like 1+1+1-(5+2) (on a 2.4Ghz machine). Any hints? Am I doing something wrong? Regards, - Ralf |
From: Joshua J. K. <jk...@sk...> - 2008-04-02 17:31:20
|
On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote: > Hi all, > > I'm attaching a short test program that tries to parse mathematical > expressions using operatorPrecedence. > It takes nearly one second to parse a simple expression like 1+1+1-(5+2) (on > a 2.4Ghz machine). You forgot to attach it. :) When you say "takes nearly one second" do you mean for the python script to start up, initialize, parse, and exit? Or one second just for parsing? Have you tried *just* the parsing routine under something like timeit? Check my code against the timeit docs, but something like this: t = timeit("parse_exp.parseString('1+1+1-(5+2)')", setup='from __main__ import parse_exp) print t.timeit() That should give you an idea of how long it's actually taking. j -- Joshua Kugler VOC/SigNet Provider (aka Web App Programmer) S&K Aerospace Alaska 1 |
From: Ralf S. <sc...@gm...> - 2008-04-02 17:44:06
|
On Wed, Apr 2, 2008 at 7:29 PM, Joshua J. Kugler <jk...@sk...> wrote: > On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote: > > Hi all, > > > > I'm attaching a short test program that tries to parse mathematical > > expressions using operatorPrecedence. > > It takes nearly one second to parse a simple expression like 1+1+1-(5+2) > (on > > a 2.4Ghz machine). > > You forgot to attach it. :) > hmm. gmail says it's there. I'm pasting it at the end of this email. > When you say "takes nearly one second" do you mean for the python script > to start up, initialize, parse, and exit? Or one second just for > parsing? Have you tried *just* the parsing routine under something like > timeit? > I mean only the parsing. > Check my code against the timeit docs, but something like this: > > t = timeit("parse_exp.parseString('1+1+1-(5+2)')", > setup='from __main__ import parse_exp) > > print t.timeit() > > That should give you an idea of how long it's actually taking. > no need for timeit. look at this: #! /usr/bin/env python # -*- coding: utf-8 -*- from pyparsing import (Word, Literal, CaselessLiteral, Combine, Optional, nums, StringEnd, operatorPrecedence, opAssoc) # define grammar point = Literal('.') number=integer = Word(nums) floatnumber = Combine( (integer + Optional( point + Optional(number) ) + Optional( CaselessLiteral('e') + integer )) | (point + Optional(number) + Optional( CaselessLiteral('e') + integer )) ) operand = floatnumber | integer plus = Literal("+") minus = Literal("-") mult = Literal("*") div = Literal("/") | CaselessLiteral("div") | CaselessLiteral("mod") addop = plus | minus multop = mult | div cmpop = Literal("<>") | Literal("!=") | Literal("=") | Literal("<=") | Literal(">=") | Literal(">") | Literal("<") expr = operatorPrecedence(operand, [ (Literal("+") | Literal("-") | CaselessLiteral("not"), 1, opAssoc.RIGHT), (multop, 2, opAssoc.LEFT), (addop, 2, opAssoc.LEFT), (Literal("round"), 2, opAssoc.LEFT), (cmpop, 2, opAssoc.LEFT), (CaselessLiteral("and"), 2, opAssoc.LEFT), (CaselessLiteral("or"), 2, opAssoc.LEFT), ]) + StringEnd() import time stime=time.time() res=expr.parseString("1+1+1-(5+2)") print time.time()-stime, res |
From: Paul M. <pt...@au...> - 2008-04-02 22:15:53
|
This performance problem has come up a couple of times in the past month. For a start, you could try enabling packrat parsing - change your import and following lines to: from pyparsing import (Word, Literal, CaselessLiteral, Combine, Optional, nums, StringEnd, operatorPrecedence, opAssoc,ParserElement) ParserElement.enablePackrat() This will speed up your parser about 500X (not kidding!). -- Paul -----Original Message----- From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Ralf Schmitt Sent: Wednesday, April 02, 2008 12:44 PM To: Joshua J. Kugler Cc: pyp...@li... Subject: Re: [Pyparsing] parsing with operatorPrecedence takes too much time On Wed, Apr 2, 2008 at 7:29 PM, Joshua J. Kugler <jk...@sk...> wrote: > On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote: > > Hi all, > > > > I'm attaching a short test program that tries to parse mathematical > > expressions using operatorPrecedence. > > It takes nearly one second to parse a simple expression like > > 1+1+1-(5+2) > (on > > a 2.4Ghz machine). > > You forgot to attach it. :) > hmm. gmail says it's there. I'm pasting it at the end of this email. > When you say "takes nearly one second" do you mean for the python > script to start up, initialize, parse, and exit? Or one second just > for parsing? Have you tried *just* the parsing routine under > something like timeit? > I mean only the parsing. > Check my code against the timeit docs, but something like this: > > t = timeit("parse_exp.parseString('1+1+1-(5+2)')", > setup='from __main__ import parse_exp) > > print t.timeit() > > That should give you an idea of how long it's actually taking. > no need for timeit. look at this: #! /usr/bin/env python # -*- coding: utf-8 -*- from pyparsing import (Word, Literal, CaselessLiteral, Combine, Optional, nums, StringEnd, operatorPrecedence, opAssoc) # define grammar point = Literal('.') number=integer = Word(nums) floatnumber = Combine( (integer + Optional( point + Optional(number) ) + Optional( CaselessLiteral('e') + integer )) | (point + Optional(number) + Optional( CaselessLiteral('e') + integer )) ) operand = floatnumber | integer plus = Literal("+") minus = Literal("-") mult = Literal("*") div = Literal("/") | CaselessLiteral("div") | CaselessLiteral("mod") addop = plus | minus multop = mult | div cmpop = Literal("<>") | Literal("!=") | Literal("=") | Literal("<=") | Literal(">=") | Literal(">") | Literal("<") expr = operatorPrecedence(operand, [ (Literal("+") | Literal("-") | CaselessLiteral("not"), 1, opAssoc.RIGHT), (multop, 2, opAssoc.LEFT), (addop, 2, opAssoc.LEFT), (Literal("round"), 2, opAssoc.LEFT), (cmpop, 2, opAssoc.LEFT), (CaselessLiteral("and"), 2, opAssoc.LEFT), (CaselessLiteral("or"), 2, opAssoc.LEFT), ]) + StringEnd() import time stime=time.time() res=expr.parseString("1+1+1-(5+2)") print time.time()-stime, res |
From: Ralf S. <sc...@gm...> - 2008-04-03 08:58:51
|
On Thu, Apr 3, 2008 at 12:15 AM, Paul McGuire <pt...@au...> wrote: > This performance problem has come up a couple of times in the past month. > For a start, you could try enabling packrat parsing - change your import > and > following lines to: > > > from pyparsing import (Word, Literal, CaselessLiteral, > Combine, Optional, nums, StringEnd, > operatorPrecedence, > opAssoc,ParserElement) > ParserElement.enablePackrat() > > > This will speed up your parser about 500X (not kidding!). > fine, thanks. Actually I was thinking pyparsing was kidding me when it was that slow. (I think this is still rather slow if pyparsing can only parse 500 of those very basic expressions in one second). I was using a parser, which didn't use operatorPrecedence before and it never felt slow. Am I better off writing the parsing rules myself? (like in http://code.pediapress.com/hg/mwlib/file/tip/mwlib/expr.py) It's similar to the mathematical expression parser from the examples directory. btw, I am running python 2.6 and with this change I get the following error: ~/ python e.py ralf@rat64ok Traceback (most recent call last): File "e.py", line 41, in <module> res=expr.parseString("1+1+1-(5+2)") File "/home/ralf/py26/lib/python2.6/site-packages/pyparsing-1.4.11-py2.6.egg/pyparsing.py", line 980, in parseString loc, tokens = self._parse( instring.expandtabs(), 0 ) File "/home/ralf/py26/lib/python2.6/site-packages/pyparsing-1.4.11-py2.6.egg/pyparsing.py", line 907, in _parseCache if lookup in ParserElement._exprArgCache: TypeError: unhashable type: 'And' python 2.6 won't let you call hash() on an object which implements it's own __eq__, but doesn't implement __hash__. This is something you might like to fix in an upcoming release.. I will try to fix that issue and will send you a patch... - ralf |
From: Ralf S. <sc...@gm...> - 2008-04-03 11:25:34
|
> > python 2.6 won't let you call hash() on an object which implements it's > own __eq__, but doesn't implement __hash__. This is something you might like > to fix in an upcoming release.. > I will try to fix that issue and will send you a patch... > I'm now using the following workaround: if sys.version_info>=(2,6): ParserElement.__hash__ = lambda self: hash(id(self)) - ralf |
From: Ralf S. <sc...@gm...> - 2008-04-04 10:32:47
|
On Thu, Apr 3, 2008 at 12:15 AM, Paul McGuire <pt...@au...> wrote: > This performance problem has come up a couple of times in the past month. > For a start, you could try enabling packrat parsing - change your import > and > following lines to: > > > from pyparsing import (Word, Literal, CaselessLiteral, > Combine, Optional, nums, StringEnd, > operatorPrecedence, > opAssoc,ParserElement) > ParserElement.enablePackrat() > > > This will speed up your parser about 500X (not kidding!). > Paul, operatorPrecedence isn't working for me. I think something must be wrong, it's just much too slow. I've turned on packrat parsing. The current code can be viewed here: http://code.pediapress.com/hg/mwlib/file/ce563c294ad1/mwlib/e.py or downloaded here: http://code.pediapress.com/hg/mwlib/raw-file/ce563c294ad1/mwlib/e.py It reads expressions from stdin. Parsing something like: 5-(((63+459.67)/1.8)<100000)-(((63+459.67)/1.8)<10000)-(((63+459.67)/1.8)<1000)-(((63+459.67)/1.8)<100)-(((63+459.67)/1.8)<10) takes nearly 0.09 seconds. (i.e. one can only parse around 11 such expressions, which is unbeliavably slow for a 2.4 Ghz processor). The string representation of the expr parser element is huge: In [2]: len(str(e.expr)) Out[2]: 4317534 For my old parser this is length is 22076. What's going on? - Ralf |
From: Paul M. <pt...@au...> - 2008-04-03 12:08:30
|
Ralf - Thanks! I can make this patch easily in the next release. Is there any harm in just defining this function for all versions, instead of conditionalizing for v2.6? -- Paul -----Original Message----- From: pyp...@li... [mailto:pyp...@li...] On Behalf Of Ralf Schmitt Sent: Thursday, April 03, 2008 6:26 AM To: Paul McGuire Cc: pyp...@li... Subject: Re: [Pyparsing] parsing with operatorPrecedence takes too much time > > python 2.6 won't let you call hash() on an object which implements > it's own __eq__, but doesn't implement __hash__. This is something you > might like to fix in an upcoming release.. > I will try to fix that issue and will send you a patch... > I'm now using the following workaround: if sys.version_info>=(2,6): ParserElement.__hash__ = lambda self: hash(id(self)) - ralf ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace _______________________________________________ Pyparsing-users mailing list Pyp...@li... https://lists.sourceforge.net/lists/listinfo/pyparsing-users |
From: Ralf S. <sc...@gm...> - 2008-04-03 12:11:36
|
On Thu, Apr 3, 2008 at 2:08 PM, Paul McGuire <pt...@au...> wrote: > Ralf - > > Thanks! I can make this patch easily in the next release. > > Is there any harm in just defining this function for all versions, instead > of conditionalizing for v2.6? > It might be a bit slower than the builtin hash. Other than that it should be no problem. - ralf |
From: Paul M. <pt...@au...> - 2008-04-04 15:16:45
|
Ralf - Part of what you are seeing is the result of a bug-fix in pyparsing 1.4.9 (I think), in which the operands of an operatorPrecedence were evaluated once for each level in the precedence chain, including the calling of parse actions. To correct this, each level now does a lookahead to see if that level applies, and only if successful then parses that level, otherwise descends to the next level of the list. So now every level is prefixed by a "FollowedBy" with that level's expression. Since the levels nest, this gets geometrically larger as more levels are added to the list. I ran your code with a modified version of operatorPrecedence that does not do this, and the time to parse your given string dropped from .09 to .06 seconds, and the expression string representation dropped to 202600 chars. I've poked a bit at your example, and I don't see any other low-hanging fruit to crank more speed out of this parser. What you might do is try converting by hand to an explicit infix notation parser, using the style in fourFn.py that comes with pyparsing. Other than that, I would say this may be as much a pyparsing performance issue as it is operatorPrecedence. -- Paul |
From: Ralf S. <sc...@gm...> - 2008-04-04 15:32:40
|
On Fri, Apr 4, 2008 at 5:16 PM, Paul McGuire <pt...@au...> wrote: > > I've poked a bit at your example, and I don't see any other low-hanging > fruit to crank more speed out of this parser. What you might do is try > converting by hand to an explicit infix notation parser, using the style > in > fourFn.py that comes with pyparsing. Other than that, I would say this > may > be as much a pyparsing performance issue as it is operatorPrecedence. > thanks for having a look at it. I will be implementing this parser without pyparsing then. -ralf |