Thread: [Pyparsing] parsing with operatorPrecedence takes too much time

Brought to you by: ptmcg

pyparsing-users

[Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-02 16:33:11

Hi all,

I'm attaching a short test program that tries to parse mathematical
expressions using operatorPrecedence.
It takes nearly one second to parse a simple expression like 1+1+1-(5+2) (on
a 2.4Ghz machine).

Any hints? Am I doing something wrong?

Regards,
- Ralf

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Joshua J. K. <jk...@sk...> - 2008-04-02 17:31:20

On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote:
> Hi all,
> 
> I'm attaching a short test program that tries to parse mathematical
> expressions using operatorPrecedence.
> It takes nearly one second to parse a simple expression like 1+1+1-(5+2) (on
> a 2.4Ghz machine).

You forgot to attach it. :)

When you say "takes nearly one second" do you mean for the python script
to start up, initialize, parse, and exit?  Or one second just for
parsing?  Have you tried *just* the parsing routine under something like
timeit?

Check my code against the timeit docs, but something like this:

t = timeit("parse_exp.parseString('1+1+1-(5+2)')",
    setup='from __main__ import parse_exp)

print t.timeit()

That should give you an idea of how long it's actually taking.

j

-- 
Joshua Kugler
VOC/SigNet Provider (aka Web App Programmer)
S&K Aerospace Alaska
1

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-02 17:44:06

On Wed, Apr 2, 2008 at 7:29 PM, Joshua J. Kugler <jk...@sk...>
wrote:

> On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote:
> > Hi all,
> >
> > I'm attaching a short test program that tries to parse mathematical
> > expressions using operatorPrecedence.
> > It takes nearly one second to parse a simple expression like 1+1+1-(5+2)
> (on
> > a 2.4Ghz machine).
>
> You forgot to attach it. :)
>

hmm. gmail says it's there. I'm pasting it at the end of this email.


> When you say "takes nearly one second" do you mean for the python script
> to start up, initialize, parse, and exit?  Or one second just for
> parsing?  Have you tried *just* the parsing routine under something like
> timeit?
>

I mean only the parsing.


> Check my code against the timeit docs, but something like this:
>
> t = timeit("parse_exp.parseString('1+1+1-(5+2)')",
>    setup='from __main__ import parse_exp)
>
> print t.timeit()
>
> That should give you an idea of how long it's actually taking.
>

no need for timeit. look at this:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from pyparsing import (Word, Literal, CaselessLiteral,
                       Combine, Optional, nums, StringEnd,
operatorPrecedence, opAssoc)

# define grammar
point = Literal('.')
number=integer = Word(nums)
floatnumber = Combine( (integer +
                       Optional( point + Optional(number) ) +
                       Optional(  CaselessLiteral('e') + integer ))
                       | (point + Optional(number) + Optional(
CaselessLiteral('e') + integer ))
                     )

operand = floatnumber | integer

plus  = Literal("+")
minus = Literal("-")
mult  = Literal("*")
div   = Literal("/") | CaselessLiteral("div") | CaselessLiteral("mod")
addop  = plus | minus
multop = mult | div

cmpop = Literal("<>") | Literal("!=") | Literal("=") | Literal("<=") |
Literal(">=") | Literal(">") | Literal("<")

expr = operatorPrecedence(operand, [
        (Literal("+") | Literal("-") | CaselessLiteral("not"), 1,
opAssoc.RIGHT),
        (multop, 2, opAssoc.LEFT),
        (addop, 2, opAssoc.LEFT),
        (Literal("round"), 2, opAssoc.LEFT),
        (cmpop, 2, opAssoc.LEFT),
        (CaselessLiteral("and"), 2, opAssoc.LEFT),
        (CaselessLiteral("or"), 2, opAssoc.LEFT),
        ]) + StringEnd()

import time
stime=time.time()
res=expr.parseString("1+1+1-(5+2)")
print time.time()-stime, res

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Paul M. <pt...@au...> - 2008-04-02 22:15:53

This performance problem has come up a couple of times in the past month.
For a start, you could try enabling packrat parsing - change your import and
following lines to:


from pyparsing import (Word, Literal, CaselessLiteral,
                       Combine, Optional, nums, StringEnd,
operatorPrecedence, 
                       opAssoc,ParserElement)
ParserElement.enablePackrat()


This will speed up your parser about 500X (not kidding!).

-- Paul
 

-----Original Message-----
From: pyp...@li...
[mailto:pyp...@li...] On Behalf Of Ralf
Schmitt
Sent: Wednesday, April 02, 2008 12:44 PM
To: Joshua J. Kugler
Cc: pyp...@li...
Subject: Re: [Pyparsing] parsing with operatorPrecedence takes too much time

On Wed, Apr 2, 2008 at 7:29 PM, Joshua J. Kugler <jk...@sk...>
wrote:

> On Wed, 2008-04-02 at 18:33 +0200, Ralf Schmitt wrote:
> > Hi all,
> >
> > I'm attaching a short test program that tries to parse mathematical 
> > expressions using operatorPrecedence.
> > It takes nearly one second to parse a simple expression like 
> > 1+1+1-(5+2)
> (on
> > a 2.4Ghz machine).
>
> You forgot to attach it. :)
>

hmm. gmail says it's there. I'm pasting it at the end of this email.


> When you say "takes nearly one second" do you mean for the python 
> script to start up, initialize, parse, and exit?  Or one second just 
> for parsing?  Have you tried *just* the parsing routine under 
> something like timeit?
>

I mean only the parsing.


> Check my code against the timeit docs, but something like this:
>
> t = timeit("parse_exp.parseString('1+1+1-(5+2)')",
>    setup='from __main__ import parse_exp)
>
> print t.timeit()
>
> That should give you an idea of how long it's actually taking.
>

no need for timeit. look at this:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from pyparsing import (Word, Literal, CaselessLiteral,
                       Combine, Optional, nums, StringEnd,
operatorPrecedence, opAssoc)

# define grammar
point = Literal('.')
number=integer = Word(nums)
floatnumber = Combine( (integer +
                       Optional( point + Optional(number) ) +
                       Optional(  CaselessLiteral('e') + integer ))
                       | (point + Optional(number) + Optional(
CaselessLiteral('e') + integer ))
                     )

operand = floatnumber | integer

plus  = Literal("+")
minus = Literal("-")
mult  = Literal("*")
div   = Literal("/") | CaselessLiteral("div") | CaselessLiteral("mod")
addop  = plus | minus
multop = mult | div

cmpop = Literal("<>") | Literal("!=") | Literal("=") | Literal("<=") |
Literal(">=") | Literal(">") | Literal("<")

expr = operatorPrecedence(operand, [
        (Literal("+") | Literal("-") | CaselessLiteral("not"), 1,
opAssoc.RIGHT),
        (multop, 2, opAssoc.LEFT),
        (addop, 2, opAssoc.LEFT),
        (Literal("round"), 2, opAssoc.LEFT),
        (cmpop, 2, opAssoc.LEFT),
        (CaselessLiteral("and"), 2, opAssoc.LEFT),
        (CaselessLiteral("or"), 2, opAssoc.LEFT),
        ]) + StringEnd()

import time
stime=time.time()
res=expr.parseString("1+1+1-(5+2)")
print time.time()-stime, res

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-03 08:58:51

On Thu, Apr 3, 2008 at 12:15 AM, Paul McGuire <pt...@au...> wrote:

> This performance problem has come up a couple of times in the past month.
> For a start, you could try enabling packrat parsing - change your import
> and
> following lines to:
>
>
> from pyparsing import (Word, Literal, CaselessLiteral,
>                       Combine, Optional, nums, StringEnd,
> operatorPrecedence,
>                        opAssoc,ParserElement)
> ParserElement.enablePackrat()
>
>
> This will speed up your parser about 500X (not kidding!).
>

fine, thanks. Actually I was thinking pyparsing was kidding me when it was
that slow.
(I think this is still rather slow if pyparsing can only parse 500 of those
very basic expressions in one second). I was using a parser, which didn't
use operatorPrecedence before and it never felt slow. Am I better off
writing the parsing rules myself? (like in
http://code.pediapress.com/hg/mwlib/file/tip/mwlib/expr.py)
It's similar to the mathematical expression parser from the examples
directory.

btw, I am running python 2.6 and with this change I get the following error:
~/ python e.py                                                    ralf@rat64ok
Traceback (most recent call last):
  File "e.py", line 41, in <module>
    res=expr.parseString("1+1+1-(5+2)")
  File
"/home/ralf/py26/lib/python2.6/site-packages/pyparsing-1.4.11-py2.6.egg/pyparsing.py",
line 980, in parseString
    loc, tokens = self._parse( instring.expandtabs(), 0 )
  File
"/home/ralf/py26/lib/python2.6/site-packages/pyparsing-1.4.11-py2.6.egg/pyparsing.py",
line 907, in _parseCache
    if lookup in ParserElement._exprArgCache:
TypeError: unhashable type: 'And'

python 2.6 won't let you call hash() on an object which implements it's own
__eq__, but doesn't implement __hash__. This is something you might like to
fix in an upcoming release..
I will try to fix that issue and will send you a patch...

- ralf

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-03 11:25:34

>
> python 2.6 won't let you call hash() on an object which implements it's
> own __eq__, but doesn't implement __hash__. This is something you might like
> to fix in an upcoming release..
> I will try to fix that issue and will send you a patch...
>

I'm now using the following workaround:

 if sys.version_info>=(2,6):
    ParserElement.__hash__ = lambda self: hash(id(self))

- ralf

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-04 10:32:47

On Thu, Apr 3, 2008 at 12:15 AM, Paul McGuire <pt...@au...> wrote:

> This performance problem has come up a couple of times in the past month.
> For a start, you could try enabling packrat parsing - change your import
> and
> following lines to:
>
>
> from pyparsing import (Word, Literal, CaselessLiteral,
>                       Combine, Optional, nums, StringEnd,
> operatorPrecedence,
>                        opAssoc,ParserElement)
> ParserElement.enablePackrat()
>
>
> This will speed up your parser about 500X (not kidding!).
>

Paul,

operatorPrecedence isn't working for me. I think something must be wrong,
it's just much too slow.
I've turned on packrat parsing. The current code can be viewed here:
http://code.pediapress.com/hg/mwlib/file/ce563c294ad1/mwlib/e.py
or downloaded here:
http://code.pediapress.com/hg/mwlib/raw-file/ce563c294ad1/mwlib/e.py

It reads expressions from stdin.

Parsing something like:
5-(((63+459.67)/1.8)<100000)-(((63+459.67)/1.8)<10000)-(((63+459.67)/1.8)<1000)-(((63+459.67)/1.8)<100)-(((63+459.67)/1.8)<10)

takes nearly 0.09 seconds. (i.e. one can only parse around 11 such
expressions, which is unbeliavably slow for a 2.4 Ghz processor).

The string representation of the expr parser element is huge:
In [2]: len(str(e.expr))
Out[2]: 4317534

For my old parser this is length is 22076.

What's going on?

- Ralf

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Paul M. <pt...@au...> - 2008-04-03 12:08:30

Ralf -

Thanks! I can make this patch easily in the next release.

Is there any harm in just defining this function for all versions, instead
of conditionalizing for v2.6?

-- Paul 

-----Original Message-----
From: pyp...@li...
[mailto:pyp...@li...] On Behalf Of Ralf
Schmitt
Sent: Thursday, April 03, 2008 6:26 AM
To: Paul McGuire
Cc: pyp...@li...
Subject: Re: [Pyparsing] parsing with operatorPrecedence takes too much time

>
> python 2.6 won't let you call hash() on an object which implements 
> it's own __eq__, but doesn't implement __hash__. This is something you 
> might like to fix in an upcoming release..
> I will try to fix that issue and will send you a patch...
>

I'm now using the following workaround:

 if sys.version_info>=(2,6):
    ParserElement.__hash__ = lambda self: hash(id(self))

- ralf
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for just about anything Open
Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Pyparsing-users mailing list
Pyp...@li...
https://lists.sourceforge.net/lists/listinfo/pyparsing-users

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-03 12:11:36

On Thu, Apr 3, 2008 at 2:08 PM, Paul McGuire <pt...@au...> wrote:

> Ralf -
>
> Thanks! I can make this patch easily in the next release.
>
> Is there any harm in just defining this function for all versions, instead
> of conditionalizing for v2.6?
>

It might be a bit slower than the builtin hash. Other than that it should be
no problem.

- ralf

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Paul M. <pt...@au...> - 2008-04-04 15:16:45

Ralf -

Part of what you are seeing is the result of a bug-fix in pyparsing 1.4.9 (I
think), in which the operands of an operatorPrecedence were evaluated once
for each level in the precedence chain, including the calling of parse
actions.  To correct this, each level now does a lookahead to see if that
level applies, and only if successful then parses that level, otherwise
descends to the next level of the list.  So now every level is prefixed by a
"FollowedBy" with that level's expression.  Since the levels nest, this gets
geometrically larger as more levels are added to the list.  I ran your code
with a modified version of operatorPrecedence that does not do this, and the
time to parse your given string dropped from .09 to .06 seconds, and the
expression string representation dropped to 202600 chars.

I've poked a bit at your example, and I don't see any other low-hanging
fruit to crank more speed out of this parser.  What you might do is try
converting by hand to an explicit infix notation parser, using the style in
fourFn.py that comes with pyparsing.  Other than that, I would say this may
be as much a pyparsing performance issue as it is operatorPrecedence.

-- Paul

Re: [Pyparsing] parsing with operatorPrecedence takes too much time

From: Ralf S. <sc...@gm...> - 2008-04-04 15:32:40

On Fri, Apr 4, 2008 at 5:16 PM, Paul McGuire <pt...@au...> wrote:

>
> I've poked a bit at your example, and I don't see any other low-hanging
> fruit to crank more speed out of this parser.  What you might do is try
> converting by hand to an explicit infix notation parser, using the style
> in
> fourFn.py that comes with pyparsing.  Other than that, I would say this
> may
> be as much a pyparsing performance issue as it is operatorPrecedence.
>

thanks for having a look at it. I will be implementing this parser without
pyparsing then.

-ralf