My current grammar is about as complex as K&R C, and parsing a file now takes up to 10 seconds. I'd like some tips on how I can optimize my grammar to improve parsing speed. I imagine you want rules to match as early as possible to cut down on iterations.
This is my first big project in Python, so I don't have any optimization experience with this language. Would it be possible to speed up pyparsing with techniques like pre-allocation, or is pyparsing already as fast as it can be?
-Klaas
Not having seen your specific grammar, I can only make some general suggestions:
1. Use '^' sparingly; try to use '|' and the oneOf helper so that alternations will short-circuit ('^' has to evaluate all alternatives before choosing the longest match - but sometimes this is unavoidable).
2. Use repetition instead of recursion for lists. Many BNFs define "one or more" using something like:
    listOfAs = A | A + listOfAs
To implement this directly in pyparsing requires Forward to set up a recursive grammar:
    listOfAs = Forward()
    listOfAs << (Literal("A") ^ Literal("A") + listOfAs)
(The parentheses matter: '<<' binds more tightly than '^' in Python, so without them only the first alternative would be assigned into the Forward.)
This is *much* more efficiently implemented as:
    listOfAs = OneOrMore(Literal("A"))
3. Use Optional instead of "or empty". Another BNF idiom is this:
    optionalA = "A" | empty
I first saw this in a grammar where it was not so obvious, where an optional argument was shown as
    arg = argDefn | empty
The pyparsing form for this is:
    arg = Optional(argDefn)
4. Use psyco. This will shave about 30-40% off your parsing time, especially in a complex grammar. If you are writing your grammar for a platform where you aren't sure whether psyco is available, use this form:
    try:
        import psyco
        psyco.full()
    except ImportError:
        pass
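Taken together, tips 1-3 can be sketched in a few lines. This is a minimal illustration assuming a recent pyparsing is installed; the names keyword, listOfAs, and arg are just placeholders echoing the examples above:

```python
from pyparsing import Literal, OneOrMore, Optional, Word, nums, oneOf

# Tip 1: oneOf builds a short-circuiting alternation ('|' semantics),
# so matching stops at the first alternative that succeeds.
keyword = oneOf("int char float")

# Tip 2: repetition instead of recursion for "one or more".
listOfAs = OneOrMore(Literal("A"))

# Tip 3: Optional instead of "X or empty".
arg = Optional(Word(nums))

print(listOfAs.parseString("A A A").asList())  # ['A', 'A', 'A']
print(keyword.parseString("char").asList())    # ['char']
print(arg.parseString("").asList())            # []
```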
Back around version 1.0.4 I did a lot of performance tuning on pyparsing with the hotshot profiler to wring out what I could (I've also wanted to keep pyparsing implemented in pure Python, although I think a Pyrex version could be an option). Probably the biggest performance improvement was the pre-allocation of exception objects when ParserElements are constructed. It turns out that each element raises one particular exception whenever it fails to match, varying only in the location value, and I was creating and destroying them by the *thousands*. The second biggest improvement was using startswith to avoid creating a string slice in the Literal class, which is *very* heavily used (a suggestion by Dan Griffith). Since then I've looked at some other obvious performance opportunities, but they haven't panned out. pyparsing is a type of parser called a "combinator", and raw performance is just one of those things that combinators are less strong at.
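The two internal optimizations described above can be illustrated with a toy, stdlib-only sketch. This is not pyparsing's actual implementation - ParseFailure and ToyLiteral are invented names for illustration:

```python
text = "spam and eggs " * 1000

# Slice-based matching allocates a temporary string on every attempt...
def match_slice(text, loc, lit):
    return text[loc : loc + len(lit)] == lit

# ...while str.startswith with a start offset does not.
def match_startswith(text, loc, lit):
    return text.startswith(lit, loc)

class ParseFailure(Exception):
    pass

class ToyLiteral:
    def __init__(self, lit):
        self.lit = lit
        # Pre-allocate one exception per element, instead of constructing
        # a fresh exception object on every failed match.
        self._exc = ParseFailure("expected " + lit)

    def try_match(self, text, loc):
        if not text.startswith(self.lit, loc):
            raise self._exc  # reused object; only the location varies
        return loc + len(self.lit)

elem = ToyLiteral("spam")
print(elem.try_match(text, 0))  # 4
```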
If you want to e-mail me your grammar, I'll try to see if there are any other obvious things you could improve.
-- Paul
Paul,
Thanks for the reply! I'll update my grammar with those tips in mind. It's still being finalized, but once it's done I can mail it to you.
Cheers,
Klaas