Re: [Pyparsing] Speeding up a parse
From: Eike W. <eik...@gm...> - 2008-09-25 20:19:36
Hi David!

I've read that guessing which parts of a program need optimizing is usually impossible; you need to profile your program. There is a profiler built into Python: http://docs.python.org/lib/profile.html I have no experience with the profiler, but the basic usage is fairly simple. However, the profiler will probably only show you that most of the time is spent in some function of the Pyparsing library. You expected that anyway, so it doesn't help you very much. Still, you might catch a parse action that consumes much time this way, and you might spot parts of Pyparsing that need optimization.

So maybe you should write a profiling extension for Pyparsing! I think it is feasible because the class ParserElement contains some high-level driver functions that are executed for each parser object (_parseNoCache, _parseCache). I think it could be done like this: you create a class variable, ParserElement.profileStats = {}, which maps <parser's name> to (n_enter, n_success, n_fail, t_cumulative). At the start of _parseNoCache or _parseCache you locate the matching entry, ParserElement.profileStats[self.name], increment the enter counter, and store the current time. At the exit points you increment either the success or the failure counter, compute the time spent in the parser, and add it to the cumulative time value. At the end of the program you convert the dict to a list and write it to a text file. You should also add a sorting function.

_parseCache looks pretty simple, so adding something like my proposed profiling facility seems easy. I haven't looked carefully at anything, nor did I write any code. (Maybe I should have, instead of writing this lengthy email.) Since packrat parsing gives you no big additional problems, I think you should just use _parseCache, because it's simpler.

I hope this helps you at least somewhat.

Kind regards,
Eike.
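[A follow-up sketch of the bookkeeping described above. This is a standalone illustration, not pyparsing code: the names profile_stats, profiled, and dump_stats are made up for this example, and in Pyparsing itself the timing and counting would live inside ParserElement._parseNoCache / _parseCache rather than in a wrapper function.]

```python
import time

# Proposed statistics table:
# maps <parser's name> -> [n_enter, n_success, n_fail, t_cumulative]
profile_stats = {}

def profiled(name, parse_func):
    """Wrap a callable so every call updates profile_stats[name],
    mimicking the instrumentation proposed for _parseNoCache/_parseCache."""
    def wrapper(*args, **kwargs):
        stats = profile_stats.setdefault(name, [0, 0, 0, 0.0])
        stats[0] += 1                    # n_enter: count every entry
        start = time.perf_counter()      # store the current time
        try:
            result = parse_func(*args, **kwargs)
            stats[1] += 1                # n_success: normal exit point
            return result
        except Exception:
            stats[2] += 1                # n_fail: exception exit point
            raise
        finally:
            # t_cumulative: add the time spent in this call
            stats[3] += time.perf_counter() - start
    return wrapper

def dump_stats():
    """The proposed sorting function: entries ordered by cumulative
    time, most expensive parser first, ready to write to a text file."""
    return sorted(profile_stats.items(),
                  key=lambda item: item[1][3], reverse=True)
```

For example, wrapping a toy "parser" such as int with profiled("integer", int) and calling it on "42" and then on "x" (which raises ValueError) leaves profile_stats["integer"] with two entries, one success, and one failure, plus the accumulated time.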