Re: [Pyparsing] Speeding up a parse
From: <dav...@l-...> - 2008-09-25 20:26:32
Well, it's something I was actually looking into doing. I profiled my parse actions by hand (using a stopwatch class), and by the time the code is handed off to me, the amount of time spent in my actions is actually quite small (usually milliseconds). I did try psyco, and it worked great, but I had to disable the "keepOriginalText" part of my code, because it imports the inspect module, which breaks psyco (a known psyco issue). In fact, psyco gave me almost a 100% speedup, which is fantastic and doesn't require much code, but I lose the ability to keep the source text. I think this approach deserves more investigation, since psyco is a known way to speed things up, and frankly it did an amazing job once I got it working.

--dw

> -----Original Message-----
> From: Eike Welk [mailto:eik...@gm...]
> Sent: Thursday, September 25, 2008 3:19 PM
> To: pyp...@li...
> Subject: Re: [Pyparsing] Speeding up a parse
>
> Hi David!
>
> I've read that guessing which parts of a program need optimizing is
> usually impossible. You need to profile your program. There is a
> profiler built into Python:
> http://docs.python.org/lib/profile.html
>
> I have no experience with the profiler, but basic usage is fairly
> simple.
>
> However, the profiler will probably show you that most of the time
> is spent in some function of the Pyparsing library. You expected
> that anyway, and it doesn't help you very much. You might catch a
> parse action that consumes much time this way, and you might spot
> parts of Pyparsing that need optimization.
>
> So maybe you should start to write a profiling extension for
> Pyparsing! I think it is feasible because the class ParserElement
> contains some high-level driver functions that are executed for
> each parser object (_parseNoCache, _parseCache).
> I think it could be done like this:
>
> You create a class variable:
> ParserElement.profileStats = {}
> It maps:
> <parser's name> : n_enter, n_success, n_fail, t_cumulative
>
> Then at the start of _parseNoCache or _parseCache you locate the
> matching entry,
> ParserElement.profileStats[self.name]
> increment the enter counter, and store the current time.
>
> At the exit points you increment either the success or the failure
> counter, compute the time spent in the parser, and add it to the
> cumulative time value.
>
> At the end of the program you convert the dict to a list and store
> it in a text file. You should also add a sorting function.
>
> I think _parseCache looks pretty simple; adding something like my
> proposed profiling facility seems easy. I haven't looked carefully
> at anything, nor did I write any code. (Maybe I should have,
> instead of writing this lengthy email.) As packrat parsing gives
> you no big additional problems, I think you should just use
> _parseCache because it's easier.
>
> I hope this helps you at least somewhat.
> Kind regards,
> Eike.
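[Editor's note: Eike's proposed profiling facility can be sketched without touching pyparsing internals. The wrapper below is a hypothetical stand-in for the bookkeeping he describes going into _parseNoCache/_parseCache; the names `profiled` and `report`, and the use of a plain module-level dict rather than a ParserElement class variable, are assumptions made for a self-contained illustration. It follows the proposed stats layout: name -> (n_enter, n_success, n_fail, t_cumulative).]

```python
import time
from collections import defaultdict

# Proposed stats table: maps a parser's name to
# [n_enter, n_success, n_fail, t_cumulative].
profileStats = defaultdict(lambda: [0, 0, 0, 0.0])

def profiled(name, parse_func):
    """Wrap a parse function so every call updates profileStats[name].

    In the actual proposal this logic would live at the entry and exit
    points of ParserElement._parseNoCache / _parseCache.
    """
    def wrapper(*args, **kwargs):
        stats = profileStats[name]
        stats[0] += 1                   # n_enter: count every attempt
        start = time.perf_counter()
        try:
            result = parse_func(*args, **kwargs)
            stats[1] += 1               # n_success
            return result
        except Exception:
            stats[2] += 1               # n_fail (parse raised)
            raise
        finally:
            # t_cumulative: charge the elapsed time on every exit path
            stats[3] += time.perf_counter() - start
    return wrapper

def report(stats):
    """Convert the dict to a list sorted by cumulative time, slowest first."""
    rows = [(name, *vals) for name, vals in stats.items()]
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows
```

A sorted `report(profileStats)` at program exit could then be written to a text file, as suggested above.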