Re: [Pyparsing] Speeding up a parse
From: <dav...@l-...> - 2008-09-25 20:26:32
Well, it's something I was actually looking into doing. I profiled my parse actions by hand (using a stopwatch class), and by the time the code is handed off to me, the amount of time spent in my actions is actually quite small (usually milliseconds). I did try psyco, and it worked great, but I had to disable the "keepOriginalText" part of my code, because it imports the inspect module, which breaks psyco (a known psyco issue). In fact, psyco gave me almost a 100% speedup, which is fantastic and doesn't require much code, but I lose the ability to keep the source text. I think this approach deserves more investigation, since psyco is a known way to speed things up, and frankly it did an amazing job once I got it working.

--dw

> -----Original Message-----
> From: Eike Welk [mailto:eik...@gm...]
> Sent: Thursday, September 25, 2008 3:19 PM
> To: pyp...@li...
> Subject: Re: [Pyparsing] Speeding up a parse
>
> Hi David!
>
> I've read that guessing which parts of a program need optimizing is
> usually impossible. You need to profile your program. There is a
> profiler built into Python:
> http://docs.python.org/lib/profile.html
>
> I have no experience with the profiler, but basic usage is fairly
> simple.
>
> However, the profiler will probably show you that most of the time
> is spent in some function of the Pyparsing library. You expected
> that anyway, and it doesn't help you very much. You might catch a
> parse action that consumes much time this way, and you might spot
> parts of Pyparsing that need optimization.
>
> So maybe you should start to write a profiling extension for
> Pyparsing! I think it is feasible because the class ParserElement
> contains some high-level driver functions that are executed for
> each parser object (_parseNoCache, _parseCache).
> I think it could be done like this:
>
> You create a class variable:
> ParserElement.profileStats = {}
> It maps:
> <parser's name> : n_enter, n_success, n_fail, t_cumulative
>
> Then at the start of _parseNoCache or _parseCache you locate the
> matching entry,
> ParserElement.profileStats[self.name]
> increment the enter counter, and store the current time.
>
> At the exit points you increment either the success or the failure
> counter, compute the time spent in the parser, and add it to the
> cumulative time value.
>
> At the end of the program you convert the dict to a list and store
> it in a text file. You should also add a sorting function.
>
> I think _parseCache looks pretty simple; adding something like my
> proposed profiling facility seems easy. I haven't looked carefully
> at anything, nor did I write any code. (Maybe I should have,
> instead of writing this lengthy email.) As packrat parsing gives
> you no big additional problems, I think you should just use
> _parseCache because it's easier.
>
> I hope this helps you at least somewhat.
> Kind regards,
> Eike.
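[Editor's note: Eike's proposed profiling facility can be sketched without touching pyparsing internals. The wrapper below is a hypothetical stand-in for the bookkeeping he describes going into _parseNoCache/_parseCache; the names `profiled` and `report`, and the use of a plain module-level dict rather than a ParserElement class variable, are assumptions made for a self-contained illustration. It follows the proposed stats layout: name -> (n_enter, n_success, n_fail, t_cumulative).]

```python
import time
from collections import defaultdict

# Proposed stats table: maps a parser's name to
# [n_enter, n_success, n_fail, t_cumulative].
profileStats = defaultdict(lambda: [0, 0, 0, 0.0])

def profiled(name, parse_func):
    """Wrap a parse function so every call updates profileStats[name].

    In the actual proposal this logic would live at the entry and exit
    points of ParserElement._parseNoCache / _parseCache.
    """
    def wrapper(*args, **kwargs):
        stats = profileStats[name]
        stats[0] += 1                   # n_enter: count every attempt
        start = time.perf_counter()
        try:
            result = parse_func(*args, **kwargs)
            stats[1] += 1               # n_success
            return result
        except Exception:
            stats[2] += 1               # n_fail (parse raised)
            raise
        finally:
            # t_cumulative: charge the elapsed time on every exit path
            stats[3] += time.perf_counter() - start
    return wrapper

def report(stats):
    """Convert the dict to a list sorted by cumulative time, slowest first."""
    rows = [(name, *vals) for name, vals in stats.items()]
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows
```

A sorted `report(profileStats)` at program exit could then be written to a text file, as suggested above.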