Re: [Pyparsing] Speeding up a parse
From: Eike W. <eik...@gm...> - 2008-09-25 20:19:36
Hi David!

I've read that guessing which parts of a program need optimizing is usually impossible; you need to profile your program. There is a profiler built into Python: http://docs.python.org/lib/profile.html I have no experience with the profiler, but the basic usage is fairly simple. However, the profiler will probably only show you that most of the time is spent in some function of the Pyparsing library. You expected that anyway, so it doesn't help you very much. Still, you might catch a parse action that consumes much time this way, and you might spot parts of Pyparsing that need optimization.

So maybe you should write a profiling extension for Pyparsing! I think it is feasible because the class ParserElement contains some high-level driver functions that are executed for each parser object (_parseNoCache, _parseCache). I think it could be done like this: you create a class variable, ParserElement.profileStats = {}, which maps <parser's name> to (n_enter, n_success, n_fail, t_cumulative). At the start of _parseNoCache or _parseCache you locate the matching entry, ParserElement.profileStats[self.name], increment the enter counter, and store the current time. At the exit points you increment either the success or the failure counter, compute the time spent in the parser, and add it to the cumulative time value. At the end of the program you convert the dict to a list and write it to a text file. You should also add a sorting function.

_parseCache looks pretty simple, so adding something like my proposed profiling facility seems easy. I haven't looked carefully at anything, nor did I write any code. (Maybe I should have, instead of writing this lengthy email.) Since packrat parsing gives you no big additional problems, I think you should just use _parseCache, because it's simpler.

I hope this helps you at least somewhat.

Kind regards,
Eike.
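[A follow-up sketch of the bookkeeping described above. This is a standalone illustration, not pyparsing code: the names profile_stats, profiled, and dump_stats are made up for this example, and in Pyparsing itself the timing and counting would live inside ParserElement._parseNoCache / _parseCache rather than in a wrapper function.]

```python
import time

# Proposed statistics table:
# maps <parser's name> -> [n_enter, n_success, n_fail, t_cumulative]
profile_stats = {}

def profiled(name, parse_func):
    """Wrap a callable so every call updates profile_stats[name],
    mimicking the instrumentation proposed for _parseNoCache/_parseCache."""
    def wrapper(*args, **kwargs):
        stats = profile_stats.setdefault(name, [0, 0, 0, 0.0])
        stats[0] += 1                    # n_enter: count every entry
        start = time.perf_counter()      # store the current time
        try:
            result = parse_func(*args, **kwargs)
            stats[1] += 1                # n_success: normal exit point
            return result
        except Exception:
            stats[2] += 1                # n_fail: exception exit point
            raise
        finally:
            # t_cumulative: add the time spent in this call
            stats[3] += time.perf_counter() - start
    return wrapper

def dump_stats():
    """The proposed sorting function: entries ordered by cumulative
    time, most expensive parser first, ready to write to a text file."""
    return sorted(profile_stats.items(),
                  key=lambda item: item[1][3], reverse=True)
```

For example, wrapping a toy "parser" such as int with profiled("integer", int) and calling it on "42" and then on "x" (which raises ValueError) leaves profile_stats["integer"] with two entries, one success, and one failure, plus the accumulated time.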