Re: [Pyparsing] Painfully slow parsing.
From: Paul M. <pt...@au...> - 2009-08-29 05:00:59
> -----Original Message-----
> From: John Krukoff [mailto:jkr...@lt...]
> Sent: Friday, August 28, 2009 6:09 PM
> To: pyp...@li...
> Subject: [Pyparsing] Painfully slow parsing.
>
> Hello,
>
> I have a serious speed problem with a parser written using pyparsing,
> where it's taking ~13 minutes to parse a 30 line file. I'm totally lost
> on what might be causing it, as small variations seem to be causing
> large differences in parsing time. I was hoping I could get some tips on
> general optimization strategies to follow. For instance, I'm suspicious
> that I should be trying harder to use the '-' operator, and wonder if
> that would help...
>

John -

I've not seen people use '-' as a way to speed up parsing, but I imagine it
could help. '?load', '?attribute', and '?element' look like good places
where '-' would be a fit (right after the keyword literal).

But I am struggling to know where to even begin. You have posted 500+ lines
of parser code, without much guidance as to what BNF you are working from,
or what you are trying to get from the parser. And you didn't post the
30 line test file, so I have nothing to run your parser with.

You already mention that packratting isn't an option; how about psyco?

Why do you write this:

    restartIndentation = pyparsing.Literal( '<' ).setParseAction(
        lambda s, l, t: push_indent( ) ).suppress( )
    resumeIndentation = pyparsing.Literal( '>' ).setParseAction(
        lambda s, l, t: pop_indent( ) ).suppress( )

Instead of:

    restartIndentation = pyparsing.Literal( '<' ).setParseAction(
        push_indent ).suppress( )
    resumeIndentation = pyparsing.Literal( '>' ).setParseAction(
        pop_indent ).suppress( )

Here's an idea: follow your definition of endStatement with this:

    endStatement.setName("endStatement").setDebug()

Re-run your test, and see how much retracing of your steps is going on. You
might find that you parse to the end, and then spend most of the time
figuring out that you're actually AT the end.
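To make the two suggestions above concrete, here is a minimal sketch on a toy grammar. It is not John's parser: the grammar, the test string, and the indent_stack list are made up for illustration, and push_indent/pop_indent are stand-ins for whatever the real functions do. It only shows that a zero-argument function can be passed straight to setParseAction, and that a named, debug-enabled endStatement makes pyparsing print a trace line for every attempt to match it.

```python
import pyparsing as pp

# Hypothetical stand-in for the real indentation state in the posted parser.
indent_stack = [1]

def push_indent():
    # Zero-argument parse action; pyparsing accepts callables taking 0 to 3 args,
    # so no lambda wrapper is needed.
    indent_stack.append(1)

def pop_indent():
    indent_stack.pop()

restartIndentation = pp.Literal("<").setParseAction(push_indent).suppress()
resumeIndentation = pp.Literal(">").setParseAction(pop_indent).suppress()

# Toy end-of-statement marker (an assumption; the real one is not shown here).
endStatement = pp.Suppress(";")
# Name it and turn on debugging: every match attempt is traced to stdout.
endStatement.setName("endStatement").setDebug()

word = pp.Word(pp.alphas)
group = restartIndentation + pp.OneOrMore(word) + resumeIndentation
statement = (group | word) + endStatement
program = pp.OneOrMore(statement)

result = program.parseString("a; <b c>; d;", parseAll=True)
print(result.asList())
```

If the trace shows endStatement being tried at the same location over and over, that is the retracing to look for.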
This code also looks like a likely performance problem:

    def create_block( simple, compound ):
        block = pyparsing.Forward( )
        simpleStatement = simple + endStatement
        compoundStatement = compound + endStatement + pyparsing.Optional( block )
        statement = compoundStatement | simpleStatement
        block << addons.indentedBlock( statement, aIndentations )
        block.setParseAction( lambda s, l, t: t[ 0 ] )
        return compoundStatement

You can try adding some more setName/setDebug calls, to get more insight
into how pyparsing is working its way through your grammar.

As for getting a response to your questions, the mailing list and wiki
Discussion tab are about the same, although I think other people besides me
are more likely to chime in on the list. This actually bodes well for
getting a faster response, as I just started a new job, and am pretty busy
trying to get off to a good start. You could also try posting on
stackoverflow.com - you might get a response from Alex Martelli himself!

Good luck!

-- Paul