Re: [Pyparsing] Painfully slow parsing.

> -----Original Message-----
> From: John Krukoff [mailto:jkr...@lt...]
> Sent: Friday, August 28, 2009 6:09 PM
> To: pyp...@li...
> Subject: [Pyparsing] Painfully slow parsing.
> 
> Hello,
> 
> I have a serious speed problem with a parser written using pyparsing,
> where it's taking ~13 minutes to parse a 30 line file. I'm totally lost
> on what might be causing it, as small variations seem to be causing
> large differences in parsing time. I was hoping I could get some tips on
> general optimization strategies to follow. For instance, I'm suspicious
> that I should be trying harder to use the '-' operator, and wonder if
> that would help...
> 

John -

I've not seen people use '-' as a way to speed up parsing, but I imagine it
could help.  '?load', '?attribute', and '?element' look like good places
where '-' would be a fit (right after the keyword literal).
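To make the '-' idea concrete, here is a minimal sketch, not taken from your grammar: the `ident` and `fallback` expressions are stand-ins I made up for illustration. It compares how '+' and '-' behave when the text after the keyword fails to match.

```python
import pyparsing as pp

ident = pp.Word(pp.alphas)
fallback = pp.Regex(r".+")  # hypothetical catch-all alternative

# '+' : a failure after the keyword is an ordinary ParseException, so the
# enclosing '|' backtracks and tries the next alternative.
stmt_plus = (pp.Literal("?load") + ident) | fallback

# '-' : once "?load" has matched, a later failure raises
# ParseSyntaxException and no further alternatives are tried.
stmt_minus = (pp.Literal("?load") - ident) | fallback


def stops_early(parser, text):
    """Return True if parsing text aborts with ParseSyntaxException."""
    try:
        parser.parseString(text)
    except pp.ParseSyntaxException:
        return True
    return False
```

With the '+' version, bad input after '?load' quietly falls through to the other alternatives; with '-', the parser gives up at that point. Whether that saves meaningful time depends on how many alternatives would otherwise be retried.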

But I am struggling to know where to begin.  You have posted 500+ lines
of parser code without much guidance as to what BNF you are working from
or what you are trying to get from the parser, and you didn't post the
30-line test file, so I have nothing to run your parser with.

You already mentioned that packratting isn't an option; how about psyco?

Why do you write this:
restartIndentation = pyparsing.Literal( '<' ).setParseAction( lambda s, l, t: push_indent( ) ).suppress( )
resumeIndentation = pyparsing.Literal( '>' ).setParseAction( lambda s, l, t: pop_indent( ) ).suppress( )

Instead of:

restartIndentation = pyparsing.Literal( '<' ).setParseAction( push_indent ).suppress( )
resumeIndentation = pyparsing.Literal( '>' ).setParseAction( pop_indent ).suppress( )
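The second form works because pyparsing accepts parse actions that take fewer than the full (s, l, t) arguments, including none at all. A small self-contained sketch follows; the `push_indent`/`pop_indent` bodies and the `calls` list are hypothetical stand-ins, since I am only guessing at what yours do.

```python
import pyparsing as pp

calls = []  # hypothetical log, only here to show that the actions run


def push_indent():
    # Stand-in for the real push_indent; takes no arguments.
    calls.append("push")


def pop_indent():
    # Stand-in for the real pop_indent; takes no arguments.
    calls.append("pop")


# The zero-argument functions are passed directly, with no lambda wrapper.
restart = pp.Literal("<").setParseAction(push_indent).suppress()
resume = pp.Literal(">").setParseAction(pop_indent).suppress()

result = (restart + pp.Word(pp.alphas) + resume).parseString("< abc >")
```

Dropping the lambda removes one extra function call per match, which is a small saving, and it reads more cleanly.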

Here's an idea: follow your definition of endStatement with this:

endStatement.setName("endStatement").setDebug()

Re-run your test, and see how much retracing of your steps is going on.  You
might find that you parse to the end, and then spend most of the time
figuring out that you're actually AT the end.
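As a rough illustration of what the debug output gives you, here is a toy grammar, not yours: `word` and the ';' terminator are assumptions standing in for your statements and endStatement. Each attempt to match a named, debug-enabled expression is printed along with whether it succeeded or raised an exception.

```python
import pyparsing as pp

# Hypothetical stand-ins for a statement body and its terminator.
word = pp.Word(pp.alphas).setName("word").setDebug()
end_statement = pp.Suppress(";").setName("endStatement").setDebug()

line = pp.OneOrMore(word) + end_statement

# Running this prints one trace entry per match attempt, so repeated
# attempts at the same location show up as repeated entries.
tokens = line.parseString("a b c ;").asList()
```

If the trace for endStatement shows the same location being tried over and over, that is a strong hint about where the time is going.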

This code also looks like a likely performance problem:

	def create_block( simple, compound ):
		block = pyparsing.Forward( )
		simpleStatement = simple + endStatement
		compoundStatement = compound + endStatement + pyparsing.Optional( block )
		statement = compoundStatement | simpleStatement
		block << addons.indentedBlock( statement, aIndentations )
		block.setParseAction( lambda s, l, t: t[ 0 ] )
		return compoundStatement

You can try adding some more setName/setDebug calls to get more insight into
how pyparsing is working its way through your grammar.

As for getting response to your questions, the mailing list and wiki
Discussion tab are about the same, although I think other people besides me
are more likely to chime in on the list.  This actually bodes well for
getting a faster response, as I just started a new job, and am pretty busy
trying to get off on a good start.  You could also try posting on
stackoverflow.com - you might get a response from Alex Martelli himself!

Good luck!
-- Paul