Re: [Pyparsing] packrat bug + various

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Paul,
	thank you for your reply

Paul McGuire wrote:
> Paolo -
> 
> Thanks for the whitespace bugfix.  Did you have a test case that raised this
> problem in the first place?  If so, I'd like to add it to my unit tests.

Attached...

> Packrat parsing has issues besides the ParseResults problem you describe.
> Some parse actions have important side effects that get omitted when results
> are retrieved from a cache instead of being explicitly reparsed.  The
> packrat concept showed such promise I couldn't pass it up, but I had to
> leave it disabled by default, so as not to break code for existing
> applications.  This is an especially interesting phenomenon for me; as far
> as pure theoretical parsing is concerned, packratting is a no-brainer.  But
> pyparsing's implementation, including parse actions, goes beyond the pure
> parsing function, and in the process breaks the packrat concept.

I crashed against the parseAction problem as well after posting the 
first email.

I agree with you that packratting requires a deep review of pyparsing:
I just tried to refactor _parseNoCache in order to handle doAction flag 
cleanly but I've given up :-)
Why not adding doAction to the lookup key?

I attach a proof of concept patch ("patch_fix_packrat.diff") that seems
to fix all my cases...

> In upgrading your app to be able to use packratting, is it possible that
> your grammar could be restructured?  For instance, in your example, if you
> change:
>     e = A ^ ( A + B + C )
> to:
>     e2 = A + Optional(B + C)
> then packratting does no harm.  I'm sure this is a simplified example for
> the sake of easy posting, but maybe similar mods could be made to your base
> app grammar.

Not easy at all... it would require review tons of code :-)

> Performance enhancements is a sore subject for me.  Pyparsing has gotten
> some pretty nasty comments with respect to its parsing speed, so I admit I
> committed a number of coding sins in trying to eke out a little more speed
> at parsing time.  I've also been reluctant to port parts of pyparsing to C,
> as I think this complicates its inclusion in other projects, and also its
> portability.  

IMO, the real value of pyparsing is not speed but predictability, 
elagance, flexibility, easy of use... speed should be a second priority 
goal.

> Here are the main performance enhancement techniques, in rough descending
> order of importance, as I recall (not counting packratting):
> - cacheing of exception objects - pyparsing raises exceptions a *lot* as
> various paths are tried and rejected.  I optimized the creation of
> exceptions as much as possible, and then realized that I could keep raising
> the same exception object.

This breaks threading as well...

> - using startswith in Literal instead of comparing the literal string with a
> string slice. Literals are the most common terminal, and so
> Literal.parseImpl has to be fast.

Ok

> - ParseResults reuse with __new__ and __iadd__.  There are non-performance
> reasons for doing this also, mostly having to do with gathering results from
> the exprs in an And.

My proof of concept patch remove the functional reason. Do you agree?

> - avoidance of try/except - one of the ugliest parts of the _parse method is
> the repetition of code, depending on whether debugging methods are defined
> or not.  But setting up the try/except has a cost, and _parse is about as
> high-traffic a method as there is.

Ok

> - using re's or dict keys for Word.parseImpl testing with initChars and
> bodyChars

Ok.

Which is the overall performance improvement you got from exception 
caching, parseresults reuse and _parse() optimizations?

> But pyparsing seems to be stabilizing - the last release was about 7 weeks
> ago, and I've made only minor changes since then.  Perhaps pyparsing could
> move forward as a pair of releases, one pure Python and one C-enhanced
> (similar to elementTree or StringIO).  

That exactly what I was thinking to. Have a clean, portable, reference 
pure python implementation and add C optimizations...

> I've done *no* Python+C development,
> so this is the other barrier for me to doing such a task.  I *have* put in
> way too much time on pyparsing, at the expense of some family and work time,
> so I'm not eager to take on a major rewrite just now.

I understand perfectly. I'm very busy as well and cannot make promises.
If you have a "benchmark" grammar and test cases that you think are 
representative I'll be glad of work on them in my spare time.

> If I had a multithreaded app that used pyparsing, I would probably serialize
> access to the parser object.  I don't want to make the resetCache invocation
> explicit, since the need for it is kind of obscure, and so likely to be
> overlooked.

I agree.

> Thanks for your persistence with pyparsing, Paolo, I know you have been
> working with it for a while.  (We never did get to try adding the GLR
> mode...).

I tried. I come to almost complete ugly working solution (parseImpl is 
generator of solutions) but it was difficult to build predictable 
grammar with it. I'm now using with great satisfaction the stock 
pyparsing. ;-)

	Ciao e Grazie
	Paolo