Re: [Pyparsing] Better way than operatorPrecedence to parse a regexp-like grammar?
Brought to you by:
ptmcg
From: spir <den...@fr...> - 2009-02-26 07:50:12
|
Le Wed, 25 Feb 2009 23:14:44 -0700, Andrew Warkentin <and...@gm...> s'exprima ainsi: > I am writing a proxy (http://xuproxy.sourceforge.net/) that includes > header and content filters that use a regexp-like matching language > (basically a hybrid of globs and regexps). The parser for this matching > language is implemented using pyparsing (the actual matching code does > not use pyparsing, only the parser for the patterns themselves). > Currently, it uses the operatorPrecedence function (actually, a custom > version of it, modified to deal with certain features of the matching > language). There are two problems with this, though. The first is that > it is slow (up to a second or more on complex patterns), even with > packrat parsing enabled. The second is that very complex patterns cause > the maximum recursion depth to be exceeded. Is there a way to parse a > regexp-like language with pyparsing that is faster and involves less > recursion than operatorPrecedence? Maybe it is a stupid -- anyway ;-) Have you considered simply letting down operator precedence for your language? In favor of sequential semantics + (gouping). I had a similar problem and this worked fine for me (actually I just did not wish to cope with operator precedence, I had no perfomance issue), because: * Many expressions did not involve operators of different precedence level * Many where logically commutative, meaning one could rewrite "a | b c" to "b c | a" when "b c" cannot be a prefix of "a" * I often simply wrote intermediate (sub) expressions so that composite ones hold higher level non-terminals in which there is no more precedence: p = a b | c d ==> p1 = a b p2 = c d p = p1 | p2 * In other cases writing () was not such a burden. I may be wrong but I think operator precedence is inherently time-consuming precisely because it involves recursion. As recursion cannot be done at the same location in the input string (otherwise infinite loop -- the reason why left recursion is not possible), then packrat memoïzing will not help with this one aspect of the issue. As I understand it, it can help only in such case: a b | a c Then when b fails, a will not be evaluated twice. (Please tell me if I'm wrong.) Denis ------ la vita e estrany |