Re: [Pyparsing] Order problem in binary operation with operationPrecedence()
From: Gustavo N. <me...@gu...> - 2009-07-09 20:14:50
Hello, Paul!

First of all, thank you very much for your help. :)

Paul said:
> Gustavo -
>
> The first issue you have is your definition of identifier0, which you
> define using the Unicode regex "\w", but which cannot start with a numeric
> digit. Unfortunately, there are a lot more numeric digits in Unicode than
> just "0" through "9" - over 300 of them! Here is how I replaced your
> identifier0 expression, and got your code to at least start working:
>
> unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000) if
> unichr(c).isdigit()) + "]", re.UNICODE)
> identifier0 = Combine(~unidigit + Regex("[\w]+", re.UNICODE))

Thanks, I've just replaced my solution (which used a parse action) with this.

> Some other comments:
>
> 1.
>
> eq = CaselessLiteral("==")
> ne = CaselessLiteral("!=")
> lt = CaselessLiteral("<")
> gt = CaselessLiteral(">")
> le = CaselessLiteral("<=")
> ge = CaselessLiteral(">=")
>
> Why are these not just plain Literal's? Making them caseless just makes
> pyparsing do extra work trying to match upper or lower case "==".

Sorry, I forgot to explain that: in the actual code I don't use those
literals. I use variables, because those tokens can be overridden by the user
(e.g., the equality token could be "is" instead of "=="), so I want those
tokens to be case-insensitive if applicable.

> 2.
> relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge
>
> '^' can be relatively expensive for pyparsing if there are many
> alternatives, since all alternatives will be checked. This version is
> equivalent:
>
> relationals = eq | ne | le | ge | lt | gt

Thanks for the hint. I used pipes initially, but for some reason some tests
didn't pass... I've just restored the pipes and all the tests pass now,
though :-S

> (I *did* have to be careful about the order, having to check for "<="
> before "<", and ">=" before ">".)

Could you please elaborate on that? Is it for performance reasons?

> 3.
> not_ = Suppress("~")
> and_ = Suppress("&")
> in_or = Suppress("|")
> ex_or = Suppress("^")
>
> Do you really want to suppress these operators? Without them, you will
> have a dickens of a time evaluating the parsed expression. These should
> probably be Literal's, too.

I think so: In the actual code, I have my own parse nodes (for operations and
operands), and I use the parse actions to convert Pyparsing's parse trees into
my own parse tree. For example, the parse action for and_ looks like this:

    def make_and(tokens):
        left_operand = tokens[0][0]
        right_operand = tokens[0][1]
        return BooleanoAnd(left_operand, right_operand)

So the operator isn't necessary to make the new parse tree.

> 4. The parsed results show an empty [] before every identifier, I think
> this is to represent the empty leading namespace for each. You might want
> to try this for identifier:
>
> identifier = Combine(namespace.setResultsName("namespace_parts") +
> identifier0.setResultsName("identifier"))
>
> Combine will return the match as a single string, but you can still access
> the individual parts of the identifier by their results names if you need
> to. (Combine will also ensure that you match only contiguous characters as
> an identifier.)

Nice, I've just updated my code accordingly.

> Looks interesting, keep us posted.

Sure! I'm getting closer to the first alpha release, by the way (for the
impatient: https://launchpad.net/booleano) ;-)

Cheers!
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
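
On the ordering question raised above (why "<=" has to be checked before "<"
when pipes are used): the sketch below is one way to see the difference. It
assumes pyparsing is installed and relies on its documented behaviour that "|"
builds a MatchFirst (the first alternative that matches wins) while "^" builds
an Or (every alternative is tried and the longest match wins). The helper
first_token and the sample input "<= 5" are made up for illustration; they
come neither from Booleano nor from Paul's reply.

```python
from pyparsing import Literal

lt = Literal("<")
le = Literal("<=")

# "|" builds a MatchFirst: alternatives are tried left to right and the
# first one that matches is taken, even if a later one would match more.
wrong_order = lt | le
right_order = le | lt

# "^" builds an Or: all alternatives are tried and the longest match wins,
# so the order of the alternatives does not affect the result.
longest = lt ^ le


def first_token(expr, text):
    """Hypothetical helper: parse the start of `text` and return token 0.

    parseString is the classic camelCase name used in this thread; by
    default it does not require the whole input to be consumed.
    """
    return expr.parseString(text)[0]


sample = "<= 5"
results = {
    "lt | le": first_token(wrong_order, sample),
    "le | lt": first_token(right_order, sample),
    "lt ^ le": first_token(longest, sample),
}

for name, token in results.items():
    print(name, "->", repr(token))
```

If this behaves as described, "lt | le" stops at "<" and leaves the "=" behind,
while the other two yield "<=". So with pipes the order is a matter of
correctness rather than speed: whenever one token is a prefix of another, the
longer one has to be listed first.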