Re: [Pyparsing] Order problem in binary operation with operationPrecedence()
From: Paul M. <pt...@au...> - 2009-07-09 14:41:37
Gustavo -

The first issue you have is your definition of identifier0, which you define using the Unicode regex "\w", but which cannot start with a numeric digit. Unfortunately, there are a lot more numeric digits in Unicode than just "0" through "9" - over 300 of them! Here is how I replaced your identifier0 expression, and got your code to at least start working:

    unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000)
                                    if unichr(c).isdigit()) + "]", re.UNICODE)
    identifier0 = Combine(~unidigit + Regex("[\w]+", re.UNICODE))

Some other comments:

1.
    eq = CaselessLiteral("==")
    ne = CaselessLiteral("!=")
    lt = CaselessLiteral("<")
    gt = CaselessLiteral(">")
    le = CaselessLiteral("<=")
    ge = CaselessLiteral(">=")

Why are these not just plain Literals? Making them caseless just makes pyparsing do extra work trying to match upper or lower case "==".

2.
    relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge

'^' can be relatively expensive for pyparsing if there are many alternatives, since all alternatives will be checked. This version is equivalent:

    relationals = eq | ne | le | ge | lt | gt

(I *did* have to be careful about the order, having to check for "<=" before "<", and ">=" before ">".)

3.
    not_ = Suppress("~")
    and_ = Suppress("&")
    in_or = Suppress("|")
    ex_or = Suppress("^")

Do you really want to suppress these operators? Without them, you will have a dickens of a time evaluating the parsed expression. These should probably be Literals, too.

4. The parsed results show an empty [] before every identifier; I think this is to represent the empty leading namespace for each. You might want to try this for identifier:

    identifier = Combine(namespace.setResultsName("namespace_parts") +
                         identifier0.setResultsName("identifier"))

Combine will return the match as a single string, but you can still access the individual parts of the identifier by their results names if you need to. (Combine will also ensure that you match only contiguous characters as an identifier.)
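For anyone who wants to try the Unicode-digit and operator-ordering points without pyparsing installed, here is a rough sketch using only Python 3's standard `re` module. It is an analogy, not the pyparsing code from the post: a negative lookahead stands in for `~unidigit`, and regex alternation stands in for pyparsing's `|` (both take the first alternative that matches). The names `unicode_digits`, `identifier_re`, `is_identifier`, `good_order`, `bad_order`, and `first_operator` are made up for this illustration.

```python
import re

# Python 3 port of the digit scan: chr/range replace unichr/xrange.
# Collect every character in the Basic Multilingual Plane that str.isdigit()
# accepts, not just "0" through "9".
unicode_digits = "".join(chr(c) for c in range(0x10000) if chr(c).isdigit())

# The negative lookahead rejects a leading digit (any Unicode digit),
# then \w+ consumes the word characters. In Python 3, \w on str patterns
# is Unicode-aware by default.
identifier_re = re.compile(r"(?![" + unicode_digits + r"])\w+")


def is_identifier(text):
    """Return True if text is all word characters and does not start with a digit."""
    return identifier_re.fullmatch(text) is not None


# Regex alternation stops at the first alternative that matches, so the
# two-character operators have to be listed before their one-character prefixes.
good_order = re.compile(r"==|!=|<=|>=|<|>")
bad_order = re.compile(r"==|!=|<|>|<=|>=")


def first_operator(pattern, text):
    """Return the operator matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group() if m else None
```

With `bad_order`, the input "<= 3" yields "<" because the shorter alternative is tried first; `good_order` yields "<=". That is the same reason the `|` version of relationals above lists le and ge ahead of lt and gt.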
Looks interesting, keep us posted.

-- Paul