Thread: [Pyparsing] Order problem in binary operation with operatorPrecedence()
From: Gustavo N. <me...@gu...> - 2009-07-06 20:00:27
Hello again, everybody.

I've defined a Pyparsing grammar for boolean expressions, based on operatorPrecedence(), and I have a strange problem.

This grammar has five types of operands: 3 literals (string, number, set) and 2 non-literals (variable, function). The following expressions work like a charm:
* "pi == 3.1416"
* "pi > 3.1416"
* "pi < 3.1416"
* "pi >= 3.1416"
* "pi <= 3.1416"

But if I swap the operands, I get an exception (like "Expected end of text (at char 7), (line:1, col:8)"):
* "3.1416 == pi"
* "3.1416 < pi"
* "3.1416 > pi"
* etc.

As far as I know, every other combination of operands works; the failure only occurs when the first operand in a binary operation is a number:
* "hello" in {"hi", "bye", "hello", "good-bye"}
* "e < pi"
* "pi > e"

What's going wrong? I've attached an excerpt from my library so you can reproduce it easily.

Thanks in advance!
--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Paul M. <pt...@au...> - 2009-07-06 20:08:37
> I've defined a Pyparsing grammar for boolean expressions, based on
> operatorPrecedence(), and I have a strange problem.
>
> <snip>
>
> What's going wrong? I've attached an excerpt from my library for you to
> reproduce it easily.

Gustavo -

I'm afraid your attachments didn't make it through. Could you paste your code extract to http://pyparsing.pastebin.com and send us the link?

-- Paul
From: Gustavo N. <me...@gu...> - 2009-07-07 16:33:21
Hello, Paul!

Paul said:
> I'm afraid your attachments didn't make it through. Could you paste your
> code extract to http://pyparsing.pastebin.com and send us the link?

I'm sorry about that. Here it is:
http://pyparsing.pastebin.com/f7fa3659d

Thanks in advance,
--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
From: Paul M. <pt...@au...> - 2009-07-09 14:41:37
Gustavo -

The first issue you have is your definition of identifier0. You define it using the Unicode regex "\w", but an identifier must not start with a numeric digit. Unfortunately, there are a lot more numeric digits in Unicode than just "0" through "9" - over 300 of them! Here is how I replaced your identifier0 expression, and got your code to at least start working:

    unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000)
                                    if unichr(c).isdigit()) + "]", re.UNICODE)
    identifier0 = Combine(~unidigit + Regex("[\w]+", re.UNICODE))

Some other comments:

1.
    eq = CaselessLiteral("==")
    ne = CaselessLiteral("!=")
    lt = CaselessLiteral("<")
    gt = CaselessLiteral(">")
    le = CaselessLiteral("<=")
    ge = CaselessLiteral(">=")

Why are these not just plain Literals? Making them caseless just makes pyparsing do extra work trying to match upper or lower case "==".

2.
    relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge

'^' can be relatively expensive for pyparsing if there are many alternatives, since all alternatives will be checked. This version is equivalent:

    relationals = eq | ne | le | ge | lt | gt

(I *did* have to be careful about the order, having to check for "<=" before "<", and ">=" before ">".)

3.
    not_ = Suppress("~")
    and_ = Suppress("&")
    in_or = Suppress("|")
    ex_or = Suppress("^")

Do you really want to suppress these operators? Without them, you will have a dickens of a time evaluating the parsed expression. These should probably be Literals, too.

4. The parsed results show an empty [] before every identifier. I think this represents the empty leading namespace for each. You might want to try this for identifier:

    identifier = Combine(namespace.setResultsName("namespace_parts") +
                         identifier0.setResultsName("identifier"))

Combine will return the match as a single string, but you can still access the individual parts of the identifier by their results names if you need to. (Combine will also ensure that you match only contiguous characters as an identifier.)

Looks interesting, keep us posted.

-- Paul
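A minimal sketch of why a bare "\w+" identifier interferes with numeric operands, and of the "must not start with a digit" guard described above. It uses only the standard library and Python 3 spellings (chr/range in place of unichr/xrange); the helper name match_identifier is hypothetical and is not part of pyparsing or of the pasted grammar.

```python
import re

# Every BMP code point that str.isdigit() accepts; this mirrors the
# unichr/xrange expression from the message, written for Python 3.
unicode_digits = [chr(c) for c in range(0x10000) if chr(c).isdigit()]

# A bare Unicode word pattern, comparable to Regex("[\w]+", re.UNICODE).
IDENTIFIER = re.compile(r"\w+")


def match_identifier(text):
    """Return the leading identifier of text, or None.

    Hypothetical helper: rejects any input whose first character is a
    Unicode digit, then falls back to the plain word pattern.
    """
    if text[:1].isdigit():
        return None
    m = IDENTIFIER.match(text)
    return m.group(0) if m else None


# Without the guard, the word pattern accepts the "3" of "3.1416"
# as if it were the start of an identifier.
unguarded = IDENTIFIER.match("3.1416").group(0)

# With the guard, the number is left for the numeric operand to match.
guarded = match_identifier("3.1416")
```

The guard checks str.isdigit() rather than the regex class "\d" because, in Python 3, "\d" only covers decimal digits, while isdigit() also covers characters such as superscript digits, which "\w" still accepts.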
From: Gustavo N. <me...@gu...> - 2009-07-09 20:14:50
Hello, Paul!

First of all, thank you very much for your help. :)

Paul said:
> Gustavo -
>
> The first issue you have is your definition of identifier0, which you
> define using the Unicode regex "\w", but which cannot start with a numeric
> digit. Unfortunately, there are a lot more numeric digits in Unicode than
> just "0" through "9" - over 300 of them! Here is how I replaced your
> identifier0 expression, and got your code to at least start working:
>
> unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000) if
> unichr(c).isdigit()) + "]", re.UNICODE)
> identifier0 = Combine(~unidigit + Regex("[\w]+", re.UNICODE))

Thanks, I've just replaced my solution (which used a parse action) with this.

> Some other comments:
>
> 1.
>
> eq = CaselessLiteral("==")
> ne = CaselessLiteral("!=")
> lt = CaselessLiteral("<")
> gt = CaselessLiteral(">")
> le = CaselessLiteral("<=")
> ge = CaselessLiteral(">=")
>
> Why are these not just plain Literal's? Making them caseless just makes
> pyparsing do extra work trying to match upper or lower case "==".

Sorry, I forgot to explain that: in the actual code I don't hard-code those literals. I use variables, because those tokens can be overridden by the user (e.g., the equality token could be "is" instead of "=="), so I want those tokens to be case-insensitive where applicable.

> 2.
> relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge
>
> '^' can be relatively expensive for pyparsing if there are many
> alternatives, since all alternatives will be checked. This version is
> equivalent:
>
> relationals = eq | ne | le | ge | lt | gt

Thanks for the hint. I used pipes initially, but for some reason some tests didn't pass... I've just restored the pipes and all the tests pass now, though :-S

> (I *did* have to be careful about the order, having to check for "<="
> before "<", and ">=" before ">".)

Could you please elaborate on that? Is it for performance reasons?

> 3.
> not_ = Suppress("~")
> and_ = Suppress("&")
> in_or = Suppress("|")
> ex_or = Suppress("^")
>
> Do you really want to suppress these operators? Without them, you will
> have a dickens of a time evaluating the parsed expression. These should
> probably be Literal's, too.

I think so: in the actual code, I have my own parse nodes (for operations and operands), and I use the parse actions to convert Pyparsing's parse trees into my own parse tree. For example, the parse action for and_ looks like this:

    def make_and(tokens):
        left_operand = tokens[0][0]
        right_operand = tokens[0][1]
        return BooleanoAnd(left_operand, right_operand)

So the operator isn't necessary to make the new parse tree.

> 4. The parsed results show an empty [] before every identifier, I think
> this is to represent the empty leading namespace for each. You might want
> to try this for identifier:
>
> identifier = Combine(namespace.setResultsName("namespace_parts") +
> identifier0.setResultsName("identifier"))
>
> Combine will return the match as a single string, but you can still access
> the individual parts of the identifier by their results names if you need
> to. (Combine will also ensure that you match only contiguous characters as
> an identifier.)

Nice, I've just updated my code accordingly.

> Looks interesting, keep us posted.

Sure! I'm getting closer to the first alpha release, by the way (for the impatient: https://launchpad.net/booleano) ;-)

Cheers!
--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |
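A small sketch of a parse action that builds custom nodes from operands alone, as described in the reply about suppressed operators. It rests on assumptions: the And class is a hypothetical stand-in for a node class such as BooleanoAnd, the token shapes are simulated with plain nested lists instead of real pyparsing results, and it assumes that chained operands at one precedence level (e.g. "a & b & c") arrive as a single flat group once the operators are suppressed. Under that assumption, reading only tokens[0][0] and tokens[0][1] would silently drop the third operand, so the sketch folds over all of them.

```python
from functools import reduce


class And:
    """Hypothetical node class standing in for a custom 'and' node."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __repr__(self):
        return "And(%r, %r)" % (self.left, self.right)


def make_and(tokens):
    # tokens[0] is assumed to be the group of operands for one precedence
    # level; the "&" operators themselves were suppressed by the grammar.
    operands = list(tokens[0])
    # Fold from the left so three or more operands nest left-associatively.
    return reduce(And, operands)


# Simulated token shapes (plain lists, not real pyparsing results).
two = make_and([["a", "b"]])
three = make_and([["a", "b", "c"]])
```

With exactly two operands the result is the same as indexing tokens[0][0] and tokens[0][1]; the fold only matters for longer chains.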
From: Ralph C. <ra...@in...> - 2009-07-11 12:43:46
hi Gustavo,

> > 2.
> > relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge
> >
> > '^' can be relatively expensive for pyparsing if there are many
> > alternatives, since all alternatives will be checked. This version
> > is equivalent:
> >
> > relationals = eq | ne | le | ge | lt | gt
>
> Thanks for the hint. I used pipes initially but for some reason some
> tests didn't pass... I've just restored the pipes and the all the
> tests pass now, though :-S
>
> > (I *did* have to be careful about the order, having to check for
> > "<=" before "<", and ">=" before ">".)
>
> Could you please elaborate on that? Is it for performance reasons?

`^' causes all alternatives to be tested, `|' stops at the first one that succeeds. If using `|', unless you test for `<=' before `<' then `<=' would never be seen because it would always look like a `<'. With `^', I assume the longest one that succeeds is used so both `<' and `<=' match but the latter is used.

Cheers,
Ralph.
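The difference between the two alternation operators can be sketched without pyparsing itself. The two functions below are hypothetical stand-ins that only imitate the selection rule of `|' (first match wins, pyparsing's MatchFirst) and `^' (every alternative is tried and the longest match wins, pyparsing's Or); they work on plain string prefixes rather than real parser expressions.

```python
def match_first(alternatives, text):
    """Imitate '|': return the first alternative that matches at the start."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None


def match_longest(alternatives, text):
    """Imitate '^': test every alternative and keep the longest match."""
    matches = [alt for alt in alternatives if text.startswith(alt)]
    return max(matches, key=len) if matches else None


text = "<= 3.1416"

# "<" listed first: the shorter operator wins and "=" is left unparsed.
wrong_order = match_first(["<", "<="], text)

# "<=" listed first: the intended operator is found.
right_order = match_first(["<=", "<"], text)

# Longest-match selection does not depend on the order of alternatives,
# at the cost of testing all of them.
longest = match_longest(["<", "<="], text)
```

So the ordering requirement with `|' is about correctness, not speed: the performance gain comes from stopping at the first success, and the price is that longer operators must be listed before their prefixes.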
From: Gustavo N. <me...@gu...> - 2009-07-12 11:09:17
Hello, Ralph.

Ralph said:
> `^' causes all alternatives to be tested, `|' stops at the first one
> that succeeds. If using `|', unless you test for `<=' before `<' then
> `<=' would never be seen because it would always look like a `<'. With
> `^', I assume the longest one that succeeds is used so both `<' and `<='
> match but the latter is used.

Hmm, OK, I got it now. Thank you very much for the explanation! ;-)

Cheers.
--
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech ~ About me: =Gustavo/about |