Thread: [Pyparsing] Order problem in binary operation with operationPrecedence()

pyparsing-users

[Pyparsing] Order problem in binary operation with operationPrecedence()

From: Gustavo N. <me...@gu...> - 2009-07-06 20:00:27

Hello again, everybody.

I've defined a Pyparsing grammar for boolean expressions, based on 
operatorPrecedence(), and I have a strange problem.

This grammar has five types of operands: 3 literals (string, number, set) and 
2 non-literals (variable, function).

The following expressions work like a charm:
 * "pi == 3.1416"
 * "pi > 3.1416"
 * "pi < 3.1416"
 * "pi >= 3.1416"
 * "pi <= 3.1416"

But if I rearrange the operands, I'd get an exception (like "Expected end of 
text (at char 7), (line:1, col:8)"):
 * "3.1416 == pi"
 * "3.1416 < pi"
 * "3.1416 > pi"
 * etc.

Every other combination of operands works, AFAIK, except when the first 
operand in a binary operation is a number:
 * "hello" in {"hi", "bye", "hello", "good-bye"}
 * "e < pi"
 * "pi > e"

What's going wrong? I've attached an excerpt from my library for you to 
reproduce it easily.

Thanks in advance!
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Paul M. <pt...@au...> - 2009-07-06 20:08:37

> 
> I've defined a Pyparsing grammar for boolean expressions, based on
> operatorPrecedence(), and I have a strange problem.
> 
> <snip>
>
> What's going wrong? I've attached an excerpt from my library for you to
> reproduce it easily.
> 

Gustavo -

I'm afraid your attachments didn't make it through.  Could you paste your
code extract to http://pyparsing.pastebin.com and send us the link?

-- Paul

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Gustavo N. <me...@gu...> - 2009-07-07 16:33:21

Hello, Paul!

Paul said:
> I'm afraid your attachments didn't make it through.  Could you paste your
> code extract to http://pyparsing.pastebin.com and send us the link?

I'm sorry about that. Here it is:
http://pyparsing.pastebin.com/f7fa3659d

Thanks in advance,
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Paul M. <pt...@au...> - 2009-07-09 14:41:37

Gustavo -

The first issue you have is your definition of identifier0, which you define
using the Unicode regex "\w", but which cannot start with a numeric digit.
Unfortunately, there are a lot more numeric digits in Unicode than just "0"
through "9" - over 300 of them!  Here is how I replaced your identifier0
expression, and got your code to at least start working:

    unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000)
                                    if unichr(c).isdigit()) + "]", re.UNICODE)
    identifier0 = Combine(~unidigit + Regex(r"[\w]+", re.UNICODE))
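
The point about non-ASCII digits can be checked with the standard library
alone. The following is a minimal sketch, assuming Python 3 (the snippets in
this thread are Python 2) and plain `re` instead of pyparsing; the names
`unicode_digits`, `ascii_only` and `strict` are made up for the illustration.
It shows that a word-character pattern which only refuses "0" through "9" in
the first position still accepts a name starting with another Unicode digit:

```python
import re

# Every code point in the Basic Multilingual Plane that str.isdigit() accepts.
unicode_digits = [chr(c) for c in range(0x10000) if chr(c).isdigit()]

# ARABIC-INDIC DIGIT THREE: a digit, and also a \w word character.
arabic_three = "\u0663"
assert arabic_three.isdigit()
assert re.match(r"\w", arabic_three) is not None

# Refuses only ASCII digits in the first position.
ascii_only = re.compile(r"(?![0-9])\w+")
# Refuses any Unicode decimal digit in the first position (\d is
# Unicode-aware for str patterns in Python 3).
strict = re.compile(r"(?!\d)\w+")

candidate = arabic_three + "abc"
print(len(unicode_digits) > 10)            # far more than ten digits exist
print(bool(ascii_only.match(candidate)))   # True: slips through
print(bool(strict.match(candidate)))       # False: rejected
```

Note that `\d` covers decimal digits only, which is a narrower set than the
one `isdigit()` reports, so the `strict` pattern is not a drop-in equivalent
of the `unidigit` expression above.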


Some other comments:

1. 

    eq = CaselessLiteral("==")
    ne = CaselessLiteral("!=")
    lt = CaselessLiteral("<")
    gt = CaselessLiteral(">")
    le = CaselessLiteral("<=")
    ge = CaselessLiteral(">=")

Why are these not just plain Literals?  Making them caseless just makes
pyparsing do extra work trying to match upper or lower case "==".


2. 
    relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge

'^' can be relatively expensive for pyparsing if there are many
alternatives, since all alternatives will be checked.  This version is
equivalent:

    relationals = eq | ne | le | ge | lt | gt

(I *did* have to be careful about the order, having to check for "<=" before
"<", and ">=" before ">".)


3. 
    not_ = Suppress("~")
    and_ = Suppress("&")
    in_or = Suppress("|")
    ex_or = Suppress("^")

Do you really want to suppress these operators?  Without them, you will have
a dickens of a time evaluating the parsed expression.  These should probably
be Literals, too.


4. The parsed results show an empty [] before every identifier; I think this
represents the empty leading namespace for each.  You might want to try
this for identifier:

    identifier = Combine(namespace.setResultsName("namespace_parts") +
                  identifier0.setResultsName("identifier"))

Combine will return the match as a single string, but you can still access
the individual parts of the identifier by their results names if you need
to.  (Combine will also ensure that you match only contiguous characters as
an identifier.)


Looks interesting, keep us posted.

-- Paul

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Gustavo N. <me...@gu...> - 2009-07-09 20:14:50

Hello, Paul!

First of all, thank you very much for your help. :)

Paul said:
> Gustavo -
>
> The first issue you have is your definition of identifier0, which you
> define using the Unicode regex "\w", but which cannot start with a numeric
> digit. Unfortunately, there are a lot more numeric digits in Unicode than
> just "0" through "9" - over 300 of them!  Here is how I replaced your
> identifier0 expression, and got your code to at least start working:
>
>     unidigit = Regex(u"[" + "".join(unichr(c) for c in xrange(0x10000)
>                                     if unichr(c).isdigit()) + "]", re.UNICODE)
>     identifier0 = Combine(~unidigit + Regex(r"[\w]+", re.UNICODE))
 
Thanks, I've just replaced my solution (which used a parse action) with this.


> Some other comments:
>
> 1.
>
>     eq = CaselessLiteral("==")
>     ne = CaselessLiteral("!=")
>     lt = CaselessLiteral("<")
>     gt = CaselessLiteral(">")
>     le = CaselessLiteral("<=")
>     ge = CaselessLiteral(">=")
>
> Why are these not just plain Literals?  Making them caseless just makes
> pyparsing do extra work trying to match upper or lower case "==".

Sorry, I forgot to explain that: In the actual code, I don't use those 
literals; I use variables because those tokens can be overridden by the user 
(e.g., the equality token could be "is" instead of "=="), so I want those 
tokens to be case-insensitive if applicable.


> 2.
>     relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge
>
> '^' can be relatively expensive for pyparsing if there are many
> alternatives, since all alternatives will be checked.  This version is
> equivalent:
>
>     relationals = eq | ne | le | ge | lt | gt

Thanks for the hint. I used pipes initially but for some reason some tests 
didn't pass... I've just restored the pipes and all the tests pass now, 
though :-S


> (I *did* have to be careful about the order, having to check for "<="
> before "<", and ">=" before ">".)

Could you please elaborate on that? Is it for performance reasons?


> 3.
>     not_ = Suppress("~")
>     and_ = Suppress("&")
>     in_or = Suppress("|")
>     ex_or = Suppress("^")
>
> Do you really want to suppress these operators?  Without them, you will
> have a dickens of a time evaluating the parsed expression.  These should
> probably be Literals, too.

I think so: In the actual code, I have my own parse nodes (for operations and 
operands), and I use the parse actions to convert Pyparsing's parse trees into 
my own parse tree.

For example, the parse action for and_ looks like this:
    def make_and(tokens):
        left_operand = tokens[0][0]
        right_operand = tokens[0][1]
        return BooleanoAnd(left_operand, right_operand)

So the operator isn't necessary to make the new parse tree.
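
To make the token layout concrete, here is a minimal, library-free sketch.
It assumes, as the indices above imply, that the parse action receives one
group holding only the operands once the operator is suppressed. `And` is a
hypothetical stand-in for the real node class, and plain lists stand in for
pyparsing's results object. It also assumes that a chain such as "a & b & c"
may arrive as a single group of three operands, in which case reading only
positions 0 and 1 would silently drop the third, so this variant folds every
operand from left to right:

```python
class And(object):
    """Hypothetical conjunction node; stands in for the real parse node."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __repr__(self):
        return "And(%r, %r)" % (self.left, self.right)


def make_and(tokens):
    # tokens[0] is the group of operands; the operator was suppressed,
    # so there is no "&" between them.
    operands = list(tokens[0])
    node = And(operands[0], operands[1])
    # Fold any further operands left-associatively: (a & b) & c.
    for operand in operands[2:]:
        node = And(node, operand)
    return node


print(make_and([["a", "b"]]))        # And('a', 'b')
print(make_and([["a", "b", "c"]]))   # And(And('a', 'b'), 'c')
```

If the operator were kept instead of suppressed, the group would look like
["a", "&", "b"] and the right operand would sit at index 2 rather than 1.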


> 4. The parsed results show an empty [] before every identifier; I think
> this represents the empty leading namespace for each.  You might want
> to try this for identifier:
>
>     identifier = Combine(namespace.setResultsName("namespace_parts") +
>                   identifier0.setResultsName("identifier"))
>
> Combine will return the match as a single string, but you can still access
> the individual parts of the identifier by their results names if you need
> to.  (Combine will also ensure that you match only contiguous characters as
> an identifier.)

Nice, I've just updated my code accordingly.


> Looks interesting, keep us posted.

Sure! I'm getting closer to the first alpha release, by the way (for the 
impatient: https://launchpad.net/booleano) ;-)

Cheers!
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Ralph C. <ra...@in...> - 2009-07-11 12:43:46

Hi Gustavo,

> > 2.
> >     relationals = eq ^ ne ^ lt ^ gt ^ le ^ ge
> >
> > '^' can be relatively expensive for pyparsing if there are many
> > alternatives, since all alternatives will be checked.  This version
> > is equivalent:
> >
> >     relationals = eq | ne | le | ge | lt | gt
> 
> Thanks for the hint. I used pipes initially but for some reason some
> tests didn't pass... I've just restored the pipes and all the
> tests pass now, though :-S
> 
> > (I *did* have to be careful about the order, having to check for
> > "<=" before "<", and ">=" before ">".)
> 
> Could you please elaborate on that? Is it for performance reasons?

`^' causes all alternatives to be tested, `|' stops at the first one
that succeeds.  If using `|', unless you test for `<=' before `<' then
`<=' would never be seen because it would always look like a `<'.  With
`^', I assume the longest one that succeeds is used so both `<' and `<='
match but the latter is used.
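
The difference can be modelled without pyparsing at all. The sketch below is
only an illustration of the two strategies described above: `match_first`
and `match_longest` are hypothetical helpers over plain strings, not
pyparsing functions. The first stops at the first alternative that fits
(the `|' behaviour), the second tries them all and keeps the longest (the
`^' behaviour):

```python
def match_first(alternatives, text):
    """Return the first alternative, in listed order, that text starts with."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None


def match_longest(alternatives, text):
    """Try every alternative and return the longest one text starts with."""
    hits = [alt for alt in alternatives if text.startswith(alt)]
    return max(hits, key=len) if hits else None


careless = ["==", "!=", "<", ">", "<=", ">="]   # "<" listed before "<="
careful = ["==", "!=", "<=", ">=", "<", ">"]    # longer operators first

text = "<= 3.1416"
print(match_first(careless, text))     # <   ("<=" is never reached)
print(match_first(careful, text))      # <=
print(match_longest(careless, text))   # <=  (order does not matter)
```

With the careless ordering, first-match consumes only `<' and leaves
"= 3.1416" behind, which is the kind of leftover that later shows up as an
"Expected end of text" error.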

Cheers,


Ralph.

Re: [Pyparsing] Order problem in binary operation with operationPrecedence()

From: Gustavo N. <me...@gu...> - 2009-07-12 11:09:17

Hello, Ralph.

Ralph said:
> `^' causes all alternatives to be tested, `|' stops at the first one
> that succeeds.  If using `|', unless you test for `<=' before `<' then
> `<=' would never be seen because it would always look like a `<'.  With
> `^', I assume the longest one that succeeds is used so both `<' and `<='
> match but the latter is used.

Hmm, OK, I got it now. Thank you very much for the explanation! ;-)

Cheers.
-- 
Gustavo Narea <xri://=Gustavo>.
| Tech blog: =Gustavo/(+blog)/tech  ~  About me: =Gustavo/about |