Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

Bug in Each()?

Tal Weiss
2009-09-13
2013-05-14
  • Tal Weiss
    Tal Weiss
    2009-09-13

    I may have found a bug in Each -

    Example text and parser:

        the_input = "Major Tal Weiss"
        parser1 =  (Optional('Tal') + Optional('Weiss')) & Keyword('Major')
        parser2 =  Optional(Optional('Tal') + Optional('Weiss')) & Keyword('Major')
        print parser1.parseString( the_input)
        print parser2.parseString( the_input)

    The results are:

    and the second result is the one that looks correct. I guess the addition of 2 parser expressions of type Optional should return an Optional? Or maybe the Each should just treat it as Optional.
    This is a real scenario I came across…
    I can change the code myself but I'm afraid to break anything. Are there any unit tests to the code (to make sure I don't do more damage than good)?

    Thanks,

    Tal.

     
  • Paul McGuire
    Paul McGuire
    2009-09-13

    If you send me a patch file, I'd be happy to test your change and let you know the outcome.  The Each class does get a bit touchy around Optionals and repetitive classes like OneOrMore and ZeroOrMore.

    Yes, there are a collection of unit tests, but I've not published them because some of them test equipment data files that are proprietary in format.  This has come up often enough, though, that I should go through and publish those that are publishable, so that people can do at least some testing of the changes they propose.

    Thanks,
    - Paul

     
  • Paul McGuire
    Paul McGuire
    2009-09-14

    Well, oddly enough, when I step through your submitted code using the debugger, your unexpected results are actually the correct ones.  You have found an unusual case in which a parser defined as "A & B" behaves differently from "B & A".

    The basic implementation of Each works like this (I've left out special handling if there are OneOrMore or ZeroOrMore expressions):

        unmatched required expressions =
           
        unmatched optional expressions =

        successful_match_order =
       
        while unmatched expressions exist in the list of unordered expressions:
       
            for expression in list of unmatched required expressions
           
                if expression matches at current parse locn
               
                    add expression to successful match order
                    remove expression from unmatched expressions
                    advance parse locn
                   
            (try contained expr for all unmatched optional expressions)
           
        if no unmatched expressions matched at current parse locn
            raise ParseException

    At this point, the `successful_match_order` list contains the correct sequence of the expressions to parse the source text, so the Each code processes this list in order and returns the parsed results.

    So in your case, you provided this Each expression:

        parser1 =  (Optional('Tal') + Optional('Weiss')) & Keyword('Major')

    With this input

        the_input = "Major Tal Weiss"

    The implementation of Each does this:
    - at locn 0, evaluate the first expression, (Optional('Tal') + Optional('Weiss'))
    - this expression matches (both elements *were* optional) so put it on the successful match order
    - evaluate the next expression, Keyword('Major')
    - this expression matches, so put in on the successful match order

    If you were to parse this string with parseAll=True, you would fail, because at this point, the Each has completed its match, but not matched the full input string.

    The reason this works if you wrap the initial expression in an Optional(as in parser2) is more by luck than design.  Each's implementation reorders the expressions to put all of the required expressions first, followed by the optionals.  So the Keyword('Major') matches first, so that by the time the Each implementation gets to matching Optional('Tal')+Optional('Weiss'), the parser location has advanced to the 'T' in 'Tal', so it matches successfully.

    I'm not thrilled that there is a case here where A&B and B&A don't return the same results.  A solution does not jump out at me though.

    • Paul
     
  • Tal Weiss
    Tal Weiss
    2010-01-31

    How about iterating over all permutations (stopping if complete string is parsed)?
    I would do something like this:

    from itertools import permutations
        parser_elements = [(Optional('Tal') + Optional('Weiss')) , Keyword('Major')]
        for parser in permutations(parser_elements):
            grammar = And(parser)
            print grammar.parseString(the_input)
    

    Obviously, I wouldn't want to calculate the And expression every time the grammar is used.

    Can you please just expose the unit tests for the Each element so I could play around with it?
    I promise to contribute the code back when I'm done.

    I thought I managed to work around this issue but it keeps on popping up with my problem.
    I can solve it outside pyparsing but I think the cleanest way would be to fix the Each behavior.

    Thanks,

    Tal.

     
  • Tal Weiss
    Tal Weiss
    2010-01-31

    Which, on second thought, does not make any sentence because my "sentence" has 10 parts and 10! is 3.6M…
    So your approach does make sense, but I still need to better handle Optionals inside Each.
    Any thoughts?

     
  • Tal Weiss
    Tal Weiss
    2010-01-31

    Switching the order of the implementation of Each to:

    tmpExprs = tmpOpt + tmpReqd + self.multioptionals + self.multirequired

    gives better results, but I can still see how it may fail…

     
  • Paul McGuire
    Paul McGuire
    2010-02-01

    So I took a fresh look at your original problem, and I may have a solution.  In any event, I get better results.

    I'm not sure what state your current Each class looks like, so I've copied my current version to this pastebin page: http://pyparsing.pastebin.com/m1ec738f4

    The significant change was this:

                opt1 =
                opt2 =
                self.optionals = opt1 + opt2

    Originally, self.optionals only included the elements in opt1.  I ran this against your original case and now I get the same results for parser1 and parser2.

    How does this look with your actual application?

    • Paul
     
  • Tal Weiss
    Tal Weiss
    2010-02-01

    It looks good! Well, good enough to pass the bug-ball over to my court…
    Thank you!

    I didn't know pastebin - very cool - thanks again!

    Tal.

     
  • Tal Weiss
    Tal Weiss
    2010-02-08

    OK, it still does not look good.

    I thought about the implementation a bit more -
    The current code tries to find good parses (with no exception raised), trying to optimize the parse while putting some thought into the order of the elements in Each. The problem is the every Optional (and also some other elements that may return empty) return  a "success" result (no exception raised) even if they did not parse anything.

    The ideal implementation (assuming infinite resources) is of course to try all permutations of the order of the elements in Each and choose the best parse (assuming longest string parsed is best). The problem is that this is not feasible ( I already have 11 elements in my Each resulting in millions of potential permutations).

    The assumption we need to make is the one element will NOT parse parts of the string that should be parsed by a different element so we don't need to test all the element permutations, thus the current approach of testing the elements one by one and putting successful parsers in the "success list" is good.

    The only problem I have is that the measurement of a successful parse is no exception was raise.
    I think we can get perfect results with the following improvement: a parse is successful if the result location is larger than the current location, meaning that some of the string was parsed. Or maybe even better - compare all elements and choose which element parsed the most (advanced the location most). I'm assuming of course that even the Optionals will parse better than a mandatory parser at some point in the string, if they fit. If they don't, well, they are Optional…

    What do you think?

    If you have tests for the Each module I would be happy to play with it and try to implement this.

    Thanks!

    Tal.