Unexpected interaction between Combine and infixNotation

Dave Rawks
2015-12-09
2015-12-10
  • Dave Rawks

    Dave Rawks - 2015-12-09

    After spending a few hours bending my mind around pyparsing, I've found it to be pretty awesome. However, I've been banging my head against some puzzling behavior of the Combine class.

    The documentation led me to craft this working bit:

    from pyparsing import *
    ident = Word(alphanums)
    first = ident('first')
    last = ident('last')
    name = first + last
    name.parseString("Bob        Smith")
    (['Bob', 'Smith'], {'last': [('Smith', 1)], 'first': [('Bob', 0)]})
    name = Combine(first + last, joinString=" ", adjacent=False)
    name.parseString("Bob       Smith")
    (['Bob Smith'], {'last': [('Smith', 0)], 'first': [('Bob', 0)]})
    

    You can see that Combine's joinString and adjacent options work together to give me a concatenated version of the "name", joined by a single space as I intended. However, with infixNotation things go differently (please excuse the verbosity of this example):

    ~~~~~
    from pyparsing import *
    ident = Word(alphanums)
    boolops = oneOf('and or xor not', caseless=True)
    rolename = ident('rolename')
    roleExpr = infixNotation(
        rolename,
        [
            (boolops, 2, opAssoc.LEFT),
        ]
    )('roleexpr')
    Combine(roleExpr, joinString=" ", adjacent=False).parseString('this not that or ( theotherthing xor yetanotherthing)')

    (['thisnotthatortheotherthingxoryetanotherthing'], {'roleexpr': [((['this', 'not', 'that', 'or', (['theotherthing', 'xor', 'yetanotherthing'], {'rolename': [('theotherthing', 0), ('yetanotherthing', 2)]})], {'rolename': [('this', 0), ('that', 2)]}), 0)]})
    ~~~~~~

    I would expect to get a field in which all the operators and idents are likewise concatenated with the joinString, giving a nicely sanitized/normalized version of the input string. Instead, everything is run together with no separator at all.

    Alternatively, perhaps I'm skinning this cat in a completely wrong way... Is there a better method to use pyparsing to validate the syntax of arbitrary input and then "lint" it into a normalized equivalent string?

     
    • Paul McGuire

      Paul McGuire - 2015-12-10

      Dave -

      Knowing that infixNotation will return a nested structure, I knew right away that a few things would need to happen:
      - we would need to flatten the structure
      - we would need to reinsert the grouping '()'s

      Combine really is only appropriate for collapsing 2 or more primitive/terminal expressions - it really gets lost if given a structure. So I think we'll need to post-process the parsed data. (This could be done in a parse action if it has to be part of a bigger parser.)
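The parse-action route mentioned above might look roughly like the sketch below. It is an illustration only, not code from the pyparsing distribution: `paren_flatten`, `normalize`, and `normalized_expr` are hypothetical names, the operator table assumes 'not' is unary and that each operator gets its own precedence level, and the sample input is an assumption chosen to exercise every operator. Appending StringEnd() is a further assumption, made so that trailing unparsable text is rejected instead of silently ignored.

```python
from pyparsing import Word, alphanums, infixNotation, opAssoc, StringEnd


def paren_flatten(items):
    """Flatten nested lists into one list, wrapping each nested list in ( )."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.append('(')
            flat.extend(paren_flatten(item))
            flat.append(')')
        else:
            flat.append(item)
    return flat


def normalize(tokens):
    # Parse action: replace the nested parse results with one normalized string.
    return ' '.join(paren_flatten(tokens.asList()))


rolename = Word(alphanums)
role_expr = infixNotation(rolename, [
    ('not', 1, opAssoc.RIGHT),   # unary, binds tightest
    ('xor', 2, opAssoc.LEFT),
    ('and', 2, opAssoc.LEFT),
    ('or', 2, opAssoc.LEFT),     # binds loosest
])

# Attach the action to a wrapper expression so it runs once, on the whole parse.
normalized_expr = (role_expr + StringEnd()).setParseAction(normalize)

# Assumed sample input, not taken from the thread verbatim.
sample = 'this and not that or ( theotherthing xor yetanotherthing)'
result = normalized_expr.parseString(sample)[0]
print(result)
```

Because the action sits on the wrapper rather than on the recursive expression itself, `normalized_expr` can be dropped into a larger grammar and will contribute a single string token there.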

      I borrowed the internal _flatten() function that pyparsing uses in transformString, and changed it to insert ()'s around any nested list. Then I fixed some issues with your simple example, mostly that 'not' is not a binary operator, but a right-associative unary op. Just to demonstrate the canonicalization possible when representing a more detailed hierarchy in the operators argument to infixNotation, here is your demo, with some better output:

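      # Flatten a nested list of tokens into a single list, wrapping each
      # nested list in '(' and ')' so the grouping survives.
      def flatten(L):
          ret = []
          for i in L:
              if isinstance(i, list):
                  ret.append('(')
                  ret.extend(flatten(i))
                  ret.append(')')
              else:
                  ret.append(i)
          return ret

      # Operators are listed from highest to lowest precedence;
      # rolename is the operand expression from the snippet above.
      roleExpr = infixNotation(rolename,
          [
          ('not', 1, opAssoc.RIGHT),
          ('xor', 2, opAssoc.LEFT),
          ('and', 2, opAssoc.LEFT),
          ('or', 2, opAssoc.LEFT),
          ]
          )

      # test_string holds the input expression to normalize
      print(' '.join(flatten(roleExpr.parseString(test_string).asList())))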
      

      Prints:

      ( ( this and ( not that ) ) or ( theotherthing xor yetanotherthing ) )
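If the padding inside the parentheses and the outermost pair of parentheses are unwanted, the joining step could be written differently. The following is only a sketch of one possible variation, not something from pyparsing itself: `render` is a hypothetical helper, and `parsed` is a hand-written nested list standing in for what asList() returns for the example above, so the block runs without pyparsing installed.

```python
def render(node, top=True):
    """Render nested token lists as text, parenthesizing every sub-list
    except the outermost one."""
    if not isinstance(node, list):
        return node
    inner = ' '.join(render(child, top=False) for child in node)
    return inner if top else '(' + inner + ')'


# Hand-written stand-in for roleExpr.parseString(...).asList();
# the real call wraps the whole expression in one extra outer list.
parsed = [[['this', 'and', ['not', 'that']],
           'or',
           ['theotherthing', 'xor', 'yetanotherthing']]]

normalized = render(parsed[0])
print(normalized)
# (this and (not that)) or (theotherthing xor yetanotherthing)
```

Passing `parsed[0]` rather than `parsed` is what drops the redundant outer parentheses; passing `parsed` itself would keep them.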
      

      What do you think?

      -- Paul

       
