How to use Optional() properly?

mas ibro
2009-04-15
2013-05-14
  • mas ibro

    mas ibro - 2009-04-15

    Hi, I'm new to pyparsing (and to parsing in general). I'm trying to parse a directory schema, but I got stuck using Optional().

    Here is the code:

    [code]
    from pyparsing import *

    # Basic value types
    number = Combine(Word(nums) + Optional("." + Word(nums)))
    oid = Group(Combine(OneOrMore(Word(nums)+".")) + Word(nums) + Optional("{" + Group(Word(nums)) + "}"))
    identifier = Word(alphas)
    singval = identifier ^ number ^ sglQuotedString ^ oid
    multival = "(" + Group(singval + ZeroOrMore(Optional("$") + singval)) + ")"
    attrval = singval ^ multival

    # An attribute is an upper-case name followed by an optional value
    capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    attrname = Word(capitals)
    attr = Group(attrname + Optional(attrval))
    decltype = Word(alphas)
    schemaentry = decltype + "(" + oid + Group(OneOrMore(attr)) + ")"

    teststr1 = """\
    attributetype ( 2.5.4.27 NAME 'destinationIndicator'
            DESC 'RFC2256: destination indicator'
            EQUALITY caseIgnoreMatch
            SUBSTR caseIgnoreSubstringsMatch
            SYNTAX 1.3.6.1.4.1.1466.115.121.1.44{128} )
    """

    teststr2 = """\
    objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
            SUP top STRUCTURAL
            MUST ( cn )
            MAY ( certificateRevocationList $ authorityRevocationList $
                    deltaRevocationList ) )
    """

    # Parses successfully
    moo = (schemaentry + stringEnd).parseString(teststr1)
    print(moo)

    # Raises ParseException
    moo = (schemaentry + stringEnd).parseString(teststr2)
    print(moo)
    [/code]

    It works on teststr1, but on teststr2 it throws 'pyparsing.ParseException: Expected ")" (at char 91), (line:3, col:14)' (around "SUP top STRUCTURAL").

    Am I using Optional() incorrectly?

     
    • mas ibro

      mas ibro - 2009-04-15

      Forgot to say,

      "It's a cool stuff!"

      -- Another satisfied pyparsing user

       
    • Paul McGuire

      Paul McGuire - 2009-04-15

      I don't think Optional is necessarily a problem.  Here are some tips on debugging your parser.

      Use setName and setDebug on your parser subexpressions.  This will help you identify when an expression matches and what tokens it matches.  This is usually informative and/or surprising.

      I inserted this code in your program just before the second call to parseString to set names and debug on a number of variables:
      [code]
      for varname in "oid identifier singval multival attrval number attrname attr schemaentry decltype".split():
          vars()[varname].setName(varname)
          vars()[varname].setDebug()
      [/code]

      This gives the following output:
      [code]
      Match schemaentry at loc 0(1,1)
      Match decltype at loc 0(1,1)
      Matched decltype -> ['objectclass']
      Match oid at loc 13(1,14)
      Matched oid -> [['2.5.6.', '19']]
      Match attr at loc 23(1,24)
      Match attrname at loc 23(1,24)
      Matched attrname -> ['NAME']
      Match attrval at loc 27(1,28)
      Match identifier at loc 27(1,28)
      Exception raised:Expected W:(abcd...) (at char 28), (line:1, col:29)
      Match number at loc 27(1,28)
      Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
      Match oid at loc 27(1,28)
      Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
      Match multival at loc 27(1,28)
      Exception raised:Expected "(" (at char 28), (line:1, col:29)
      Matched attrval -> ["'cRLDistributionPoint'"]
      Matched attr -> [['NAME', "'cRLDistributionPoint'"]]
      Match attr at loc 50(1,51)
      Match attrname at loc 52(2,1)
      Matched attrname -> ['SUP']
      Match attrval at loc 55(2,4)
      Match identifier at loc 55(2,4)
      Matched identifier -> ['top']
      Match number at loc 55(2,4)
      Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
      Match oid at loc 55(2,4)
      Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
      Match multival at loc 55(2,4)
      Exception raised:Expected "(" (at char 56), (line:2, col:5)
      Match identifier at loc 55(2,4)
      Matched identifier -> ['top']
      Matched attrval -> ['top']
      Matched attr -> [['SUP', 'top']]
      Match attr at loc 59(2,8)
      Match attrname at loc 60(2,9)
      Matched attrname -> ['STRUCTURAL']
      Match attrval at loc 70(2,19)
      Match identifier at loc 70(2,19)
      Matched identifier -> ['MUST']
      Match number at loc 70(2,19)
      Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
      Match oid at loc 70(2,19)
      Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
      Match multival at loc 70(2,19)
      Exception raised:Expected "(" (at char 72), (line:3, col:1)
      Match identifier at loc 70(2,19)
      Matched identifier -> ['MUST']
      Matched attrval -> ['MUST']
      Matched attr -> [['STRUCTURAL', 'MUST']]
      Match attr at loc 76(3,5)
      Match attrname at loc 77(3,6)
      Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
      Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
      Exception raised:Expected ")" (at char 77), (line:3, col:6)
      Traceback (most recent call last):
        File "opt2.py", line 41, in <module>
          moo = (schemaentry + stringEnd).parseString(teststr2)
        File "C:\Python25\lib\site-packages\pyparsing.py", line 1076, in parseString
          raise exc
      pyparsing.ParseException: Expected ")" (at char 77), (line:3, col:6)
      [/code]

      Note that "SUP top" was matched as an attr, followed by "STRUCTURAL MUST" as another attr.  When the parser then encountered the '(' after MUST, it was unexpected and caused the parse to fail.
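
To see the same effect in isolation, here is a reduced sketch (not the original grammar). Only attrname, identifier, and attr are kept from the posted code, attr is simplified to take a plain identifier as its value, and the input string is a made-up fragment of teststr2. Because MUST consists of letters, Optional(identifier) consumes it as the value of STRUCTURAL.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
identifier = Word(alphas)

# Reduced stand-in for the original attr: a name plus an optional identifier value
attr = Group(attrname + Optional(identifier))

# Hypothetical input: a fragment of teststr2 without the parenthesised lists
pairs = OneOrMore(attr).parseString("SUP top STRUCTURAL MUST").asList()
print(pairs)  # [['SUP', 'top'], ['STRUCTURAL', 'MUST']]
```

So Optional() behaves as intended here; the value expression is simply too permissive.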

      If "MUST" is not a valid attrval, then you need to add a negative lookahead to attrval, so that an attribute name is never taken as a value.  For example, modify your code to read:
      [code]
      capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
      attrname = Word(capitals)
      attrval = ~attrname + (singval ^ multival)
      [/code]
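
The effect of the lookahead can be checked on a reduced input. This is a sketch under assumptions: attrval is simplified to a plain identifier, and the input string is a hypothetical fragment rather than the full schema entry. ~attrname builds a pyparsing NotAny, which consumes nothing and only makes the match fail when an attribute name comes next.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
identifier = Word(alphas)

# Simplified attrval: refuse the value if it looks like the next attribute name
attrval = ~attrname + identifier
attr = Group(attrname + Optional(attrval))

# Hypothetical input: a fragment of teststr2 without the parenthesised lists
fixed = OneOrMore(attr).parseString("SUP top STRUCTURAL MUST").asList()
print(fixed)  # [['SUP', 'top'], ['STRUCTURAL'], ['MUST']]
```

One caveat worth keeping in mind: Word(capitals) also matches a single capital letter, so ~attrname refuses any value that merely starts with an upper-case letter.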

      Some other comments:
      - Group(Word(nums)) doesn't accomplish anything, just leave as Word(nums).
      - I think oid is better defined as:
      oid = Group(Combine(OneOrMore(Word(nums)+".") + Word(nums)) + Optional("{" + Word(nums) + "}"))

      - Define schemaentry as:
      schemaentry = decltype + "(" + oid + Dict(OneOrMore(attr))("attrs") + ")"
      This will automatically define results names for each attr name.  If you print out the results using print(moo.dump()), you'll get:
         
      ['objectclass', '(', ['2.5.6.19'], [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']], ')']
      - attrs: [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']]
        - MAY: ['(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']
        - MUST: ['(', ['cn'], ')']
        - NAME: 'cRLDistributionPoint'
        - STRUCTURAL:
        - SUP: top

      You can also access individual fields as:

      print(moo.attrs.NAME)
      print(list(moo.attrs.keys()))

      - I would also suppress the '('s, ')'s and $'s, they don't add anything to your results, and the grouping they imply is already done by the Group construct.
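
Putting the comments above together, a revised grammar might look roughly like the sketch below. It is an illustration, not Paul's exact code: number is left out for brevity, the names LPAR, RPAR, LBRACE, RBRACE, and DOLLAR are introduced here, and the input is a shortened, made-up variant of teststr2.

```python
from pyparsing import (Combine, Dict, Group, OneOrMore, Optional, Suppress,
                       Word, ZeroOrMore, alphas, nums, sglQuotedString,
                       stringEnd)

# Punctuation is matched but kept out of the results
LPAR, RPAR, LBRACE, RBRACE, DOLLAR = map(Suppress, "(){}$")

# The whole dotted number becomes a single token, e.g. '2.5.6.19'
oid = Group(Combine(OneOrMore(Word(nums) + ".") + Word(nums))
            + Optional(LBRACE + Word(nums) + RBRACE))
identifier = Word(alphas)
singval = identifier ^ sglQuotedString ^ oid  # 'number' omitted in this sketch
multival = LPAR + Group(singval + ZeroOrMore(Optional(DOLLAR) + singval)) + RPAR

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
attrval = ~attrname + (singval ^ multival)  # negative lookahead on attrname
attr = Group(attrname + Optional(attrval))
decltype = Word(alphas)

# Dict turns the first token of each attr group into a results name
schemaentry = decltype + LPAR + oid + Dict(OneOrMore(attr))("attrs") + RPAR

# Hypothetical, shortened variant of teststr2
text = "objectclass ( 2.5.6.19 NAME 'x' SUP top STRUCTURAL MUST ( cn ) MAY ( a $ b ) )"
result = (schemaentry + stringEnd).parseString(text)

print(result.attrs.SUP)              # top
print(result.attrs.MAY.asList())     # ['a', 'b']
print(sorted(result.attrs.keys()))
```

With the punctuation suppressed, each multi-valued attribute comes back as a plain list of its values.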

      Good luck with your project, you look pretty far along already.

      -- Paul

       


    • mas ibro

      mas ibro - 2009-04-15

      Well, this is actually the first time I've used pyparsing, after reading about it for some time, and it's surprisingly easy.

      This is exactly what I (and most users) need: a guide to debugging a parser.
      It is very helpful!

      Thank you, and keep up the good work on the project.
      I believe more and more users will use pyparsing for their projects.

       
