Hi, I'm new to pyparsing (and to parsing in general). I'm trying to parse a directory schema, but I got stuck using Optional().

Here is the code:

[[code]]
from pyparsing import *

number = Combine(Word(nums) + Optional("." + Word(nums)))
oid = Group(Combine(OneOrMore(Word(nums)+".")) + Word(nums) + Optional("{" + Group(Word(nums)) + "}"))
identifier= Word(alphas)
singval = identifier ^ number ^ sglQuotedString ^ oid
multival = "(" + Group(singval + ZeroOrMore(Optional("$") + singval)) + ")"
attrval = singval ^ multival
capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
attrname = Word(capitals)
attr = Group(attrname + Optional(attrval))
decltype = Word(alphas)
schemaentry = decltype + "(" + oid + Group(OneOrMore(attr)) + ")"

teststr1 = """\
attributetype ( 2.5.4.27 NAME 'destinationIndicator'
        DESC 'RFC2256: destination indicator'
        EQUALITY caseIgnoreMatch
        SUBSTR caseIgnoreSubstringsMatch
        SYNTAX 1.3.6.1.4.1.1466.115.121.1.44{128} )
"""

teststr2 = """\
objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
        SUP top STRUCTURAL
        MUST ( cn )
        MAY ( certificateRevocationList $ authorityRevocationList $
                deltaRevocationList ) )
"""

moo = (schemaentry + stringEnd).parseString(teststr1)
print moo

moo = (schemaentry + stringEnd).parseString(teststr2)
print moo
[[/code]]
===================================================

It works on teststr1 but throws 'pyparsing.ParseException: Expected ")" (at char 91), (line:3, col:14)' on teststr2 (near SUP top STRUCTURAL).

Am I using Optional() the wrong way?

I don't think Optional is necessarily a problem. Here are some tips on debugging your parser.

Use setName and setDebug on your parser subexpressions. This will help you identify when an expression matches and what tokens it matches. This is usually informative and/or surprising.

I inserted this code in your program just before the second call to parseString to set names and debug on a number of variables:
[code]
for varname in "oid identifier singval multival attrval number attrname attr schemaentry decltype".split():
    vars()[varname].setName(varname)
    vars()[varname].setDebug()
[/code]

This gives the following output:
[code]
Match schemaentry at loc 0(1,1)
Match decltype at loc 0(1,1)
Matched decltype -> ['objectclass']
Match oid at loc 13(1,14)
Matched oid -> [['2.5.6.', '19']]
Match attr at loc 23(1,24)
Match attrname at loc 23(1,24)
Matched attrname -> ['NAME']
Match attrval at loc 27(1,28)
Match identifier at loc 27(1,28)
Exception raised:Expected W:(abcd...) (at char 28), (line:1, col:29)
Match number at loc 27(1,28)
Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
Match oid at loc 27(1,28)
Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
Match multival at loc 27(1,28)
Exception raised:Expected "(" (at char 28), (line:1, col:29)
Matched attrval -> ["'cRLDistributionPoint'"]
Matched attr -> [['NAME', "'cRLDistributionPoint'"]]
Match attr at loc 50(1,51)
Match attrname at loc 52(2,1)
Matched attrname -> ['SUP']
Match attrval at loc 55(2,4)
Match identifier at loc 55(2,4)
Matched identifier -> ['top']
Match number at loc 55(2,4)
Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
Match oid at loc 55(2,4)
Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
Match multival at loc 55(2,4)
Exception raised:Expected "(" (at char 56), (line:2, col:5)
Match identifier at loc 55(2,4)
Matched identifier -> ['top']
Matched attrval -> ['top']
Matched attr -> [['SUP', 'top']]
Match attr at loc 59(2,8)
Match attrname at loc 60(2,9)
Matched attrname -> ['STRUCTURAL']
Match attrval at loc 70(2,19)
Match identifier at loc 70(2,19)
Matched identifier -> ['MUST']
Match number at loc 70(2,19)
Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
Match oid at loc 70(2,19)
Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
Match multival at loc 70(2,19)
Exception raised:Expected "(" (at char 72), (line:3, col:1)
Match identifier at loc 70(2,19)
Matched identifier -> ['MUST']
Matched attrval -> ['MUST']
Matched attr -> [['STRUCTURAL', 'MUST']]
Match attr at loc 76(3,5)
Match attrname at loc 77(3,6)
Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
Exception raised:Expected ")" (at char 77), (line:3, col:6)
Traceback (most recent call last):
  File "opt2.py", line 41, in <module>
    moo = (schemaentry + stringEnd).parseString(teststr2)
  File "C:\Python25\lib\site-packages\pyparsing.py", line 1076, in parseString
    raise exc
pyparsing.ParseException: Expected ")" (at char 77), (line:3, col:6)
[/code]

Note that "SUP top" was matched as an attr, followed by "STRUCTURAL MUST" as another attr. The '(' that follows MUST was then unexpected, which caused the parser to fail.

If "MUST" is not a valid attrval, then you need to add a negative lookahead to attrval. The second test string parses once I modify your code to read:
[code]
capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
attrname = Word(capitals)
attrval = ~attrname + (singval ^ multival)
[/code]
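
For reference, here is a minimal sketch of the whole grammar with that one change applied, run against the second test string. Everything except the attrval line is taken from the original post; the names result and attrs are just illustrative, and I've used print() calls so it also runs on Python 3. One caveat that is my assumption about your data: the lookahead rejects any unquoted value that starts with a capital letter, so it only fits if such values never occur.

```python
from pyparsing import (
    Combine, Group, OneOrMore, Optional, Word, ZeroOrMore,
    alphas, nums, sglQuotedString, stringEnd,
)

number = Combine(Word(nums) + Optional("." + Word(nums)))
oid = Group(Combine(OneOrMore(Word(nums) + ".")) + Word(nums)
            + Optional("{" + Group(Word(nums)) + "}"))
identifier = Word(alphas)
singval = identifier ^ number ^ sglQuotedString ^ oid
multival = "(" + Group(singval + ZeroOrMore(Optional("$") + singval)) + ")"

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
# Negative lookahead: a value may not begin where an attrname would match,
# so the keyword after a value-less attribute is left for the next attr.
attrval = ~attrname + (singval ^ multival)
attr = Group(attrname + Optional(attrval))
decltype = Word(alphas)
schemaentry = decltype + "(" + oid + Group(OneOrMore(attr)) + ")"

teststr2 = """\
objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
        SUP top STRUCTURAL
        MUST ( cn )
        MAY ( certificateRevocationList $ authorityRevocationList $
                deltaRevocationList ) )
"""

result = (schemaentry + stringEnd).parseString(teststr2)
attrs = result[3].asList()  # the Group(OneOrMore(attr)) part
print(attrs[2])  # the STRUCTURAL entry, parsed without a value
print(attrs[3])  # the MUST entry, parsed with its parenthesized list
```

STRUCTURAL now ends up as an attr of its own, and MUST is free to pick up the "( cn )" that follows it.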

Some other comments:
- Group(Word(nums)) doesn't accomplish anything; just leave it as Word(nums).
- I think oid is better defined as:
oid = Group(Combine(OneOrMore(Word(nums)+".") + Word(nums)) + Optional("{" + Word(nums) + "}"))

- Define schemaentry as:
schemaentry = decltype + "(" + oid + Dict(OneOrMore(attr))("attrs") + ")"
This will automatically define results names for each attr name. If you print out the results using print moo.dump(), you'll get:

['objectclass', '(', ['2.5.6.19'], [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']], ')']
- attrs: [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']]
  - MAY: ['(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']
  - MUST: ['(', ['cn'], ')']
  - NAME: 'cRLDistributionPoint'
  - STRUCTURAL:
  - SUP: top

You can also access individual fields as:

print moo.attrs.NAME
print moo.attrs.keys()

- I would also suppress the '('s, ')'s and '$'s. They don't add anything to your results, and the grouping they imply is already done by the Group construct.
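
Putting the comments above together, this is roughly what I have in mind. Treat it as a sketch rather than the one right way to do it: LPAR, RPAR and DOLLAR are names I made up for the suppressed punctuation, result is just an illustrative variable, and I've only run it mentally against your second test string. It uses print() so it works on Python 3 too.

```python
from pyparsing import (
    Combine, Dict, Group, OneOrMore, Optional, Suppress, Word, ZeroOrMore,
    alphas, nums, sglQuotedString, stringEnd,
)

# Punctuation that is matched but dropped from the results.
LPAR, RPAR, DOLLAR = map(Suppress, "()$")

number = Combine(Word(nums) + Optional("." + Word(nums)))
# The whole dotted number is combined into a single token.
oid = Group(Combine(OneOrMore(Word(nums) + ".") + Word(nums))
            + Optional("{" + Word(nums) + "}"))
identifier = Word(alphas)
singval = identifier ^ number ^ sglQuotedString ^ oid
multival = LPAR + Group(singval + ZeroOrMore(Optional(DOLLAR) + singval)) + RPAR

attrname = Word("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
attrval = ~attrname + (singval ^ multival)
attr = Group(attrname + Optional(attrval))
decltype = Word(alphas)
# Dict keys each attr group by its first token (the attribute name).
schemaentry = decltype + LPAR + oid + Dict(OneOrMore(attr))("attrs") + RPAR

teststr2 = """\
objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
        SUP top STRUCTURAL
        MUST ( cn )
        MAY ( certificateRevocationList $ authorityRevocationList $
                deltaRevocationList ) )
"""

result = (schemaentry + stringEnd).parseString(teststr2)
print(result.attrs.NAME)
print(result.attrs.SUP)
print(result.attrs.MAY.asList())
print(sorted(result.attrs.keys()))
```

With the punctuation suppressed, the MUST and MAY values come back as plain lists of names, which is usually easier to work with downstream.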

Good luck with your project, you look pretty far along already.

-- Paul
