Hi, I'm new to pyparsing (and to parsing in general). I'm trying to parse a directory schema, but I got stuck using Optional().
Here is the code:
[code]
from pyparsing import *
number = Combine(Word(nums) + Optional("." + Word(nums)))
oid = Group(Combine(OneOrMore(Word(nums)+".")) + Word(nums) + Optional("{" + Group(Word(nums)) + "}"))
identifier = Word(alphas)
singval = identifier ^ number ^ sglQuotedString ^ oid
multival = "(" + Group(singval + ZeroOrMore(Optional("$") + singval)) + ")"
attrval = singval ^ multival
capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
attrname = Word(capitals)
attr = Group(attrname + Optional(attrval))
decltype = Word(alphas)
schemaentry = decltype + "(" + oid + Group(OneOrMore(attr)) + ")"
teststr1 = """\
attributetype ( 2.5.4.27 NAME 'destinationIndicator'
DESC 'RFC2256: destination indicator'
EQUALITY caseIgnoreMatch
SUBSTR caseIgnoreSubstringsMatch
SYNTAX 1.3.6.1.4.1.1466.115.121.1.44{128} )
"""
teststr2 = """\
objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
SUP top STRUCTURAL
MUST ( cn )
MAY ( certificateRevocationList $ authorityRevocationList $
deltaRevocationList ) )
"""
moo = (schemaentry + stringEnd).parseString(teststr1)
print(moo)
moo = (schemaentry + stringEnd).parseString(teststr2)
print(moo)
[/code]
===================================================
It works on teststr1 but raises 'pyparsing.ParseException: Expected ")" (at char 91), (line:3, col:14)' on teststr2 (the entry with the "SUP top STRUCTURAL" line).
Am I using Optional() incorrectly?
Forgot to say,
"It's cool stuff!"
-- Another satisfied pyparsing user
I don't think Optional is necessarily a problem. Here are some tips on debugging your parser.
Use setName and setDebug on your parser subexpressions. This will help you identify when an expression matches and what tokens it matches. This is usually informative and/or surprising.
I inserted this code in your program just before the second call to parseString to set names and debug on a number of variables:
[code]
# Give each subexpression a readable name and turn on debug tracing for it.
for varname in "oid identifier singval multival attrval number attrname attr schemaentry decltype".split():
    vars()[varname].setName(varname)
    vars()[varname].setDebug()
[/code]
This gives the following output:
[code]
Match schemaentry at loc 0(1,1)
Match decltype at loc 0(1,1)
Matched decltype -> ['objectclass']
Match oid at loc 13(1,14)
Matched oid -> [['2.5.6.', '19']]
Match attr at loc 23(1,24)
Match attrname at loc 23(1,24)
Matched attrname -> ['NAME']
Match attrval at loc 27(1,28)
Match identifier at loc 27(1,28)
Exception raised:Expected W:(abcd...) (at char 28), (line:1, col:29)
Match number at loc 27(1,28)
Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
Match oid at loc 27(1,28)
Exception raised:Expected W:(0123...) (at char 28), (line:1, col:29)
Match multival at loc 27(1,28)
Exception raised:Expected "(" (at char 28), (line:1, col:29)
Matched attrval -> ["'cRLDistributionPoint'"]
Matched attr -> [['NAME', "'cRLDistributionPoint'"]]
Match attr at loc 50(1,51)
Match attrname at loc 52(2,1)
Matched attrname -> ['SUP']
Match attrval at loc 55(2,4)
Match identifier at loc 55(2,4)
Matched identifier -> ['top']
Match number at loc 55(2,4)
Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
Match oid at loc 55(2,4)
Exception raised:Expected W:(0123...) (at char 56), (line:2, col:5)
Match multival at loc 55(2,4)
Exception raised:Expected "(" (at char 56), (line:2, col:5)
Match identifier at loc 55(2,4)
Matched identifier -> ['top']
Matched attrval -> ['top']
Matched attr -> [['SUP', 'top']]
Match attr at loc 59(2,8)
Match attrname at loc 60(2,9)
Matched attrname -> ['STRUCTURAL']
Match attrval at loc 70(2,19)
Match identifier at loc 70(2,19)
Matched identifier -> ['MUST']
Match number at loc 70(2,19)
Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
Match oid at loc 70(2,19)
Exception raised:Expected W:(0123...) (at char 72), (line:3, col:1)
Match multival at loc 70(2,19)
Exception raised:Expected "(" (at char 72), (line:3, col:1)
Match identifier at loc 70(2,19)
Matched identifier -> ['MUST']
Matched attrval -> ['MUST']
Matched attr -> [['STRUCTURAL', 'MUST']]
Match attr at loc 76(3,5)
Match attrname at loc 77(3,6)
Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
Exception raised:Expected W:(ABCD...) (at char 77), (line:3, col:6)
Exception raised:Expected ")" (at char 77), (line:3, col:6)
Traceback (most recent call last):
File "opt2.py", line 41, in <module>
moo = (schemaentry + stringEnd).parseString(teststr2)
File "C:\Python25\lib\site-packages\pyparsing.py", line 1076, in parseString
raise exc
pyparsing.ParseException: Expected ")" (at char 77), (line:3, col:6)
[/code]
Note that "SUP top" was matched as an attr, followed by "STRUCTURAL MUST" as another attr. The parser then reached the '(' after MUST, where it expected either another attrname or the closing ')', so it failed.
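The same behavior can be reproduced in a few lines. This is only a sketch: attrval is simplified here to a plain Word(alphas), which is an assumption made to keep the example short (the real grammar uses singval ^ multival). It uses the same camelCase pyparsing names as the rest of this thread.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
attrval = Word(alphas)  # simplified stand-in for singval ^ multival (assumption)
attr = Group(attrname + Optional(attrval))

# No parseAll here, so parsing simply stops at the first thing that does not match.
pairs = OneOrMore(attr).parseString("SUP top STRUCTURAL MUST ( cn )").asList()
print(pairs)  # [['SUP', 'top'], ['STRUCTURAL', 'MUST']]
```

Nothing in this grammar says that MUST is reserved, so Optional(attrval) is free to take it as the value of STRUCTURAL, and the following '(' is left with nothing to attach to.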
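If "MUST" is not a valid attrval, then you need to add a negative lookahead to attrval, so that an attrname is never consumed as a value. With the following change, STRUCTURAL is parsed as an attr with no value and MUST starts the next attr: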
[code]
capitals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
attrname = Word(capitals)
attrval = ~attrname + (singval ^ multival)
[/code]
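To see the lookahead in isolation, here is a small self-contained sketch. It assumes a simplified value expression (plain_value, a name made up for this example) in place of the full singval ^ multival.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
plain_value = Word(alphas)  # hypothetical stand-in for singval ^ multival

# ~attrname is a negative lookahead: attrval fails if an attrname starts here.
attrval = ~attrname + plain_value
attr = Group(attrname + Optional(attrval))

fixed = OneOrMore(attr).parseString("SUP top STRUCTURAL MUST").asList()
print(fixed)  # [['SUP', 'top'], ['STRUCTURAL'], ['MUST']]
```

One caveat with this sketch: Word(capitals) also matches the leading capital of a mixed-case word such as "Top", so the lookahead assumes that values never begin with an uppercase letter.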
Some other comments:
- Group(Word(nums)) doesn't accomplish anything; just leave it as Word(nums).
- I think oid is better defined as:
    oid = Group(Combine(OneOrMore(Word(nums)+".") + Word(nums)) + Optional("{" + Word(nums) + "}"))
  so that the whole dotted number comes back as a single string.
- Define schemaentry as:
    schemaentry = decltype + "(" + oid + Dict(OneOrMore(attr))("attrs") + ")"
  This will automatically define results names for each attr name. If you print out the results using print(moo.dump()), you'll get:
['objectclass', '(', ['2.5.6.19'], [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']], ')']
- attrs: [['NAME', "'cRLDistributionPoint'"], ['SUP', 'top'], ['STRUCTURAL'], ['MUST', '(', ['cn'], ')'], ['MAY', '(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']]
  - MAY: ['(', ['certificateRevocationList', '$', 'authorityRevocationList', '$', 'deltaRevocationList'], ')']
  - MUST: ['(', ['cn'], ')']
  - NAME: 'cRLDistributionPoint'
  - STRUCTURAL:
  - SUP: top
You can also access individual fields as:
    print(moo.attrs.NAME)
    print(moo.attrs.keys())
- I would also suppress the '('s, ')'s and '$'s. They don't add anything to your results, and the grouping they imply is already done by the Group construct.
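Putting the comments above together, the grammar might end up looking roughly like this. Treat it as a sketch rather than the definitive version: the names LPAR, RPAR and DOLLAR are made up for this example, and only teststr2 from the original post is exercised.

```python
from pyparsing import (Combine, Dict, Group, OneOrMore, Optional, Suppress,
                       Word, ZeroOrMore, alphas, nums, sglQuotedString, stringEnd)

# Punctuation is matched but dropped from the results (hypothetical names).
LPAR, RPAR, DOLLAR = map(Suppress, "()$")

number = Combine(Word(nums) + Optional("." + Word(nums)))
# The whole dotted number is combined into a single string.
oid = Group(Combine(OneOrMore(Word(nums) + ".") + Word(nums))
            + Optional("{" + Word(nums) + "}"))
identifier = Word(alphas)
singval = identifier ^ number ^ sglQuotedString ^ oid
multival = LPAR + Group(singval + ZeroOrMore(Optional(DOLLAR) + singval)) + RPAR

capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
attrname = Word(capitals)
# Negative lookahead: never take the next attrname as a value.
attrval = ~attrname + (singval ^ multival)
attr = Group(attrname + Optional(attrval))

decltype = Word(alphas)
# Dict turns each attr group into a named result keyed by its attrname.
schemaentry = decltype + LPAR + oid + Dict(OneOrMore(attr))("attrs") + RPAR

teststr2 = """\
objectclass ( 2.5.6.19 NAME 'cRLDistributionPoint'
SUP top STRUCTURAL
MUST ( cn )
MAY ( certificateRevocationList $ authorityRevocationList $
deltaRevocationList ) )
"""

moo = (schemaentry + stringEnd).parseString(teststr2)
print(moo.attrs.SUP)            # top
print(moo.attrs.MUST.asList())  # ['cn']
print(sorted(moo.attrs.keys()))
```

With the punctuation suppressed, a multi-valued attr such as MAY comes back as a plain list of names, and a flag-like attr such as STRUCTURAL is still present as a key with an empty value.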
Good luck with your project, you look pretty far along already.
-- Paul
Well, this is actually the first time I've used pyparsing, after reading about it for some time, and it's surprisingly easy.
This is exactly what I (and most users) need: a guide to debugging a parser.
It is very helpful!
Thank you, and keep up the good work on the project.
I believe more and more users will use pyparsing for their projects.