I just started using pyparsing and am already stuck on a problem.
I have the following strings to parse:
A keyword can be "SCR#123", "REQUEST#123", or anything else (Word(alphanums)).
But if we detect "SCR" or "REQUEST", these words must be followed by "#" and some digits; if not, a parse exception should occur.
Examples:
- "SCR" --> wrong
- "SCR#" --> wrong
- "SCR#1" --> correct match
- "foo" --> correct match
- "REQ" --> correct match
- "REQUEST" --> wrong
The easy one:
SCR_REQUESTTokens = oneOf("SCR REQUEST", caseless=True) + Word("#"+nums, min=2)
I tried to use NotAny, but no luck:
GeneralToken = oneOf("SCR REQUEST", caseless=True)
Token = (Group(OneOrMore(~GeneralToken + SCR_REQUESTTokens))) + (Word(alphanums))
I also tried FollowedBy, but without examples it is not easy to understand :-(
thanks,
eric_vb,
'~' is the same as NotAny, and I'd say you were on the right track with ~GeneralToken. But I got confused about what was a GeneralToken vs. a Token, so I tried to define your expressions using some different terminology:
data = """
- SCR --> wrong
- SCR# --> wrong
- SCR#1 --> correct match
- foo --> correct match
- REQ --> correct match
- REQUEST --> wrong
"""
from pyparsing import oneOf, Combine, Word, alphas, nums, line, col, Regex

# "SCR" or "REQUEST" must be followed by "#" and digits, with no spaces between
SCR_REQUESTprefix = oneOf("SCR REQUEST")
SCR_REQUESTtokens = Combine(SCR_REQUESTprefix + "#" + Word(nums))
# any other word, as long as it does not start with one of the prefixes
nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Word(alphas)
# comment out next line to see difference
nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Regex(r"\b\w+\b")
# skip the annotation words in the sample data
ignorables = oneOf("wrong correct match")
searchGrammar = SCR_REQUESTtokens | nonSCR_REQUESTtokens
searchGrammar.ignore(ignorables)
for tokens, startLoc, endLoc in searchGrammar.scanString(data):
    # show the source line, then the matched token aligned under it
    print(line(startLoc, data))
    print(" " * (col(startLoc, data) - 1) + tokens[0])
    print()
prints out:
- SCR#1 --> correct match
  SCR#1

- foo --> correct match
  foo

- REQ --> correct match
  REQ
I had to cheat a little since I was just using scanString to step through the input string a character at a time looking for matches, and used a Regex with leading and trailing '\b' expressions, meaning "word break before and after". You may or may not need this, depending on how you end up using these expressions in a larger grammar.
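To make the '\b' point concrete, here is a minimal sketch that runs both versions of the non-prefix expression over a single line of the sample data. The names sample, with_word and with_regex are made up for this illustration, and the ignorables are left out on purpose so the effect is easier to see. It uses the same camelCase names as the code above; newer pyparsing releases also spell them one_of and scan_string.

```python
from pyparsing import Regex, Word, alphas, oneOf

sample = "- SCR --> wrong"  # one line from the sample data

prefix = oneOf("SCR REQUEST")
with_word = ~prefix + Word(alphas)        # first definition from the post
with_regex = ~prefix + Regex(r"\b\w+\b")  # second definition from the post

# After a failed match, scanString retries one character further along,
# so it will also try to start a match in the middle of "SCR".
word_hits = [tokens[0] for tokens, start, end in with_word.scanString(sample)]
regex_hits = [tokens[0] for tokens, start, end in with_regex.scanString(sample)]

print(word_hits)   # includes "CR", the tail of the rejected "SCR"
print(regex_hits)  # "\b" cannot match inside a word, so "CR" is not reported
```

Both lists still contain "wrong", since nothing is being ignored in this sketch; the only difference is the stray "CR" that Word(alphas) picks up once the scan has stepped past the "S".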
Anyway, I hope this gives you some leads on where to go from here. Write back if you continue to struggle.
Cheers,
-- Paul
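For the original requirement that bad input should raise a parse exception, the same expressions can probably be used with parseString instead of scanString. The following is only a sketch under that assumption: is_valid and the expression names are hypothetical helpers for this illustration, Word(alphanums) comes from the question, and StringEnd() is added so that leftovers such as a trailing "#" are rejected. No '\b' regex is needed here because parseString always starts at the beginning of the string.

```python
from pyparsing import (Combine, ParseException, StringEnd, Word,
                       alphanums, nums, oneOf)

prefix = oneOf("SCR REQUEST")
# "SCR#123" / "REQUEST#123": prefix, "#", digits, with nothing in between
numbered = Combine(prefix + "#" + Word(nums))
# any other word; "~prefix" (NotAny) fails if the text starts with a prefix
general = ~prefix + Word(alphanums)
# StringEnd() forces the whole input to be consumed
token = (numbered | general) + StringEnd()

def is_valid(text):
    """Return True if text parses as a token, False on ParseException."""
    try:
        token.parseString(text)
        return True
    except ParseException:
        return False

for text in ["SCR", "SCR#", "SCR#1", "foo", "REQ", "REQUEST"]:
    print(text, "-->", "correct match" if is_valid(text) else "wrong")
```

One thing to be aware of with this sketch: because the prefix is a plain literal match, a word that merely begins with "SCR", such as "SCRIPT", is rejected as well. Whether that is wanted depends on what "detect" should mean in the real grammar.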