Python parsing module / Discussion / Help/Open Discussion: use of oneOf, but in negative way (notOneOf)

use of oneOf, but in negative way (notOneOf)

Forum: Help/Open Discussion

Creator: eric_vb

Created: 2007-01-19

Updated: 2013-05-14

eric_vb - 2007-01-19

I just started using pyparsing and am already stuck on a problem.
I need to parse the following:
A keyword can be "SCR#123" or "REQUEST#123" or something else (Word(alphanums)).
But if we detect "SCR" or "REQUEST", these words must be followed by "#" and some digits; if not, a parse exception should occur.

Examples:
- "SCR" --> wrong
- "SCR#" --> wrong
- "SCR#1" --> correct match
- "foo" --> correct match
- "REQ" --> correct match
- "REQUEST" --> wrong

The easy part:
SCR_REQUESTTokens = oneOf("SCR REQUEST", caseless=True) + Word("#"+nums, min=2)

I tried to use NotAny, but had no luck:

GeneralToken = oneOf("SCR REQUEST", caseless=True)
Token = (Group(OneOrMore(~GeneralToken + SCR_REQUESTTokens))) + (Word(alphanums))

I also tried FollowedBy, but without examples it is not easy to understand :-(

Thanks,

- Paul McGuire - 2007-01-24
  
  eric_vb,
  
  '~' is the same as NotAny, and I'd say you were on the right track with ~GeneralToken. But I got confused in what was a GeneralToken vs a Token, so I tried to define your expressions with some different terminology:
  
  data = """
  - SCR --> wrong
  - SCR# --> wrong
  - SCR#1 --> correct match
  - foo --> correct match
  - REQ --> correct match
  - REQUEST --> wrong
  """
  from pyparsing import oneOf, Combine, Word, alphas, nums, line, col, Regex
  
  # reserved prefixes that must be followed by "#" and digits
  SCR_REQUESTprefix = oneOf("SCR REQUEST")
  SCR_REQUESTtokens = Combine(SCR_REQUESTprefix + "#" + Word(nums))
  nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Word(alphas)
  # comment out next line to see difference
  nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Regex(r"\b\w+\b")
  
  # annotation words in the test data, skipped while scanning
  ignorables = oneOf("wrong correct match")
  
  searchGrammar = SCR_REQUESTtokens | nonSCR_REQUESTtokens
  searchGrammar.ignore(ignorables)
  
  for tokens, startLoc, endLoc in searchGrammar.scanString(data):
      print(line(startLoc, data))
      print(" " * (col(startLoc, data) - 1) + tokens[0])
      print()
  
  prints out:
  
  - SCR#1 --> correct match
  SCR#1
  
  - foo --> correct match
  foo
  
  - REQ --> correct match
  REQ
  
  I had to cheat a little: because scanString steps through the input string a character at a time looking for matches, I used a Regex with leading and trailing '\b' expressions, meaning "word break before and after", so that a match cannot start in the middle of a word (with plain Word(alphas), "CR" would match inside "SCR"). You may or may not need this, depending on how you end up using these expressions in a larger grammar.
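
If each keyword is parsed on its own rather than scanned out of a larger string, the '\b' workaround may not be needed. The following is a minimal sketch under that assumption: it reuses the expressions above with parseString(parseAll=True), and the helper name is_valid is made up for illustration.

```python
from pyparsing import Combine, ParseException, Regex, Word, nums, oneOf

prefix = oneOf("SCR REQUEST")
# A reserved prefix must be followed immediately by "#" and digits.
tagged = Combine(prefix + "#" + Word(nums))
# Any other word is fine, as long as it does not start with a reserved prefix.
plain = ~prefix + Regex(r"\w+")
keyword = tagged | plain


def is_valid(text):
    """Hypothetical helper: True if the whole string is an acceptable keyword."""
    try:
        keyword.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for sample in ["SCR", "SCR#", "SCR#1", "foo", "REQ", "REQUEST"]:
    print(sample, "-->", "correct match" if is_valid(sample) else "wrong")
```

Note that oneOf matches literal text, so a word that merely begins with a reserved prefix (for example "SCRIPT") is rejected as well. The thread does not say whether that is wanted.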
  
  Anyway, I hope this gives you some leads on where to go from here, write back if you continue to struggle.
  
  Cheers,
  -- Paul
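
Since the question mentions that FollowedBy is hard to understand without examples, here is a small sketch of the two lookahead classes side by side. It is only an illustration: the names word_before_hash, word_not_before_hash and matches are invented here and are not part of the reply above.

```python
from pyparsing import FollowedBy, NotAny, ParseException, Word, alphas

word = Word(alphas)
# FollowedBy is a positive lookahead: "#" must come next, but it is not consumed.
word_before_hash = word + FollowedBy("#")
# NotAny is a negative lookahead; ~expr is shorthand for NotAny(expr).
word_not_before_hash = word + NotAny("#")


def matches(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# The lookahead contributes no tokens, so only the word is returned.
print(word_before_hash.parseString("SCR#123").asList())
print(matches(word_before_hash, "SCR"))          # no "#" after the word
print(matches(word_not_before_hash, "SCR#123"))  # "#" follows, so NotAny fails
print(matches(word_not_before_hash, "SCR"))
```

Neither lookahead advances the parse position, which is why they combine well with a following expression that does the actual matching.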
  