Use of oneOf, but in a negative way (notOneOf)

eric_vb
2007-01-19
2013-05-14
  • eric_vb

    eric_vb - 2007-01-19

    I just started using pyparsing and am already stuck on a problem.
    I have the following kind of string to parse:
    A keyword can be "SCR#123" or "REQUEST#123" or something else (Word(alphanums)).
    But if we detect "SCR" or "REQUEST", these words must be followed by "#" and some digits; if not, a parse exception should occur.

    examples :
    - "SCR"  --> wrong
    - "SCR#"  --> wrong
    - "SCR#1"  --> correct match
    - "foo" --> correct match
    - "REQ" --> correct match
    - "REQUEST" --> wrong
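The six cases above can be checked mechanically. The following is a minimal sketch, not the thread author's code: the names `tagged`, `plain`, `keyword` and `is_valid` are made up for illustration, and it assumes the whole input string is a single keyword (hence `parseAll=True`).

```python
from pyparsing import Combine, ParseException, Word, alphanums, nums, oneOf

prefix = oneOf("SCR REQUEST")
# "SCR" or "REQUEST" immediately followed by "#" and at least one digit.
tagged = Combine(prefix + "#" + Word(nums))
# Any other alphanumeric word, provided it does not start with a prefix.
plain = ~prefix + Word(alphanums)
keyword = tagged | plain


def is_valid(text):
    """Return True if the entire text parses as a keyword."""
    try:
        keyword.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for text in ("SCR", "SCR#", "SCR#1", "foo", "REQ", "REQUEST"):
    print(text, "correct match" if is_valid(text) else "wrong")
```

One caveat of this sketch: because `oneOf` matches literal text, a word that merely begins with a prefix, such as "SCRATCH", is rejected as well.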

    The easy one :
    SCR_REQUESTTokens = oneOf("SCR REQUEST", caseless=True) + Word("#"+nums, min=2)

    I tried to use NotAny, but no luck.

    GeneralToken = oneOf("SCR REQUEST", caseless=True)
    Token = (Group(OneOrMore(~GeneralToken + SCR_REQUESTTokens))) + (Word(alphanums))

    I also tried FollowedBy, but without examples it is not easy to understand :-(
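For reference, here is a small illustration of what FollowedBy does. It is a positive lookahead: it checks that an expression matches at the current position but consumes nothing. The name `checked_prefix` is hypothetical and chosen only for this sketch.

```python
from pyparsing import FollowedBy, ParseException, oneOf

prefix = oneOf("SCR REQUEST")
# Match a prefix only if a "#" comes next; the "#" itself is not consumed.
checked_prefix = prefix + FollowedBy("#")

result = checked_prefix.parseString("SCR#1").asList()
print(result)  # only the prefix is returned: ['SCR']

try:
    checked_prefix.parseString("SCR")
    rejected = False
except ParseException:
    rejected = True  # no "#" after the prefix, so the lookahead fails
print(rejected)
```

Because the lookahead leaves the "#" in the input, a later expression in the grammar still has to parse it.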

    thanks,

     
    • Paul McGuire

      Paul McGuire - 2007-01-24

      eric_vb,

      '~' is the same as NotAny, and I'd say you were on the right track with ~GeneralToken.  But I got confused in what was a GeneralToken vs a Token, so I tried to define your expressions with some different terminology:
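To illustrate the equivalence, this short sketch builds the same negative lookahead both ways and compares the results. The helper `matches` and the variable names are assumptions made for the example.

```python
from pyparsing import NotAny, ParseException, Word, alphas, oneOf

prefix = oneOf("SCR REQUEST")
with_tilde = ~prefix + Word(alphas)          # operator form
with_notany = NotAny(prefix) + Word(alphas)  # explicit class form


def matches(expr, text):
    """Return True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


for text in ("foo", "REQ", "SCR", "REQUEST"):
    print(text, matches(with_tilde, text), matches(with_notany, text))
```

Both forms accept "foo" and "REQ" and reject "SCR" and "REQUEST".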

      data = """
      - SCR  --> wrong
      - SCR#  --> wrong
      - SCR#1  --> correct match
      - foo --> correct match
      - REQ --> correct match
      - REQUEST --> wrong
      """
      from pyparsing import oneOf, Combine, Word, alphas, nums, line, col, Regex, Keyword

      SCR_REQUESTprefix = oneOf("SCR REQUEST")
      SCR_REQUESTtokens = Combine(SCR_REQUESTprefix + "#" + Word(nums))
      nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Word(alphas)
      # comment out next line to see difference
      nonSCR_REQUESTtokens = ~SCR_REQUESTprefix + Regex(r"\b\w+\b")

      ignorables = oneOf("wrong correct match")

      searchGrammar = SCR_REQUESTtokens | nonSCR_REQUESTtokens
      searchGrammar.ignore(ignorables)

      for tokens, startLoc, endLoc in searchGrammar.scanString(data):
          # show the source line, then the matched token aligned under its column
          print(line(startLoc, data))
          print(" " * (col(startLoc, data) - 1) + tokens[0])
          print()

      prints out:

      - SCR#1  --> correct match
        SCR#1

      - foo --> correct match
        foo

      - REQ --> correct match
        REQ

      I had to cheat a little. Since scanString steps through the input string a character at a time looking for matches, a plain Word(alphas) would skip the rejected "SCR" and then match the leftover "CR" one position later. To prevent that, I used a Regex with leading and trailing '\b' expressions, meaning "word break before and after". You may or may not need this, depending on how you end up using these expressions in a larger grammar.
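The effect of the '\b' anchors can be seen by scanning a bare "SCR" with both variants. This is a sketch under the same definitions as the code above; the names `loose`, `strict` and `found` are invented for the comparison.

```python
from pyparsing import Combine, Regex, Word, alphas, nums, oneOf

prefix = oneOf("SCR REQUEST")
tagged = Combine(prefix + "#" + Word(nums))

# Variant without word-break anchors.
loose = tagged | (~prefix + Word(alphas))
# Variant with word-break anchors, so a match cannot start mid-word.
strict = tagged | (~prefix + Regex(r"\b\w+\b"))


def found(grammar, text):
    """Collect the first token of every match scanString reports."""
    return [tokens[0] for tokens, start, end in grammar.scanString(text)]


print(found(loose, "SCR"))     # ['CR'], a match starting inside the word
print(found(strict, "SCR"))    # []
print(found(strict, "SCR#1"))  # ['SCR#1']
```

When the expressions are used with parseString on whole tokens rather than with scanString, this mid-word restart does not occur, which is why the anchors may not be needed in a larger grammar.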

      Anyway, I hope this gives you some leads on where to go from here, write back if you continue to struggle.

      Cheers,
      -- Paul

       
