Python parsing module / Discussion / Help/Open Discussion: Match with a negative expression

Michael Murdock - 2005-11-10

Hello,

I have been able to use pyparsing quite effectively for parsing natural language text containing some particular expressions. Until now.

I need to match a phrase with 0 or more words. But it has to stop matching on one of a particular set of words. Here's what I mean.

I define the following constraints:

Years = Word('1', nums, exact=4)
Months = oneOf('Jan Feb Mar Apr')
Days = Word(nums,min=1,max=2)
Places = OneOrMore(Word(alphas, alphas + '.' + ','))

I define the following parse rule:

r = ((CaselessLiteral('arrived') ^
      CaselessLiteral('departed')) +
     Optional(oneOf('in on at from')) +
     Optional(Places) +
     Optional(Months) +
     Optional(Days) +
     Years)

testString1 = 'Departed from Kansas Jan 4 1987'
testString2 = 'Departed from Kansas City Jan 4 1987'
testString3 = 'Arrived in Kansas Feb 4 1988'
testString4 = 'Arrived in New York Mar 6 1989'

When I do something like the following:

   for match in r.scanString(testString1):

the parser matches Places with both 'Kansas' and 'Jan', so Months never gets matched. Days and Years match correctly.

What I would like to be able to do is to define a ParserElement subclass like I have done with Places, but somehow tell it to exclude the words 'Jan', 'Feb', 'Mar', and 'Apr'. Then this definition would allow the month to match correctly.

I tried using NotAny and CharsNotIn without any luck. Is there a way to specify a ParserElement subclass with the normal Word() syntax for what _should_ match but also with a specific set of words that must not cause a match?

Thanks,

~Michael.

- Paul McGuire - 2005-11-10
  
  The short answer is, try changing Places to:
  
  Places = Group(OneOrMore(~Months+Word(alphas, alphas + '.' + ',')))
  
  (~ is operator shorthand for NotAny)
  
  Before accepting another Word, the expression first checks that the upcoming text is *not* a Months match. If it is, the OneOrMore stops reading Words and parsing moves on to the next part of your expression.
  
  The Group is there to keep all your Places words together - otherwise, you just end up with a list of tokens that you'll have to pick apart again later - this way pyparsing keeps track of them while you are parsing.
  
  Glad to hear pyparsing is working well for you!
  -- Paul
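
To see the lookahead in isolation, here is a minimal sketch that compares a greedy word list with the guarded one. The names `word`, `greedy`, and `guarded` are made up for illustration and are not from the posts above; the sample string is a shortened version of testString2.

```python
from pyparsing import Group, OneOrMore, Optional, Word, alphas, oneOf

Months = oneOf('Jan Feb Mar Apr')
word = Word(alphas, alphas + '.' + ',')  # hypothetical helper name

# Greedy version: nothing stops OneOrMore from swallowing the month.
greedy = Group(OneOrMore(word)) + Optional(Months)

# Guarded version: ~Months (NotAny) is a lookahead that consumes no input
# and fails when the next token is a month, which ends the repetition.
guarded = Group(OneOrMore(~Months + word)) + Optional(Months)

greedy_result = greedy.parseString('Kansas City Jan').asList()
guarded_result = guarded.parseString('Kansas City Jan').asList()

print(greedy_result)   # [['Kansas', 'City', 'Jan']]
print(guarded_result)  # [['Kansas', 'City'], 'Jan']
```

One thing to watch: Months as defined with oneOf matches the literal characters without checking for a word boundary, so ~Months would also reject a place name that merely starts with a month abbreviation, such as 'Marseille'.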
  
- Paul McGuire - 2005-11-10
  
  As a nicer-looking alternative to Group, you can specify Combine with a join string of " ", and adjacent=False, as in:
  
  Places = Combine(OneOrMore(~Months+Word(alphas, alphas + '.' + ','))," ",adjacent=False)
  
  This will give you parsing results like:
  ['departed', 'from', 'Kansas', 'Jan', '4', '1987']
  ['departed', 'from', 'Kansas City', 'Jan', '4', '1987']
  ['arrived', 'in', 'Kansas', 'Feb', '4', '1988']
  ['arrived', 'in', 'New York', 'Mar', '6', '1989']
  
  instead of
  ['departed', 'from', ['Kansas'], 'Jan', '4', '1987']
  ['departed', 'from', ['Kansas', 'City'], 'Jan', '4', '1987']
  ['arrived', 'in', ['Kansas'], 'Feb', '4', '1988']
  ['arrived', 'in', ['New', 'York'], 'Mar', '6', '1989']
  
  -- Paul
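
Putting the pieces of the thread together, the following is a sketch of the whole grammar with the Combine version of Places, run against the four test strings from the question. It assumes the corrected grammar (balanced parentheses, the rule wrapped in parentheses so it can span lines); the `tests` and `results` names are only for illustration.

```python
from pyparsing import (CaselessLiteral, Combine, OneOrMore, Optional, Word,
                       alphas, nums, oneOf)

Years = Word('1', nums, exact=4)
Months = oneOf('Jan Feb Mar Apr')
Days = Word(nums, min=1, max=2)

# Stop collecting place words at a month, then join them with a space.
Places = Combine(OneOrMore(~Months + Word(alphas, alphas + '.' + ',')),
                 " ", adjacent=False)

r = ((CaselessLiteral('arrived') ^ CaselessLiteral('departed'))
     + Optional(oneOf('in on at from'))
     + Optional(Places)
     + Optional(Months)
     + Optional(Days)
     + Years)

tests = [
    'Departed from Kansas Jan 4 1987',
    'Departed from Kansas City Jan 4 1987',
    'Arrived in Kansas Feb 4 1988',
    'Arrived in New York Mar 6 1989',
]

results = [r.parseString(s).asList() for s in tests]
for tokens in results:
    print(tokens)
```

Swapping Combine(..., " ", adjacent=False) for Group(...) in the Places definition should give the nested-list form shown in the reply instead.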
  