Python parsing module / Bugs / #88 New Parsers (attached)

Dan Strohl - 2016-02-13

And the tests for these.

test_pyparsing_add.py

- Paul McGuire - 2016-02-14
  
  Dan -
  
  Thanks for taking the time to write up these proposed classes to be added to
  Pyparsing.
  
  In the interests of keeping the API small and easy to learn, I have a high
  barrier for adding new classes to Pyparsing. In many of my own parsers, I
  will create small functions or closures to generate repetitive expressions
  or parse actions.
  
  Please look over these alternatives to your proposed new classes, mostly
  using variations on parse actions and conditions (newly added in a recent
  release):
  
      from pyparsing import OneOrMore, Word, locatedExpr, nums

      # define some baseline expressions - an integer is a word made of nums,
      # and an oddnum is an integer that ends with 1, 3, 5, 7 or 9
      integer = Word(nums)
      integers = OneOrMore(integer)
      oddnum = integer().addCondition(lambda t: t[0][-1] in set('13579'))

      # CountIn
      expr1 = integers()
      expr1.addCondition(lambda t: list(t).count(oddnum) == 2)

      # Count
      expr2 = integers()
      expr2.addCondition(lambda t: len(t) == 3)

      # Len
      expr3 = integers()
      expr3 = locatedExpr(expr3)
      expr3.addCondition(lambda t: t[0].locn_end - t[0].locn_start == 5)
      expr3.addParseAction(lambda t: t[0].value)

      for expr in (expr1, expr2, expr3):
          print(expr.parseString("1 2 3"))
  
  In any event, these feel fairly specialized to me still, so for the moment,
  I'm going to hold off on incorporating them into the standard Pyparsing
  release. For your application, you might consider making yourself these
  little macro functions (note that "expr()" is the new shorthand for
  "expr.copy()"):
  
      CountIn = lambda expr, match, n: expr().addCondition(
          lambda t: list(t).count(match) == n)

      Count = lambda expr, n: expr().addCondition(lambda t: len(t) == n)

      Len = lambda expr, n: locatedExpr(expr).addCondition(
          lambda t: t[0].locn_end - t[0].locn_start == n
      ).addParseAction(lambda t: t[0].value)
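
  The same idea can also be written as named functions, which some readers may find easier to step through than nested lambdas. This is only a sketch: the names `count`, `count_in` and `accepts` are illustrative and not part of Pyparsing, and it assumes a Pyparsing release that provides `addCondition`.

  ```python
  from pyparsing import OneOrMore, ParseException, Word, nums

  integer = Word(nums)
  integers = OneOrMore(integer)
  oddnum = integer().addCondition(lambda t: t[0][-1] in set("13579"))


  def count(expr, n):
      """Copy expr and require exactly n tokens (hypothetical helper)."""
      return expr().addCondition(lambda t: len(t) == n)


  def count_in(expr, match, n):
      """Copy expr and require that exactly n tokens match `match` (hypothetical helper)."""
      return expr().addCondition(lambda t: list(t).count(match) == n)


  def accepts(expr, text):
      """Return True if expr parses text, False if the grammar or a condition rejects it."""
      try:
          expr.parseString(text)
          return True
      except ParseException:
          return False


  three_integers = count(integers, 3)
  two_odd = count_in(integers, oddnum, 2)

  print(accepts(three_integers, "1 2 3"))
  print(accepts(three_integers, "1 2"))
  print(accepts(two_odd, "1 2 3"))
  print(accepts(two_odd, "2 4 6"))
  ```

  Because each helper starts from `expr()`, the condition is attached to a copy and the shared `integers` expression stays unchanged.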
  
  (I'm especially pleased with how easy CountIn is to write, using the
  standard count() method of lists to do equality checking, and using the '=='
  override that allows you to test the matching of an expression with a
  string, to give you the count of tokens that match another parse expression
  - in this case, finding the number of odd numbers in a list of matched
  integers.)
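
  The mechanism behind this can be shown without Pyparsing at all. When `list.count()` compares a string item with an object that `str` does not know how to compare against, Python falls back to that object's own `__eq__`. The class below, `EndsWithOddDigit`, is a hypothetical stand-in that only mimics an expression comparing equal to any string it would match; it is not how Pyparsing implements the override.

  ```python
  class EndsWithOddDigit:
      """Hypothetical stand-in for a parse expression with an '==' override."""

      def __eq__(self, other):
          # Compare equal to any all-digit string whose last digit is odd.
          if isinstance(other, str):
              return other.isdigit() and other[-1] in "13579"
          return NotImplemented

      # Defining __eq__ removes the default __hash__, so restore identity hashing.
      __hash__ = object.__hash__


  tokens = ["1", "2", "3", "10", "15"]
  odd_matcher = EndsWithOddDigit()

  # list.count() evaluates item == odd_matcher for each item; str.__eq__ returns
  # NotImplemented, so Python then tries odd_matcher.__eq__(item).
  odd_total = tokens.count(odd_matcher)
  print(odd_total)
  ```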
  
  Len was probably the one that gave me the most trouble, using the
  locatedExpr helper, a condition, and a parse action to return the
  original matched tokens. But I would rather evaluate the length from the
  actual start and end locations than by running the tokens together using
  ''.join().
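
  To make the difference between the two measures concrete, here is a small sketch using `scanString`, which yields the matched tokens together with their start and end locations. It illustrates the two ways of measuring length and is not the Len implementation itself.

  ```python
  from pyparsing import OneOrMore, Word, nums

  integers = OneOrMore(Word(nums))

  # scanString yields (tokens, start, end) for each match found in the input.
  tokens, start, end = next(integers.scanString("1 2 3"))

  # Location-based length: includes the spaces between the integers.
  span_length = end - start

  # Token-based length: only the characters that ended up in the tokens.
  joined_length = len("".join(tokens))

  print(span_length, joined_length)
  ```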
  
  Thanks for this submission - if you like, I can repackage them in the
  Pyparsing examples, as they are a novel and non-trivial use of some of the
  newer features in pyparsing.
  
  Regards,
  
  -- Paul
  
  - Dan Strohl - 2016-02-14
    
    I do have a request, though (or more of a suggestion, I guess):
    
    For the examples and documentation, it would be really nice to have a list of the techniques and functions used in each example, and possibly an index of these. Sometimes you note them in the descriptions, but other times the description just says "A dice roll parser and evaluator for evaluating strings such as "4d20+5.5+4d6.takeHighest(3)".", which would be great if I were trying to figure out how to roll some dice, but does not tell me that the example demonstrates operatorPrecedence and CaselessLiteral.
    
    It's not a big thing, but it would be nice.
    
  - Dan Strohl - 2016-02-14
    
    re: "But I would rather work with the actual start and
    end locations as the length to be evaluated, rather than running the tokens
    together using ''.join()."
    
    I thought about that, but I wanted to account for things like content replacements, and to avoid counting .suppress()ed tokens in my measurements.
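
    A short sketch of the suppressed-token case, assuming a made-up grammar of two integers separated by a comma (the name `pair` is illustrative only). The location span still covers the comma, while the joined tokens do not:

    ```python
    from pyparsing import Suppress, Word, nums

    # Two integers separated by a comma that is dropped from the results.
    pair = Word(nums) + Suppress(",") + Word(nums)

    # scanString yields (tokens, start, end) for each match found in the input.
    tokens, start, end = next(pair.scanString("12,34"))

    span_length = end - start              # includes the suppressed comma
    kept_length = len("".join(tokens))     # only the tokens that were kept

    print(span_length, kept_length)
    ```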
    
    - Paul McGuire - 2016-02-15
      
      Good point. This is also a problem for originalTextFor (which I thought of
      using for Len instead of that goofy locatedExpr mess, but it discards the
      originally parsed tokens).
      
      I've gotten a number of suggestions for similar recipes, parse actions, and
      pre-defined expressions (like a Regex for a floating point number). The
      itertools module contains a number of recipes in its documentation; maybe I
      should capture a bunch of these in an example or the docs. (One user took a
      stab at this in the public Pyparsing wiki, but it never got much traction.)
      
      -- Paul
      


Dan Strohl - 2016-02-14

Thanks, I didn't see the .addCondition() method. I was looking for something like that, and I thought about using .addParseAction(), but I was not sure whether raising an exception at that point was a good idea.

No problem on not including them, especially since it looks pretty easy to do without these. (I am always a fan of keeping things simple.)


Dan Strohl - 2016-02-14

OK, actually, on looking again, I did see addCondition but was not sure how to use it; the docs are pretty light for that method.
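
For anyone else landing here with the same question, a minimal sketch of the two approaches side by side, assuming a Pyparsing release that has `addCondition` and its `message` keyword. The names `small_int`, `reject_large` and `accepts` are made up for illustration.

```python
from pyparsing import ParseException, Word, nums

integer = Word(nums)

# Variant 1: a condition returns True or False; False makes the match fail.
small_int = integer().addCondition(
    lambda t: int(t[0]) < 100,
    message="integer must be less than 100",
)


# Variant 2: a parse action that raises ParseException for the same check.
def reject_large(s, loc, toks):
    if int(toks[0]) >= 100:
        raise ParseException(s, loc, "integer must be less than 100")


small_int_via_action = integer().addParseAction(reject_large)


def accepts(expr, text):
    """Return True if expr parses text, False if it raises ParseException."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


for candidate in ("42", "420"):
    print(candidate, accepts(small_int, candidate), accepts(small_int_via_action, candidate))
```

Both variants reject the match by way of a ParseException; the condition form just saves writing the raise by hand.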
