In a parsing project I am working on (validating domain names), I needed to validate both the length of a token and the number of tokens, so I created the attached additional parsers. In case they are of interest, I am passing them back to you if you want to include them. (I am not as familiar with SourceForge as I am with git, so I don't know how to do a pull request here, sorry.)
These work from a basic POV; however, I did not include any of the debug methods or other associated things that they probably need to fit into the ecosystem. I am happy to add this if you could give me an example or starting point. I looked at the existing ones, but was not able to easily figure out which methods I need to override, and which ones I could handle by changing a property or otherwise ignore.
And the tests for these.
Dan -
Thanks for taking the time to write up these proposed classes to be added to
Pyparsing.
In the interests of keeping the API small and easy to learn, I have a high
barrier for adding new classes to Pyparsing. In many of my own parsers, I
will create small functions or closures to generate repetitive expressions
or parse actions.
Please look over these alternatives to your proposed new classes, mostly
using variations on parse actions and conditions (newly added in a recent
release):
# define some baseline expressions - an integer is a word made of nums, and
# an oddnum is an integer that ends with 1, 3, 5, 7 or 9
integer = Word(nums)
integers = OneOrMore(integer)
oddnum = integer().addCondition(lambda t: t[0][-1] in set('13579'))
# CountIn
expr1 = integers()
expr1.addCondition(lambda t: list(t).count(oddnum) == 2)
# Count
expr2 = integers()
expr2.addCondition(lambda t: len(t) == 3)
# Len
expr3 = integers()
expr3 = locatedExpr(expr3)
expr3.addCondition(lambda t: t[0].locn_end - t[0].locn_start == 5)
expr3.addParseAction(lambda t: t[0].value)
for expr in (expr1, expr2, expr3):
In any event, these feel fairly specialized to me still, so for the moment,
I'm going to hold off on incorporating them into the standard Pyparsing
release. For your application, you might consider making yourself these
little macro functions (note that "expr()" is the new shorthand for
"expr.copy()"):
CountIn = lambda expr, match, n: expr().addCondition(
    lambda t: list(t).count(match) == n)
Count = lambda expr, n: expr().addCondition(lambda t: len(t) == n)
Len = lambda expr, n: locatedExpr(expr).addCondition(
    lambda t: t[0].locn_end - t[0].locn_start == n
    ).addParseAction(lambda t: t[0].value)
(I'm especially pleased with how easy CountIn is to write, using the
standard count() method of lists to do equality checking, and using the '=='
override that allows you to test the matching of an expression with a
string, to give you the count of tokens that match another parse expression
- in this case, finding the number of odd numbers in a list of matched
integers.)
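To make the mechanism behind CountIn concrete, here is a minimal sketch, assuming a pyparsing version in which comparing an expression to a string with '==' tests whether the whole string matches. The token list is made up for illustration.

```python
from pyparsing import Word, nums

integer = Word(nums)
oddnum = integer().addCondition(lambda t: t[0][-1] in set("13579"))

# '==' between an expression and a string asks "does this string match?"
odd_matches = oddnum == "13"
even_matches = oddnum == "12"

# list.count() compares every token to oddnum with '==',
# so it counts the tokens that oddnum would accept.
tokens = ["1", "2", "3", "10"]  # hypothetical list of matched integers
odd_count = tokens.count(oddnum)

print(odd_matches, even_matches, odd_count)
```

Because the plain string on the left does not know how to compare itself to an expression, Python falls back to the expression's own '==' for each token, which is what lets count() do the work.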
Len was probably the one that gave me the most trouble, using the
locatedExpr helper, a condition, and a parse action to return back the
original matched tokens. But I would rather work with the actual start and
end locations as the length to be evaluated, rather than running the tokens
together using ''.join().
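The difference between the two ways of measuring can be sketched as follows. This restates Len as a plain function; the accepts() helper and the sample inputs are assumptions added only for the demonstration.

```python
from pyparsing import OneOrMore, ParseException, Word, locatedExpr, nums

integers = OneOrMore(Word(nums))


def Len(expr, n):
    # locatedExpr wraps the match in a group carrying locn_start, value, locn_end
    located = locatedExpr(expr)
    # accept only if the matched span covers exactly n characters of input
    located.addCondition(lambda t: t[0].locn_end - t[0].locn_start == n)
    # hand back the originally matched tokens instead of the location group
    located.addParseAction(lambda t: t[0].value)
    return located


def accepts(expr, text):
    # hypothetical helper: True if expr parses text, False on ParseException
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


five_wide = Len(integers, 5)
tokens = five_wide.parseString("11 22").asList()
joined_length = len("".join(tokens))

print(tokens, joined_length)
print(accepts(five_wide, "12345"), accepts(five_wide, "1 2"))
```

For the input "11 22" the span is 5 characters because it includes the space, while joining the tokens gives only 4, so the two approaches can disagree whenever whitespace sits between tokens.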
Thanks for this submission - if you like, I can repackage them in the
Pyparsing examples, as they are a novel and non-trivial use of some of the
newer features in pyparsing.
Regards,
-- Paul
I do have a request though (or more of a suggestion, I guess):
For the examples / documentation, it would be really nice to have a list of the techniques / functions used per example, and possibly an index of these. Sometimes you note them in the descriptions, but other times it just says "A dice roll parser and evaluator for evaluating strings such as "4d20+5.5+4d6.takeHighest(3)".", which would be great if I were trying to figure out how to roll some dice, but not so much in telling me that it has an example of operatorPrecedence and CaselessLiteral in there.
It's not a big thing, but it would be nice.
re: "But I would rather work with the actual start and
end locations as the length to be evaluated, rather than running the tokens
together using ''.join()."
I thought about that, but I wanted to account for things like content replacements or not measuring .suppress()ed tokens in my measurements.
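A token-based length check along these lines might look like the sketch below. TokenLen is a hypothetical name, not part of pyparsing, and the dash-separated pair is a made-up input chosen to show a suppressed token being left out of the measurement.

```python
from pyparsing import ParseException, Suppress, Word, alphas


def TokenLen(expr, n):
    # hypothetical helper: measure the tokens that survive parsing,
    # so suppressed or replaced content is not counted
    return expr().addCondition(lambda t: len("".join(t)) == n)


pair = Word(alphas) + Suppress("-") + Word(alphas)

# "ab-cd" is 5 characters of input, but only 4 characters of kept tokens
kept = TokenLen(pair, 4).parseString("ab-cd").asList()

try:
    TokenLen(pair, 5).parseString("ab-cd")
    five_accepted = True
except ParseException:
    five_accepted = False

print(kept, five_accepted)
```

The join assumes every token is a string, so it would need adjusting for expressions whose parse actions convert tokens to other types.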
Good point, this is also a problem for originalTextFor (which I thought of
using for Len instead of that goofy locatedExpr mess, but it discards the
originally parsed tokens).
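A small sketch of what "discards the originally parsed tokens" means in practice, using a made-up input string:

```python
from pyparsing import OneOrMore, Word, nums, originalTextFor

integers = OneOrMore(Word(nums))

# the plain expression returns one token per integer
plain = integers.parseString("11 22").asList()

# originalTextFor returns the matched slice of the input as a single string,
# so the individual tokens are no longer available afterwards
as_text = originalTextFor(integers).parseString("11 22").asList()

print(plain, as_text)
```

A length condition on the originalTextFor result would be easy to write, but the caller would get back one string instead of the list of integers.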
I've gotten a number of suggestions for similar recipes, parse actions, and
pre-defined expressions (like a Regex for a floating point number). The
itertools module contains a number of recipes in its documentation, maybe I
should capture a bunch of these in an example or the docs. (One user took a
stab at this in the public Pyparsing wiki, but it never got much traction.)
-- Paul
From: Dan Strohl [mailto:dstrohl@users.sf.net]
Sent: Sunday, February 14, 2016 4:49 PM
To: [pyparsing:bugs] 88@bugs.pyparsing.p.re.sf.net
Subject: [pyparsing:bugs] Re: #88 New Parsers (attached)
Thanks, I didn't see the .addCondition() method. (I was looking for something like that; I thought about using .addAction(), but I was not sure if raising an exception at that point was a good idea.)
No problem on not including them, especially since it looks pretty easy to do without these. (I am always a fan of keeping things simple.)
OK, actually, in looking again, I did see addCondition, but was not sure how to use it; the docs are pretty light for that method.
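For anyone else landing here with the same question, a minimal usage sketch of addCondition follows. The label expression and the error text are made up for illustration (63 is the DNS limit on label length), and it assumes a pyparsing version whose addCondition accepts the message keyword.

```python
from pyparsing import ParseException, Word, alphanums

# hypothetical domain-label expression: letters, digits and hyphens
label = Word(alphanums + "-").addCondition(
    lambda t: len(t[0]) <= 63,            # predicate: truthy means "accept"
    message="label longer than 63 characters",
)

ok = label.parseString("example").asList()

try:
    label.parseString("a" * 64)
    error_text = None
except ParseException as exc:
    # a falsy predicate makes the match fail with the given message
    error_text = exc.msg

print(ok, error_text)
```

The predicate receives the matched tokens like a parse action does, and a failed condition surfaces as an ordinary ParseException, so alternatives in a MatchFirst or Or can still be tried.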