Regular expression token class
Brought to you by:
ptmcg
I've written a class called "Regex" for my own use, built
on Python's standard "re" module, and I believe that it
would be useful as a core feature in pyparsing.
It can offer large performance increases versus trying
to construct regular-expression analogs using the
current classes (like Word, And, ZeroOrMore etc.),
although it can quite happily live alongside the
existing classes.
This is currently a somewhat immature class, and I will
probably be fixing bugs and adding features as needed
(retaining source compatibility as far as possible).
A patch to pyparsing.py to add the new class
Logged In: YES
user_id=1368227
And here is the code of the class if you prefer it in
non-diff form:
# You will need to "import re" for this to work
class Regex(Token):
    """Token for matching strings that match a given regular
    expression.
    Defined with string specifying the regular expression
    in a form recognized by the inbuilt Python re module.
    """
    # Yes - this class is based on code from the Word class
    def __init__( self, pattern, flags=0):
        """The parameters pattern and flags are passed to
        the re.compile() function as-is. See the Python re module
        for an explanation of the acceptable patterns and flags."""
        super(Regex,self).__init__()
        self.pattern = pattern
        self.flags = flags
        self.re = re.compile(self.pattern, self.flags)
        self.name = _ustr(self)
        self.errmsg = "Expected " + self.name
        self.myException.msg = self.errmsg
        self.mayIndexError = False

    def parseImpl( self, instring, loc, doActions=True ):
        # Create a buffer object, starting at the current
        # location within the input string, for use in the regular
        # expression pattern matcher
        buf = buffer(instring, loc)
        result = self.re.match(buf)
        if not result:
            exc = self.myException
            exc.loc = loc
            exc.pstr = instring
            raise exc
        loc += result.end()
        return loc, result.group()

    def __str__( self ):
        try:
            return super(Regex,self).__str__()
        except:
            pass
        if self.strRepr is None:
            self.strRepr = "Re:(%s)" % self.pattern
        return self.strRepr
Updated patch
Logged In: YES
user_id=893320
Very cool! I've considered this option in the past, but
was stuck on how to have the regexp start not at the
beginning of the string, but at the current parsing loc
instead. Nice use of buffer instead of string slice for
this purpose. You've really taken this along pretty well,
I'll be happy to include this in the next version of
pyparsing.
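For anyone reading this on a newer Python: buffer() only exists in Python 2. A rough sketch of the same idea, assuming Python 3, is to pass the start position straight to the compiled pattern's match() method. The helper name match_at and the sample input are made up for illustration and are not part of pyparsing.

```python
import re

def match_at(pattern, instring, loc):
    """Try to match `pattern` at position `loc` of `instring`.

    Returns (new_loc, matched_text), or None when there is no match.
    `match_at` is a hypothetical helper, not part of pyparsing.
    """
    # Pattern.match(string, pos) anchors the match at pos without
    # copying the remainder of the string.
    result = pattern.match(instring, loc)
    if result is None:
        return None
    # With an explicit pos, end() is already an absolute index into
    # instring, so it is returned as-is rather than added to loc.
    return result.end(), result.group()

number = re.compile(r"\d+")
print(match_at(number, "width: 120px", 7))  # match starts at the "1"
print(match_at(number, "width: 120px", 0))  # no digits at position 0
```

Note the difference from the buffer version: there result.end() is relative to the buffer, so it has to be added to loc.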
Some questions:
1. You currently return the match.group() from parseImpl.
Have you had any problems with conflicts or surprises from
how the corresponding ParseResults gets built? What
happens if a regexp has named subfields - do they "play
nice" with results names as defined for a Regex
ParseElement?
2. It's been over a month since you submitted this note
(sorry for the delayed response - I thought I was
subscribed to this section, I'll double check with SF).
How has this code held up for you in that time?
3. I am strictly an RE duffer. Could you send me some
test cases that I can include in my regression tests?
Lastly, I can definitely appreciate the potential for
improved speed of these expressions. I've also considered
how at some time I might compile a pyparsing grammar
completely to RE's for evaluating an input string. But
this is definitely going to be an "advanced" usage
category feature of pyparsing - if you don't know what you
are doing with a given regexp, you can easily parse much
more than you had intended for a given expression.
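A small illustration of parsing more than intended, using only the standard re module. The sample text and both patterns are invented for this sketch; they are not taken from the patch or from anyone's grammar.

```python
import re

# Made-up input containing two quoted strings.
text = 'a { font-family: "Arial" } b { content: "x" }'

# Greedy: ".*" runs on to the LAST double quote in the input.
greedy = re.compile(r'".*"')
# Bounded: stops at the first closing double quote.
bounded = re.compile(r'"[^"]*"')

start = text.index('"')  # position of the first opening quote
print(greedy.match(text, start).group())
print(bounded.match(text, start).group())
```

The greedy pattern swallows everything from the first quote through the final quote, while the bounded one returns only "Arial" with its quotes.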
(I will attribute your contribution to "greatred" as I
don't have your actual name - send me an e-mail if you'd
like me to be able to give you a more personal credit -
I'll just post your name, not the e-mail!)
Nice work!
-- Paul
Logged In: YES
user_id=1368227
1. I've not actually looked into result names as such, in
that I don't yet understand what they are. If you mean named
groups in the regex, that's certainly an extra feature that
I can see being very useful, and something that I would like
to see added to the class (although as I had no immediate
need for it at the time it just didn't happen ;). I can't
immediately see how I might serve the intentions of pulling
named groups out from a Regex object as it fits into
pyparsing (or making it "play nicely" in a sensible way) -
my knowledge of the internals of pyparsing is somewhat
limited. (The code for Regex is based on the Word class,
which I felt was the nearest match already in pyparsing.)
2. I've not had any surprises from how the class works so
far. It was a class I developed as I needed to make a regex
matcher in order to match some BNF-like grammar more
closely, and thus far it has worked nicely - at least in the
fashion that I have utilised it (which is parsing a CSS
stylesheet based loosely upon the BNF-like grammar from the
CSS spec - which can have some quite complex regular
expressions for matching strings, URLs and so on).
3. I can certainly submit some Regex constructs that I'm
using in my own code if that is useful. I'll have a look at
this presently.
My real name is John Beisley, I've updated my SF profile so
it should show now.
Logged In: YES
user_id=1368227
Okay, here's a little bit of quickly knocked up test code.
It doesn't test the Regex in the larger context of a full
grammar, but rather tests it on a few small things. I've
thrown in a test which doesn't quite work as one might
desire for a named group, but then, nothing has been stated
as to which group should be extracted (there is nothing in
the code to specifically pull out the named groups, as
observed).
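To show what I mean with the plain re module (this is not the attached test code, just a sketch with a made-up pattern and input): group() with no arguments gives the whole match, which is all that parseImpl returns, while the named groups have to be asked for separately.

```python
import re

# Hypothetical pattern with two named groups, for illustration only.
decl = re.compile(r"(?P<prop>[a-z-]+)\s*:\s*(?P<value>[^;]+);")
result = decl.match("color: red;")

whole = result.group()        # the entire matched text
prop = result.group("prop")   # a single named group
named = result.groupdict()    # all named groups as a dict

print(whole)
print(prop)
print(named)
```

So a Regex token built this way yields the whole declaration as one string, and the "prop" and "value" pieces are only available from the match object itself.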
Logged In: YES
user_id=1368227
Whoops, I've actually attached it this time :)
Test code
Logged In: YES
user_id=893320
Thanks John!
I've just checked your changes into my SVN repository,
plus the unit tests.
I'm going to give the named groups a little more thought,
and see if I can get them to look more native to the
pyparsing ParseResults.
-- Paul
Logged In: YES
user_id=893320
John -
Please check out the latest 1.4 beta1 release, which
includes your Regex work.
Thanks again!
-- Paul
Logged In: NO
Good stuff, I'll be taking a look at that when I get back to
work after Christmas!
Logged In: YES
user_id=893320
Released in version 1.4