Python parsing module / Feature Requests / #2 Regular expression token class

John Beisley - 2005-10-26

A patch to pyparsing.py to add the new class

pyparsing.py.diff

John Beisley - 2005-10-26

Logged In: YES
user_id=1368227

And here is the code of the class if you prefer it in
non-diff form:

# You will need to "import re" for this to work

class Regex(Token):
    """Token for matching strings that match a given regular
    expression.
    Defined with string specifying the regular expression
    in a form recognized by the inbuilt Python re module.
    """
    # Yes - this class is based on code from the Word class
    def __init__( self, pattern, flags=0):
        """The parameters pattern and flags are passed to
        the re.compile() function as-is. See the Python re module
        for an explanation of the acceptable patterns and flags."""
        super(Regex,self).__init__()
        self.pattern = pattern
        self.flags = flags

        self.re = re.compile(self.pattern, self.flags)

        self.name = _ustr(self)
        self.errmsg = "Expected " + self.name
        self.myException.msg = self.errmsg
        self.mayIndexError = False

    def parseImpl( self, instring, loc, doActions=True ):
        # Create a buffer object, starting at the current
        # location within the input string, for use in the regular
        # expression pattern matcher
        buf = buffer(instring, loc)
        result = self.re.match(buf)
        if not result:
            exc = self.myException
            exc.loc = loc
            exc.pstr = instring
            raise exc

        # result.end() is relative to the start of the buffer
        loc += result.end()

        return loc, result.group()

    def __str__( self ):
        try:
            return super(Regex,self).__str__()
        except:
            pass

        if self.strRepr is None:
            self.strRepr = "Re:(%s)" % self.pattern

        return self.strRepr
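
For readers on Python 3, where the buffer builtin no longer exists, the same matching contract can be sketched without pyparsing at all. This is only an illustration under assumptions: MiniRegexToken, MiniParseError and parse_impl are made-up names, not pyparsing API, and the start-position argument of the compiled pattern's match() method stands in for the buffer object used in the class above.

```python
import re


class MiniParseError(Exception):
    """Hypothetical error type; stands in for pyparsing's exception."""

    def __init__(self, pstr, loc, msg):
        super().__init__("%s (at char %d)" % (msg, loc))
        self.pstr = pstr
        self.loc = loc
        self.msg = msg


class MiniRegexToken:
    """Hypothetical standalone token; not part of pyparsing."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        # pattern and flags go to re.compile() unchanged
        self.re = re.compile(pattern, flags)

    def parse_impl(self, instring, loc):
        # Passing a start position anchors the match at loc
        # without copying the rest of the input string.
        result = self.re.match(instring, loc)
        if not result:
            raise MiniParseError(instring, loc, "Expected Re:(%s)" % self.pattern)
        # With a start position, end() is already an absolute index.
        return result.end(), result.group()


ident = MiniRegexToken(r"[a-z_][a-z0-9_]*", re.IGNORECASE)
new_loc, token = ident.parse_impl("x = Foo_1 + 2", 4)
print(new_loc, token)
```

The returned pair follows the same (new location, matched text) shape as parseImpl above; the exception attributes are simplified for the sketch.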

John Beisley - 2005-11-06

Updated patch

pyparsing.py.diff

Paul McGuire - 2005-12-07

Logged In: YES
user_id=893320

Very cool! I've considered this option in the past, but
was stuck on how to have the regexp start not at the
beginning of the string, but at the current parsing loc
instead. Nice use of buffer instead of string slice for
this purpose. You've really taken this along pretty well,
and I'll be happy to include this in the next version of
pyparsing.
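
The point about starting the match at the current parsing location can be illustrated with the standard re module alone. A small sketch, assuming Python 3 (which has no buffer builtin): slicing copies the tail of the string, while passing a start position to the compiled pattern's match() method does not. The pattern and variable names are illustrative only.

```python
import re

word = re.compile(r"[a-z]+")
text = "123 abc 456"
loc = 4  # pretend the parser has already consumed "123 "

# Matching from the real start of the string fails: it begins with digits.
at_start = word.match(text)

# Option 1: slice the string (copies the remainder of the input).
sliced = word.match(text[loc:])
sliced_end = loc + sliced.end()      # end() is relative to the slice

# Option 2: pass the start position (no copy is made).
positioned = word.match(text, loc)
positioned_end = positioned.end()    # end() is an absolute index

print(at_start, sliced.group(), sliced_end, positioned.group(), positioned_end)
```

One caveat worth keeping in mind: the two options are not fully equivalent, since a "^" in the pattern still refers to the real start of the string when a start position is used.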

Some questions:
1. You currently return the match.group() from parseImpl.
Have you had any problems with conflicts or surprises from
how the corresponding ParseResults gets built? What
happens if a regexp has named subfields - do they "play
nice" with results names as defined for a Regex
ParseElement?
2. It's been over a month since you submitted this note
(sorry for the delayed response - I thought I was
subscribed to this section, I'll double check with SF).
How has this code held up for you in that time?
3. I am strictly an RE duffer. Could you send me some
test cases that I can include in my regression tests?

Lastly, I can definitely appreciate the potential for
improved speed of these expressions. I've also considered
how at some time I might compile a pyparsing grammar
completely to RE's for evaluating an input string. But
this is definitely going to be an "advanced" usage
category feature of pyparsing - if you don't know what you
are doing with a given regexp, you can easily parse much
more than you had intended for a given expression.
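
A minimal sketch of how a regexp can parse "much more than you had intended", using only the standard re module. The quoted-string patterns here are illustrative assumptions, not taken from the patch or its tests.

```python
import re

text = '"first" and "second"'

# Greedy ".*" runs to the LAST quote on the line.
greedy = re.compile(r'".*"')
# A negated character class stops at the first closing quote.
tight = re.compile(r'"[^"]*"')

greedy_token = greedy.match(text).group()
tight_token = tight.match(text).group()

print(greedy_token)
print(tight_token)
```

The greedy pattern swallows both quoted strings and the text between them, whereas the tighter one returns only the first quoted string.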

(I will attribute your contribution to "greatred" as I
don't have your actual name - send me an e-mail if you'd
like me to be able to give you a more personal credit -
I'll just post your name, not the e-mail!)

Nice work!
-- Paul

John Beisley - 2005-12-07

Logged In: YES
user_id=1368227

1. I've not actually looked into result names as such, in
that I don't yet understand what they are. If you mean named
groups in the regex, that's certainly an extra feature that
I can see being very useful, and something that I would like
to see added to the class (although as I had no immediate
need for it at the time it just didn't happen ;). I can't
immediately see how I might serve the intentions of pulling
named groups out from a Regex object as it fits into
pyparsing (or making it "play nicely" in a sensible way) -
my knowledge of the internals of pyparsing is somewhat
limited. (The code for Regex is based on the Word class,
which I felt was the nearest matching code already in
pyparsing.)

2. I've not had any surprises from how the class works so
far. It was a class I developed as I needed to make a regex
matcher in order to match some BNF-like grammar more
closely, and thus far it has worked nicely - at least in the
fashion that I have utilised it (which is parsing a CSS
stylesheet based loosely upon the BNF-like grammar from the
CSS spec - which can have some quite complex regular
expressions for matching strings, URLs and so on).

3. I can certainly submit some Regex constructs that I'm
using in my own code if that is useful. I'll have a look at
this presently.

My real name is John Beisley, I've updated my SF profile so
it should show now.

John Beisley - 2005-12-07

Logged In: YES
user_id=1368227

Okay, here's a little bit of quickly knocked up test code.
It doesn't test the Regex in the larger context of a full
grammar, but rather tests it on a few small things. I've
thrown in a test which doesn't quite work as one might
desire for a named group, but then, nothing has been stated
as to which group should be extracted (there is nothing in
the code to specifically pull out the named groups, as
observed).
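
The named-group behaviour described here can be sketched with the standard re module. This is not the attached test code; the CSS-like declaration pattern and the group names prop and value are assumptions chosen for illustration.

```python
import re

# Illustrative pattern with two named groups.
decl = re.compile(r"(?P<prop>[a-z-]+)\s*:\s*(?P<value>[^;]+);")
m = decl.match("color: red;")

# group() with no arguments returns the whole match, which is all
# that a parseImpl returning result.group() would hand back.
whole = m.group()

# The named groups are available separately, but only if asked for.
named = m.groupdict()

print(whole)
print(named)
```

Since the class returns only the whole match, the named groups are not surfaced unless something extracts them explicitly, for example via groupdict().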

John Beisley - 2005-12-07

Logged In: YES
user_id=1368227

Whoops, I've actually attached it this time :)

John Beisley - 2005-12-07

Test code

test_code.py

Paul McGuire - 2005-12-08

Logged In: YES
user_id=893320

Thanks John!

I've just checked your changes into my SVN repository,
plus the unit tests.

I'm going to give the named groups a little more thought,
and see if I can get them to look more native to the
pyparsing ParseResults.

-- Paul

Paul McGuire - 2005-12-22

Logged In: YES
user_id=893320

John -

Please check out the latest 1.4 beta1 release, for my
inclusion of your Regex work.

Thanks again!
-- Paul

Nobody/Anonymous - 2005-12-22

Logged In: NO

Good stuff, I'll be taking a look at that when I get back to
work after Christmas!

Paul McGuire - 2006-01-22

status: open --> closed

Paul McGuire - 2006-01-22

Logged In: YES
user_id=893320

Released in version 1.4
