[Pyre2-devel] Re: changes to re2 library

Hi Pierre!

On 8/4/2005, "Pierre Barbier de Reuille" <pie...@ci...>
wrote:

>Well, first feedback is a bit sad: I found a bug :S
>
>Here is the expression causing the bug:
>
>"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"
>
>... you cannot compile it with re2, but you can with re!

That's not sad!  I'm surprised it coped at all with that monster!  ;-)

From experimenting with that expression myself, I notice that the bug must
have something to do with the '(?:'.  When you change it to '(' or
'(?P<x>' it works.
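
For illustration only (Python 3 syntax, standard library `re`, not re2): the
sketch below confirms that the reported pattern compiles under `re`. The
sample string is invented for this example. It also shows that `re` reports
only the last repetition of a repeated named group, whereas the
re2.extract() session later in this message returns every repetition.

```python
import re

# The pattern from Pierre's report, unchanged.
PATTERN = r"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"

# Compiles without error under the stdlib re module.
compiled = re.compile(PATTERN)

# Sample input made up for this sketch.
match = compiled.match("1 2, 3, and 4 5")

# Each named group keeps only its final repetition.
print(match.group("num"))            # 3
print(match.group("logic"))          # and
print(repr(match.group("lastnum")))  # ' 5'
```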

FYI the re2 code has been hacked together in an eXtreme Programming type of
way, i.e. think up some unit tests, then get the code to pass those tests as
quickly and simply as you can, making sure nothing is "over designed".

I think true believers in eXtreme Programming also suggest you refactor the
code and make it more efficient on successive iterations, making sure all the
unit tests still pass.

Ummm... let's just say I haven't got up to that bit yet.  ;-)

You should look at the nasty bit of code that splits up the expression into
its subgroups!  (Which, btw, is where that bug you mention will be.)
(Hey, are you any good with state machines or parsers?  You don't want to try
your hand at fixing it, do you?)

>Talking about efficiency: I think you first need a working module,
> with good user/developer documentation.

Yes.  Exactly!

> Then, time to optimize! You'll want
> to profile your code and, if needed, code some parts (or all of it) in
> another language (most probably C) and export it in Python. But I think
> that will be in the future, when the interface and the functionalities
> will have stabilized enough.

Yes, true too.
But it's the _method_ that lacks efficiency.  When there is no need for
nested groups, re will always be more efficient than re2.

>By the way, if you could tell us as much as possible about the principle
>behind your modules ... you did part of it in this mail, but it would
>help us tracking bugs or even improving the code ...

Will do.
Although I first wanted to get the python-dev people interested, tempting
them with small, yet intriguing pieces of a puzzle, hopefully allowing them
to germinate some ideas, before thrusting upon them some overly verbose
attempt at a solution.

But I guess that's what PEPs are for.  ;-)

So BTW, how are you finding that new lack of a '_value' key?

eg.
>>> m = re2.extract('(?P<x>([a-z]X)+)', 'bXcXdXeX')
>>> print m
bXcXdXeX
>>> m
{'x': {0: ['bX', 'cX', 'dX', 'eX']}}
>>> print m.dump()
---
x:
    0:
        - bX
        - cX
        - dX
        - eX

You can't really tell from the dump that 'x' actually has the value
'bXcXdXeX' can you?

Perhaps it would be more intuitive like this:

>>> m
{'x': ('bXcXdXeX', {0: ['bX', 'cX', 'dX', 'eX']})}
>>> print m.dump()
---
x:
    - bXcXdXeX
    -
        0:
            - bX
            - cX
            - dX
            - eX

I can't, off hand, think of an example where this representation would be
ambiguous.  Can you?

However, the problem with doing it that way is that it breaks this
functionality:

>>> m['x'][0][3] == 'eX'
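
To make the breakage concrete, here is a small sketch (Python 3) that uses
plain dicts and tuples as stand-ins for the re2 result object; the real
object's class is not shown in this message, so treat the shapes as an
assumption copied from the two reprs above.

```python
# Current shape, copied from the first session above.
current = {'x': {0: ['bX', 'cX', 'dX', 'eX']}}

# Proposed shape: (whole match, sub-matches).
proposed = {'x': ('bXcXdXeX', {0: ['bX', 'cX', 'dX', 'eX']})}

print(current['x'][0][3])      # eX
print(proposed['x'][0][3])     # X   (index 0 is now the string 'bXcXdXeX')
print(proposed['x'][1][0][3])  # eX  (sub-matches moved one level down)
```

With the tuple form the old lookup still runs but silently indexes into the
matched string, so existing callers would get a wrong answer rather than an
error.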

Maybe the trick is to write our own dump() method.
eg. so the output looks like this:

>>> print m.dump()
---
x (bXcXdXeX):
    0:
        - bX
        - cX
        - dX
        - eX
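
A minimal sketch of how such a dump could be produced (Python 3). The
`Group` class and the `dump` function here are hypothetical and are not
taken from the re2 source: `Group` is a dict subclass that also remembers
the text the whole group matched, so indexing such as m['x'][0][3] keeps
working while the dump can still show the value.

```python
class Group(dict):
    """Hypothetical result node: a dict of sub-matches plus the matched text."""

    def __init__(self, value, children=()):
        super().__init__(children)
        self.value = value


def dump(node, indent=0):
    """Return the lines of a YAML-like rendering of nested dicts and lists."""
    pad = " " * indent
    lines = []
    if isinstance(node, dict):
        for key, child in node.items():
            # Show the whole-match text next to the key when the child has one.
            value = getattr(child, "value", None)
            label = f"{key} ({value}):" if value is not None else f"{key}:"
            lines.append(pad + label)
            lines.extend(dump(child, indent + 4))
    elif isinstance(node, (list, tuple)):
        for item in node:
            if isinstance(item, (dict, list, tuple)):
                lines.append(pad + "-")
                lines.extend(dump(item, indent + 4))
            else:
                lines.append(f"{pad}- {item}")
    else:
        lines.append(pad + str(node))
    return lines


m = {"x": Group("bXcXdXeX", {0: ["bX", "cX", "dX", "eX"]})}
print("---")
print("\n".join(dump(m)))
print(m["x"][0][3] == "eX")  # the existing lookup is unaffected
```

Keeping the value on an attribute rather than inside the container is one
design option among several; it trades a slightly richer node type for
unchanged indexing.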

BTW, I've also been doing some background research:

  http://py.redsoft.be/pyre2/wiki/index.php?title=Background_research

I even notice someone started writing a PEP very similar to this last year.

Chris.