[Pyre2-devel] Re: changes to re2 library
From: <ot...@py...> - 2005-04-08 00:11:51
On 4/7/2005, "Pierre Barbier de Reuille" <pie...@ci...> wrote:
>Well, I really like your solution. It's even better than what I first
>thought !
Great. Thanx!
>I'll have a look at your latest code and test it a bit !
It's always good to have someone else coming up with use cases, to get
completely new and different ideas thrown at it.
>You'll have feedback soon (I hope ...)
That'd be great.
I've also thought some more about how to merge with the re module.
I think (and everyone else seems to think) that re2 shouldn't be a separate
module.
The only problem with it is that re2.compile() is actually a lot less
efficient than re.compile(). (And therefore I don't think
re2.compile() should replace re.compile().)
The inefficiency comes from the fact that re2.compile() actually splits
the regular expression up into its component groups at each level of the
group hierarchy. It then compiles a pattern for each node in the
hierarchy, substituting nested groups at each node with non-capturing
groups '(?:'.
It then uses that hierarchy of patterns to extract a hierarchy of results.
This is done by recursively matching each node, then passing the results
of that match down the hierarchy to be subsequently matched by its children.
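A minimal sketch of the idea described above, written for current Python 3 and not taken from the re2 source. The Node class, the group names (lists, item, tail) and the hand-written per-node patterns are all assumptions made for illustration. In particular, wrapping each repeated group in an outer named group so that every repetition can be recovered is a choice of this sketch, not necessarily how re2 does it.

```python
import re


class Node:
    """One level of a hand-built group hierarchy (hypothetical helper)."""

    def __init__(self, pattern, children=None):
        # Nested groups inside `pattern` are written as non-capturing
        # '(?:...)' groups; only this node's direct children are named.
        self.regex = re.compile(pattern)
        # Maps a group name to a child Node, or to None for a plain leaf.
        self.children = children or {}

    def extract(self, text):
        """Return one result per match of this node found in `text`."""
        results = []
        for m in self.regex.finditer(text):
            if not self.children:
                results.append(m.group(0))
                continue
            entry = {}
            for name, child in self.children.items():
                part = m.group(name) or ""
                # Pass the text matched by this group down to its child.
                entry[name] = child.extract(part) if child else part
            results.append(entry)
        return results


# Per-node patterns derived by hand from the regex used later in this mail.
number = Node(r' *\d+')
lists = Node(r'(?P<item>(?: *\d+)+),', {'item': number})
top = Node(
    r'^(?P<lists>(?:(?: *\d+)+,)+) *(?P<logic>[^ ]+)(?P<tail>(?: *\d+)+).*$',
    {'lists': lists, 'logic': None, 'tail': number},
)

buf = "123 234 345, 123 256, and 123 289"
result = top.extract(buf)
print(result)
```

Each node only ever sees the text its parent matched, so a failed match at the top returns an empty list without any child pattern being tried.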
All this, I'd imagine (as I haven't actually done any performance
testing), is going to ~really~ slow things down - probably mostly at the
compile() stage, but also during the matching, depending on the data set.
Although if a match at the top of the hierarchy fails then subsequent
attempts to match the descendants won't be performed - so that
will provide ~some~ relief. But I don't think re2 can or should ever be
a replacement for re.
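One rough way to check the compile-stage cost would be to time compiling the single flat pattern against compiling one pattern per node. The sketch below is only an illustration: the per-node patterns are hand-written assumptions (not what re2 generates), and it measures the standard re module only, so it says nothing about re2's own splitting overhead.

```python
import re
import timeit

# The flat regex from the example in this mail.
FLAT = r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'

# Hypothetical per-node patterns, one for each level of the group hierarchy.
NODE_PATTERNS = [
    r'^(?P<lists>(?:(?: *\d+)+,)+) *(?P<logic>[^ ]+)(?P<tail>(?: *\d+)+).*$',
    r'(?P<item>(?: *\d+)+),',
    r' *\d+',
]


def compile_flat():
    re.purge()  # clear re's internal cache so the pattern is recompiled
    re.compile(FLAT)


def compile_nodes():
    re.purge()
    for pattern in NODE_PATTERNS:
        re.compile(pattern)


flat_time = timeit.timeit(compile_flat, number=200)
node_time = timeit.timeit(compile_nodes, number=200)
print(f"one pattern: {flat_time:.4f}s, per-node patterns: {node_time:.4f}s")
```

The numbers will vary by machine and Python version, so they are best read as a relative comparison only.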
If it is going to get merged with the re library then I'm thinking it
would best slot in as a new compile() method.
So some names I have thought up are:
- compile2()
- recursive_compile() - rcompile() - compile_recursive()
- hierarchical_compile() - hcompile() - ...
- nested_compile() - nc... - ...
>> I've made some changes to the re2 library based on your suggestion.
>>
>> now it ~actually~ behaves like this:
>>
>>>>>buf="123 234 345, 123 256, and 123 289"
>>>>>regex=r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'
>>>>>pat2=re2.compile(regex)
>>>>>x=pat2.extract(buf)
>>>>>x
>>
>> {0: [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}], 2: {0: [' 123', ' 289']}, 'logic': 'and'}
>>
>>>>>print x.dump()
>>
>> ---
>> 0:
>>   -
>>     0:
>>       - '123'
>>       - ' 234'
>>       - ' 345'
>>   -
>>     0:
>>       - ' 123'
>>       - ' 256'
>> 2:
>>   0:
>>     - ' 123'
>>     - ' 289'
>> logic: and
>>
>>
>>>>>x[0]
>>
>> [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}]
>>
>>>>>x[0][0]
>>
>> {0: ['123', ' 234', ' 345']}
>>
>>>>>str(x[0][0])
>>
>> '123 234 345,'
>>
>>>>>x[0][1]
>>
>> {0: [' 123', ' 256']}
>>
>>>>>str(x[0][1])
>>
>> ' 123 256,'
>>
>>>>>x['logic']
>>
>> 'and'
>>
>>>>>x.logic
>>
>> 'and'
>>
>>>>>x[2]
>>
>> {0: [' 123', ' 289']}
>>
>>>>>str(x[2])
>>
>> ' 123 289'