[Pyre2-devel] Re: changes to re2 library
From: <ot...@py...> - 2005-04-08 00:11:51
On 4/7/2005, "Pierre Barbier de Reuille" <pie...@ci...> wrote:
>Well, I really like your solution. It's even better than what I first
>thought!

Great. Thanks!

>I'll have a look at your latest code and test it a bit!

It's always good to have someone else coming up with use cases; it gets
completely new and different ideas thrown at it.

>You'll have feedback soon (I hope ...)

That'd be great.

I've also thought some more about how to merge with the re module, as I
think (and everyone else seems to think) re2 shouldn't be a separate
module. The only problem is that re2.compile() is a lot less efficient
than re.compile(), and therefore I don't think re2.compile() should
replace re.compile().

The inefficiency comes from the fact that re2.compile() splits the
regular expression into its component groups at each level of the group
hierarchy. It compiles a pattern for each node in the hierarchy, turning
the nested groups at each node into non-capturing groups '(?:...)'. It
then uses that hierarchy of patterns to extract a hierarchy of results:
each node is matched recursively, and the results of that match are
passed down the hierarchy to be matched by its children.

All this, I'd imagine (as I haven't actually done any performance
testing), is going to ~really~ slow things down, probably mostly at the
compile() stage but also during matching, depending on the data set.
However, if a match at the top of the hierarchy fails, the descendants
are never matched, so that will provide ~some~ relief.

But I don't think re2 can or should ever be a replacement for re. If it
is going to get merged with the re library, then I think it would best
slot in as a new compile() function. Some names I have thought up are:

- compile2()
- recursive_compile()
- rcompile()
- compile_recursive()
- hierarchical_compile()
- hcompile()
- ...
- nested_compile()
- nc...
- ...
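A minimal Python 3 sketch of the hierarchical compile described above, using only the standard re module. It is not re2's code: Node, compile_tree(), flat(), node_pattern(), from_match() and extract_tree() are hypothetical names made up for illustration. It assumes the pattern uses only plain '(...)' and '(?P<name>...)' groups, with no '(?:...)', lookarounds, backreferences or anchors inside groups, since each node is re-matched on its own text with its own group numbering. It also assumes a repeated group can be split into repetitions by running that group's own pattern with finditer() over the text it captured; that is a guess at how list results could be produced, and in unusual patterns it may split repetitions differently from the original match.

```python
import re

class Node:
    """One capturing group; the root node stands for the whole pattern."""
    def __init__(self, name=None):
        self.name = name    # group name from (?P<name>...), else None
        self.parts = []     # raw regex text and child Nodes, in source order
        self.quant = ""     # quantifier written right after the group, e.g. "+"
        self.regex = None   # this node's own compiled pattern

    def children(self):
        return [p for p in self.parts if isinstance(p, Node)]

QUANT = re.compile(r"(?:[*+?]|\{\d*,?\d*\})\??")

def compile_tree(regex):
    """Split a pattern into a tree of groups and compile one pattern per node."""
    stack = [Node()]
    i = 0
    while i < len(regex):
        c = regex[i]
        if c == "\\":                      # escape: keep the pair verbatim
            stack[-1].parts.append(regex[i:i + 2])
            i += 2
        elif c == "[":                     # character class: keep it whole
            j = i + 2 if regex.startswith("^", i + 1) else i + 1
            if regex.startswith("]", j):   # a leading ] is a literal
                j += 1
            while regex[j] != "]":
                j += 2 if regex[j] == "\\" else 1
            stack[-1].parts.append(regex[i:j + 1])
            i = j + 1
        elif c == "(":
            name = None
            if regex.startswith("(?P<", i):
                end = regex.index(">", i)
                name, i = regex[i + 4:end], end + 1
            elif regex.startswith("(?", i):
                raise ValueError("only (...) and (?P<name>...) groups are supported")
            else:
                i += 1
            child = Node(name)
            stack[-1].parts.append(child)
            stack.append(child)
        elif c == ")":
            node = stack.pop()             # the group is complete: compile it now
            node.regex = re.compile(node_pattern(node))
            q = QUANT.match(regex, i + 1)  # a quantifier applies to the whole group
            node.quant = q.group(0) if q else ""
            i = q.end() if q else i + 1
        else:
            stack[-1].parts.append(c)
            i += 1
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    root = stack[0]
    root.regex = re.compile(node_pattern(root))
    return root

def flat(node):
    # The node's body with every nested group turned non-capturing.
    return "".join("(?:" + flat(p) + ")" + p.quant if isinstance(p, Node) else p
                   for p in node.parts)

def node_pattern(node):
    # The node's body with only its direct child groups left capturing.
    out = []
    for p in node.parts:
        if not isinstance(p, Node):
            out.append(p)
        elif p.quant:                      # capture all repetitions as one span
            out.append("((?:" + flat(p) + ")" + p.quant + ")")
        else:
            out.append("(" + flat(p) + ")")
    return "".join(out)

def from_match(node, m):
    """A leaf yields its matched text; any other node yields a dict of sub-results."""
    kids = node.children()
    if not kids:
        return m.group(0)
    result = {}
    for index, child in enumerate(kids):
        key = index if child.name is None else child.name
        text = m.group(index + 1)
        if text is None:                   # group sat in an alternative not taken
            result[key] = [] if child.quant else None
        elif child.quant:                  # one sub-result per repetition
            result[key] = [from_match(child, r) for r in child.regex.finditer(text)]
        else:                              # re-match the child's own pattern on its text
            result[key] = from_match(child, child.regex.fullmatch(text))
    return result

def extract_tree(regex, s):
    root = compile_tree(regex)
    m = root.regex.search(s)
    return None if m is None else from_match(root, m)  # no match: children never run

buf = "123 234 345, 123 256, and 123 289"
regex = r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'
print(extract_tree(regex, buf))
```

With the buf and regex from the session quoted below, extract_tree(regex, buf) returns the same nesting as x there, as plain dicts and lists. In this sketch every group costs one extra re.compile() call, which is where the compile-time overhead comes from, and when the root search fails None is returned before any child pattern runs.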
>> I've made some changes to the re2 library based on your suggestion.
>>
>> now it ~actually~ behaves like this:
>>
>> >>> buf="123 234 345, 123 256, and 123 289"
>> >>> regex=r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'
>> >>> pat2=re2.compile(regex)
>> >>> x=pat2.extract(buf)
>> >>> x
>> {0: [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}], 2: {0: [' 123', ' 289']}, 'logic': 'and'}
>> >>> print x.dump()
>> ---
>> 0:
>>   -
>>     0:
>>       - '123'
>>       - ' 234'
>>       - ' 345'
>>   -
>>     0:
>>       - ' 123'
>>       - ' 256'
>> 2:
>>   0:
>>     - ' 123'
>>     - ' 289'
>> logic: and
>> >>> x[0]
>> [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}]
>> >>> x[0][0]
>> {0: ['123', ' 234', ' 345']}
>> >>> str(x[0][0])
>> '123 234 345,'
>> >>> x[0][1]
>> {0: [' 123', ' 256']}
>> >>> str(x[0][1])
>> ' 123 256,'
>> >>> x['logic']
>> 'and'
>> >>> x.logic
>> 'and'
>> >>> x[2]
>> {0: [' 123', ' 289']}
>> >>> str(x[2])
>> ' 123 289'
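The result object in the session above behaves as a dict (x[0], x['logic']), also allows attribute access (x.logic), and gives back the text a group matched through str(). A minimal Python 3 sketch of one way to get that combination with a dict subclass; MatchTree is a hypothetical name, not re2's actual result class, and the example tree is built by hand from the session's values rather than produced by matching.

```python
class MatchTree(dict):
    """Hypothetical result node: a dict of sub-results plus the matched text."""

    def __init__(self, text, items=()):
        super().__init__(items)
        self.text = text  # the substring this group matched

    def __str__(self):
        # str(node) gives the matched text rather than the dict contents
        return self.text

    def __getattr__(self, name):
        # Only called when normal lookup fails, so x.logic falls back to x['logic']
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Rebuild the session's result by hand (no matching happens here)
x = MatchTree(
    "123 234 345, 123 256, and 123 289",
    {
        0: [MatchTree("123 234 345,", {0: ["123", " 234", " 345"]}),
            MatchTree(" 123 256,", {0: [" 123", " 256"]})],
        2: MatchTree(" 123 289", {0: [" 123", " 289"]}),
        "logic": "and",
    },
)
print(repr(x[0][0]))  # {0: ['123', ' 234', ' 345']}
print(str(x[0][0]))   # 123 234 345,
print(x.logic)        # and
```

Because __getattr__ is only consulted when normal attribute lookup fails, a group named like a dict method (keys, items) stays reachable only as x['keys'].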