pyre2-devel Mailing List for python hierarchical regular expressions
Status: Beta
Brought to you by:
ottrey
You can subscribe to this list here.
2005: Apr (6), May (3)
From: Pierre B. de R. <pie...@ci...> - 2005-05-09 15:01:02
|
Ok, I just read your document (sorry for the delay, but I couldn't do it before).

First, you should make it clear in your PEP that the different points may be discussed separately (i.e. the functions thing and the tree thing). I even wonder if that shouldn't be 2 PEPs ...

Otherwise, your document is clear. But it lacks the implementation part. I think you should defer the details of this part until you have proposed it to the main python-dev list. But do not forget to mention that you implemented it in a Python module! Implementation is very important!

The only point I disagree with is about backward compatibility. What you propose does NOT break any existing code (you took great care about that), and you should note it clearly. From what you wrote, one could infer just the opposite.

Finally, do not forget to send a message to python-dev and follow closely what happens there ;)

Pierre

--
Pierre Barbier de Reuille
INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP
Botanique et Bio-informatique de l'Architecture des Plantes
TA40/PSII, Boulevard de la Lironde
34398 MONTPELLIER CEDEX 5, France
tel : (33) 4 67 61 65 77 fax : (33) 4 67 61 56 68 |
From: Pierre B. de R. <pie...@ci...> - 2005-05-07 07:58:09
|
Ok, just to say that I'm taking a close look at that, and I'll report as soon as I'm sure I have understood everything! (If I don't, you'll have questions from me ;) )

Keep up the good work in any case :)

Pierre

Chris Ottrey wrote:
> Hi Gustavo & Pierre,
>
> I'm about to submit a PEP and was wondering if you guys could look over
> it for me.
> [...]
> Anyway I was wondering if you guys could take a look and tell me what
> you think.
>
> A. About the PEP itself
> and
> B. About decoupling the PEP from python-dev
>
> Cheers.
>
> Chris. |
From: Chris O. <ot...@ya...> - 2005-05-06 06:12:30
|
Hi Gustavo & Pierre,

I'm about to submit a PEP and was wondering if you guys could look over it for me.

Apart from the PEP itself I've tried to address some other issues.

One issue is that there is so much noise on the python-dev mailing list that I really can't afford to maintain a subscription to it.

As you know, I set up this pyre2-dev mailing list to try and de-couple pyre2 from python-dev.

I sent a message to python-dev asking people to come over and join us, and the following is a list of email addresses that have since subscribed:

[]

I also told them about the wiki.

There have only been two entries on the wiki (apart from Gustavo's and mine).
The first was some wiki graffiti.
The second was from a guy who had written a parser that he thought did something similar:

http://www.dalkescientific.com/Martel/

And the wiki reports it has only been viewed 200 times (and I'm pretty sure most of that was from me.)

... Anyway, in an attempt to steer python-dev people away from their mailing list (which I really don't want to have to sift through) and towards the wiki (where everything can be stored and monitored, e.g. with RSS feeds), I have attempted to piggy-back a wiki on to the PEP.

Every section in the PEP has a [discuss] link back to a wiki page, where that section of the PEP can be discussed.

(I did this by hacking the code that generates a PEP from structured text.)

I'm currently hosting the PEP and the piggy-backed wiki at:

http://pyre2.sourceforge.net/pep.html

When you look at the "Abstract" section and click on the [discuss] link you get sent to the Talk:Abstract discussion wiki page.
And when you click on the "article" tab in the wiki you get redirected back to the PEP.

I'm not sure how robust this will be, e.g. when I change headings, and currently I have to set up those redirects by hand.
(Actually I've only bothered to do a redirect for the "Abstract".)

As for the old wiki - I guess I'll try importing it into the new one at some stage.

Anyway, I was wondering if you guys could take a look and tell me what you think.

A. About the PEP itself
and
B. About decoupling the PEP from python-dev

Cheers.

Chris. |
From: <ot...@py...> - 2005-04-11 08:32:58
|
Hi Pierre,

On 4/8/2005, "Pierre Barbier de Reuille" <pie...@ci...> wrote:
>Here is the expression causing the bug:
>
>"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"
>
>.. you cannot compile it with re2 but you can do it with re !

I've fixed that bug the eXtreme Programming way, i.e. I spent ~minimal~ time thinking about it and just blindly added a test case for it, then made the code work for that test case as well as all the other ones.

...sure, it should be refactored at some stage, but at least the prototype is getting a good workout. ;-)

Anyway, I've checked the fix in to the subversion repository (Rev 24).

http://pintje.servebeer.com/svn/pyre2/trunk/

Although I'm not ~quite~ sure the expected result for the test case I have created is correct.
Would you be able to check it for me?

And any other comments? (particularly on the interface) |
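For reference, the expression from the bug report can be checked against the standard `re` module. The sketch below uses only the stdlib `re` (not re2); the sample string is borrowed from the re2 examples elsewhere in this thread, and the names `BUG_REGEX`, `pat`, `buf` and `flat` are illustrative. It also shows what a flat match loses: each repeated group keeps only its last repetition.

```python
import re

# Expression from the bug report; the standard re module compiles it.
BUG_REGEX = r"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"
pat = re.compile(BUG_REGEX)

# Sample input borrowed from the re2 examples in this thread (an assumption,
# the bug report itself does not say which input was used).
buf = "123 234 345, 123 256, and 123 289"
m = pat.match(buf)

# A flat match object keeps only the last repetition of a repeated group.
flat = m.groupdict()
print(flat)  # {'num': '256', 'logic': 'and', 'lastnum': ' 289'}
```

The earlier numbers (123, 234, 345, ...) are matched but not retained, which is the gap the hierarchical extraction is meant to fill.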
From: <ot...@py...> - 2005-04-08 12:12:48
|
Hi Pierre!,

On 8/4/2005, "Pierre Barbier de Reuille" <pie...@ci...> wrote:
>Well, first feedback is a bit sad: I found a bug :S
>
>Here is the expression causing the bug:
>
>"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"
>
>.. you cannot compile it with re2 but you can do it with re !

That's not sad! I'm surprised it coped at all with that monster! ;-)

Experimenting with that expression myself, I notice that the bug must have something to do with the '(?:': when you change it to '(' or '(?P<x>' it works.

FYI the re2 code has been hacked together in an eXtreme Programming type of way, i.e. think up some unit tests, then get the code to pass those tests as quickly and simply as you can - making sure nothing is "over designed".
I think true believers in eXtreme Programming also suggest you refactor the code and make it more efficient on successive iterations, making sure all the unit tests still pass.
Ummm... let's just say I haven't got up to that bit yet. ;-)

You should look at the nasty bit of code that splits up the expression into its subgroups! (which btw is where that bug you mention will be)
(Hey, are you any good with state machines or parsers? You don't want to try your hand at fixing it, do you?)

>Talking about efficiency: I think you first need a working module,
> with good user/developer documentation.

Yes. Exactly!

> Then, time to optimize! You'll want
> to profile your code and, if needed, code some parts (or all of it) in
> another language (most probably C) and export it to Python. But I think
> that will be in the future, when the interface and the functionality
> have stabilized enough.

Yes, true too. But it's the _method_ that lacks efficiency. When there is no need for nested groups, re will always be more efficient than re2.

>By the way, if you could tell us as much as possible about the principle
>behind your modules ... you did part of it in this mail, but it would
>help us track bugs or even improve the code ...

Will do. Although I first wanted to get the python-dev people interested, tempting them with small yet intriguing pieces of a puzzle, hopefully allowing them to germinate some ideas, before thrusting upon them some overly verbose attempt at a solution. But I guess that's what PEPs are for. ;-)

So BTW, how are you finding that new lack of a '_value' key? e.g.

>>> m=re2.extract('(?P<x>([a-z]X)+)', 'bXcXdXeX')
>>> print m
bXcXdXeX
>>> m
{'x': {0: ['bX', 'cX', 'dX', 'eX']}}
>>> print m.dump()
---
x:
    0:
        - bX
        - cX
        - dX
        - eX

You can't really tell from the dump that 'x' actually has the value 'bXcXdXeX', can you? Perhaps it would be more intuitive like this:

>>> m
{'x': ('bXcXdXeX', {0: ['bX', 'cX', 'dX', 'eX']})}
>>> print m.dump()
---
x:
    - bXcXdXeX
    - 0:
        - bX
        - cX
        - dX
        - eX

I can't, offhand, think of an example where this representation would be ambiguous. Can you?

However, the problem with doing it that way is that it breaks this functionality:

>>> m['x'][0][3] == 'eX'

Maybe the trick is to write our own dump() method, e.g. so the output looks like this:

>>> print m.dump()
---
x (bXcXdXeX):
    0:
        - bX
        - cX
        - dX
        - eX

BTW, I've also been doing some background research:

http://py.redsoft.be/pyre2/wiki/index.php?title=Background_research

I even notice someone started writing a PEP very similar to this last year.

Chris. |
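The indexing concern in the message above can be reproduced with plain Python containers. This is only an illustration using ordinary dicts, lists and tuples (re2's real result objects are not modelled here), and the names `current` and `proposed` are made up for the sketch.

```python
# Current representation: a group maps straight to its nested results.
current = {'x': {0: ['bX', 'cX', 'dX', 'eX']}}

# Proposed representation: a (matched text, nested results) tuple.
proposed = {'x': ('bXcXdXeX', {0: ['bX', 'cX', 'dX', 'eX']})}

# With the current layout, [0][3] reaches the fourth repetition.
fourth_current = current['x'][0][3]        # 'eX'

# With the tuple layout, [0] is the matched string, so [3] is a character.
fourth_proposed = proposed['x'][0][3]      # 'X'

# The repetitions are still reachable, but one level deeper.
fourth_via_tuple = proposed['x'][1][0][3]  # 'eX'

print(fourth_current, fourth_proposed, fourth_via_tuple)
```

Keeping the data layout unchanged and showing the value only in dump(), as suggested at the end of the message, avoids this extra level of indexing.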
From: Pierre B. de R. <pie...@ci...> - 2005-04-08 10:22:28
|
Well, first feedback is a bit sad: I found a bug :S

Here is the expression causing the bug:

"(((?:\s*)(?P<num>\d+))+,)+\s*(?P<logic>[^ ]+)((?P<lastnum>\s*\d+)+)"

.. you cannot compile it with re2 but you can do it with re !

Talking about efficiency: I think you first need a working module, with good user/developer documentation. Then, time to optimize! You'll want to profile your code and, if needed, code some parts (or all of it) in another language (most probably C) and export it to Python. But I think that will be in the future, when the interface and the functionality have stabilized enough.

By the way, if you could tell us as much as possible about the principle behind your modules ... you did part of it in this mail, but it would help us track bugs or even improve the code ...

Pierre

ot...@py... wrote:
> [...]
> The only problem with it is that the re2.compile() is actually a lot less
> efficient than the re.compile() function. (And therefore I don't think
> re2.compile() should replace re.compile())
> [...] |
From: <ot...@py...> - 2005-04-08 00:11:51
|
On 4/7/2005, "Pierre Barbier de Reuille" <pie...@ci...> wrote:
>Well, I really like your solution. It's even better than what I first
>thought !

Great. Thanx!

>I'll have a look at your latest code and test it a bit !

It's always good to have someone else coming up with use cases, to get completely new and different ideas thrown at it.

>You'll have feedback soon (I hope ...)

That'd be great.

I've also thought some more about how to merge with the re module, as I think (and everyone else seems to think) re2 shouldn't be a separate module.

The only problem with it is that re2.compile() is actually a lot less efficient than the re.compile() function. (And therefore I don't think re2.compile() should replace re.compile())

The inefficiency comes from the fact that re2.compile() actually splits the regular expression up into its component groups at each level of the group hierarchy. It then compiles a pattern for each node in the hierarchy, substituting nested groups at each node with non-capturing groups '(?:'.
It then uses that hierarchy of patterns to extract a hierarchy of results.
This is done by recursively matching each node, then passing the results of that match down the hierarchy to be subsequently matched by its children.

All this, I'd imagine (as I haven't actually done any performance testing), is going to ~really~ slow things down - probably mostly at the compile() stage, but also during the matching, depending on the data set.

Although if a match at the top of the hierarchy fails then subsequent attempts to match the descendants won't be performed - so that will provide ~some~ relief. But I don't think re2 can or should ever be a replacement for re.

If it is going to get merged with the re library then I'm thinking it would best slot in as a new compile() method.

So some names I have thought up are:
- compile2()
- recursive_compile() - rcompile() - compile_recursive()
- hierarchical_compile() - hcompile() - ...
- nested_compile() - nc... - ...

>> I've made some changes to the re2 library based on your suggestion.
>>
>> now it ~actually~ behaves like this:
>>
>> >>> buf="123 234 345, 123 256, and 123 289"
>> >>> regex=r'^(( *\d+)+,)+ *(?P<logic>[^ ]+)(( *\d+)+).*$'
>> >>> pat2=re2.compile(regex)
>> >>> x=pat2.extract(buf)
>> >>> x
>> {0: [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}], 2:
>> {0: [' 123', ' 289']}, 'logic': 'and'}
>>
>> >>> print x.dump()
>> ---
>> 0:
>>     -
>>         0:
>>             - '123'
>>             - ' 234'
>>             - ' 345'
>>     -
>>         0:
>>             - ' 123'
>>             - ' 256'
>> 2:
>>     0:
>>         - ' 123'
>>         - ' 289'
>> logic: and
>>
>> >>> x[0]
>> [{0: ['123', ' 234', ' 345']}, {0: [' 123', ' 256']}]
>>
>> >>> x[0][0]
>> {0: ['123', ' 234', ' 345']}
>>
>> >>> str(x[0][0])
>> '123 234 345,'
>>
>> >>> x[0][1]
>> {0: [' 123', ' 256']}
>>
>> >>> str(x[0][1])
>> ' 123 256,'
>>
>> >>> x['logic']
>> 'and'
>>
>> >>> x.logic
>> 'and'
>>
>> >>> x[2]
>> {0: [' 123', ' 289']}
>>
>> >>> str(x[2])
>> ' 123 289' |
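The method described above (one pattern per node of the group hierarchy, inner groups made non-capturing, each match passed down to the node's children) can be approximated by hand with the standard `re` module. This is a rough sketch and not the re2 code: the three patterns `top`, `clause` and `number` were split out manually for this one expression, whereas re2 derives them automatically, and all names here are illustrative.

```python
import re

buf = "123 234 345, 123 256, and 123 289"

# Top-level node: the nested groups are replaced by non-capturing '(?:'
# groups, so each top-level group captures its whole matched text.
top = re.compile(r'^((?:(?: *\d+)+,)+) *(?P<logic>[^ ]+)((?: *\d+)+).*$')

# Child nodes, matched against the text captured by their parent.
clause = re.compile(r'(?: *\d+)+,')   # one comma-terminated run of numbers
number = re.compile(r' *\d+')         # one number with its leading spaces

m = top.match(buf)

# Pass the parent's captured text down to the children.
clauses = [number.findall(c) for c in clause.findall(m.group(1))]
tail = number.findall(m.group(3))     # group 2 is the named group 'logic'

# Assemble a nested result shaped like the one shown in the message above.
result = {
    0: [{0: nums} for nums in clauses],
    2: {0: tail},
    'logic': m.group('logic'),
}
print(result)
```

Every level of nesting adds another compiled pattern and another pass over the captured text, which illustrates why this approach is expected to be slower than a single `re.compile()` and `match()`.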
From: <ot...@py...> - 2005-04-05 05:07:18
|
I've checked in the new compile functions change, i.e. the code no longer tries to magically search for the functions in the calling frame - now they must be explicitly passed in to the compile() function as keyword arguments. Which IMO is a lot easier for the user to understand and therefore trust and use.

So now functions (until we think up a better name for "functions") are passed in as keyword arguments to compile() like this:

>>> from string import upper
>>> buf='0x0C drummers drumming, 0x0B pipers piping, 0x0A lords a-leaping'
>>> regex='(?P<verse>(?P<number:int_16>0x[\dA-F]+) \
... (?P<activity:upper>[^,]+))(,)?'
>>> pat=re.compile(regex, int_16=lambda x: int(x, 16), upper=upper)
>>> pat.extract(buf)
[{'number': 12, 'activity': 'DRUMMERS DRUMMING'},
 {'number': 11, 'activity': 'PIPERS PIPING'},
 {'number': 10, 'activity': 'LORDS A-LEAPING'}]

(BTW I'm not ~requiring~ builtin functions like "int()" to be passed in; they are still being retrieved from __builtins__.)

I've also updated the wiki and the sourceforge website.

I won't bother making a new sourceforge release because A: the sourceforge file release system is a major pain to work with, and B: no one seems to have downloaded this "work in progress" yet anyway. ;-)

Chris. |
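The keyword-argument scheme above can be imitated on top of the standard `re` module. The sketch below is an assumption-laden illustration, not the re2 implementation: `compile_typed` and `extract` are hypothetical helper names, only the `(?P<name:func>` groups are returned (mirroring the output shown in the message), and `str.upper` stands in for `string.upper`, which no longer exists in Python 3.

```python
import builtins
import re

# Matches the extended group syntax '(?P<name:func>' used in the message.
_TYPED_GROUP = re.compile(r'\(\?P<(\w+):(\w+)>')

def compile_typed(regex, **funcs):
    """Hypothetical helper: strip ':func' suffixes and remember converters."""
    converters = {}

    def strip_suffix(match):
        name, func_name = match.group(1), match.group(2)
        # Explicit keyword arguments win; otherwise fall back to a builtin.
        converters[name] = funcs.get(func_name) or getattr(builtins, func_name)
        return '(?P<%s>' % name

    return re.compile(_TYPED_GROUP.sub(strip_suffix, regex)), converters

def extract(pattern, converters, text):
    """Hypothetical helper: one dict of converted typed groups per match."""
    return [
        {name: convert(m.group(name)) for name, convert in converters.items()}
        for m in pattern.finditer(text)
    ]

buf = '0x0C drummers drumming, 0x0B pipers piping, 0x0A lords a-leaping'
regex = r'(?P<verse>(?P<number:int_16>0x[\dA-F]+) (?P<activity:upper>[^,]+))(,)?'

pat, conv = compile_typed(regex, int_16=lambda x: int(x, 16), upper=str.upper)
verses = extract(pat, conv, buf)
print(verses)
```

Passing the converters in explicitly keeps the lookup predictable: nothing depends on what happens to be defined in the caller's frame.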
From: <ot...@py...> - 2005-04-04 06:22:31
|
[http://pyre2.sourceforge.net/ Home Page at Source Forge] ([http://sourceforge.net/projects/pyre2 Project Page])
[http://pintje.servebeer.com/svn/pyre2/trunk/ SVN Repository]
[http://lists.sourceforge.net/lists/listinfo/pyre2-devel Mail List] ([http://sourceforge.net/mailarchive/forum.php?forum=pyre2-devel Archives]) |