From: Lionel B. <lio...@bo...> - 2005-02-15 16:35:26

Michel Bouissou wrote the following on 15.02.2005 16:48:

> On Tuesday 15 February 2005 16:31, Lionel Bouton wrote:
>
>> Thanks, I'm worried about the size of the regexp though. There are two
>> things on my mind:
>> - is it maintainable?
>
> I don't think it will need much maintenance. It's based on a (yet more
> complex ;-)

Even more!

> regexp I have built over years, and that very seldom needs
> changes -- and the changes are improvements that are not strictly speaking
> necessary nor urgent.
>
> Maintaining such a regexp is not that complex if you are careful ;-)
> especially about line breaks if you split it into several lines (it seems
> that an escaped line break should NOT be put after a ) or } or ? or the
> regexp won't work. I limit myself to splitting after "regular characters" and
> before a "|".

I see.

>> - how much processing time is needed for these regexps?
>
> Given that we just process a short hostname and not a long file, and given
> that Perl will compile the regexp only once except for the one that contains
> part of the IP as a variable, I believe the processing time should be
> negligible (compared to the database accesses etc.)

Regexps can be both really quick and really slow. I don't yet have enough
experience with Perl regexps to tell from a quick look at one whether Perl
will handle hundreds of thousands of matches per second or just hundreds
per second.

>> I'd like to add this as a separate algorithm and put the regexp in
>> external files that can be reloaded
>
> I would hardcode this. I expect very little changes to this, if any. Loading
> the regexps from external files would make this still more complex and
> subject to errors...

I'd prefer to have

if ($fqdn =~ $known_server_pattern)
...

and so on, rather than the full regexp in the code!
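
A minimal sketch of how loading a pattern from a file could look, so that the code only ever mentions a named variable. Everything here is an assumption made for illustration: the helper name `load_pattern`, the one-fragment-per-line file layout with `#` comments, and the toy pattern itself, which is not the regexp discussed in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read one regexp from a file and compile it once.
# Blank lines and lines starting with '#' are skipped; the remaining
# fragments are trimmed and concatenated, so the pattern can be split
# over several lines without any escaped line breaks.
sub load_pattern {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $source = '';
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # comment or blank line
        $line =~ s/^\s+|\s+$//g;           # trim surrounding whitespace
        $source .= $line;
    }
    close $fh;

    # A syntax error in the pattern dies here, naming the file at fault.
    my $re = eval { qr/$source/i };
    die "Invalid regexp in $path: $@" unless defined $re;
    return $re;
}

# Demo: write a made-up pattern file, then load and use it.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} <<'END';
# toy pattern for illustration only
^(?:mail|smtp|mx)\d*
\.
END
close $tmp_fh;

my $known_server_pattern = load_pattern($tmp_path);

for my $fqdn ('smtp2.example.org', 'dsl-1-2-3-4.example.net') {
    if ($fqdn =~ $known_server_pattern) {
        print "$fqdn: matches the known-server pattern\n";
    }
    else {
        print "$fqdn: no match\n";
    }
}
```

Because the pattern is compiled once with `qr//`, a broken pattern file is reported when it is loaded rather than on the first match attempt.
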
An accidental keypress in the middle of the regexp could have unforeseen
consequences and would be hard to spot without a cvs diff, whereas a
keypress in the middle of a variable name is an instant blocker with an
obvious error message leading to a painless resolution. Editing the regexp
file would be less error-prone in my opinion. Loading regexps from a file
isn't really so complex.

> [...]
>
>> I'll probably start the 1.5.x branch for this new algorithm.
>
> Meanwhile, you can test it on your own system, I don't think you'll notice any
> performance impact, but it will probably be more accurate than the basic IP
> address test (see my last post with some examples...)

I won't notice any perf difference. Installations handling more than a
million mails per day are what worry me, though.

I'll bench the code to see how many lines per second these regexps can
handle on my systems; hard numbers are usually more convincing to me
with things as complex as regexps.

Lionel.
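
A rough sketch of such a benchmark, timing a precompiled pattern against a small list of hostnames with `Time::HiRes`. The pattern, the hostnames and the helper name `matches_per_second` are made up for illustration and stand in for the real regexps and log data; the figures depend entirely on the machine and the pattern used.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Made-up stand-ins for the real pattern and input.
my $pattern   = qr/^(?:mail|smtp|mx)\d*\./i;
my @hostnames = ('smtp2.example.org', 'dsl-1-2-3-4.example.net', 'mx.example.com');

# Hypothetical helper: run every hostname through the pattern $rounds
# times and return (match attempts per second, number of successful matches).
sub matches_per_second {
    my ($re, $names, $rounds) = @_;
    my $hits  = 0;
    my $start = Time::HiRes::time();
    for (1 .. $rounds) {
        for my $fqdn (@$names) {
            $hits++ if $fqdn =~ $re;
        }
    }
    my $elapsed = Time::HiRes::time() - $start;
    $elapsed = 1e-9 if $elapsed <= 0;    # guard against a zero reading
    my $attempts = $rounds * scalar(@$names);
    return ($attempts / $elapsed, $hits);
}

my ($rate, $hits) = matches_per_second($pattern, \@hostnames, 100_000);
printf "%.0f match attempts/second (%d hits)\n", $rate, $hits;
```

The core `Benchmark` module could be used for the same purpose; the manual loop is only meant to keep the arithmetic visible.
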