From: Michel B. <mi...@bo...> - 2005-02-15 14:38:24
Attachments:
sqlgrey-1.4.4-smartsmart.patch
|
And here comes the Big Hideous Ugly Regexp ;-)

The attached patch makes SQLgrey's smart decisions much smarter in deciding if C-Class or complete IP should be used for a given client.

It's heuristic... So imperfect, but my tests show it gives much more accurate results compared to the original simpler algorithm.

(Using a "debug" loglevel will show the decision rules that trigger)

Give it a try ;-)

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Michel B. <mi...@bo...> - 2005-02-15 14:51:32
Attachments:
sqlgrey-1.4.4-smartsmart.patch
|
On Tuesday 15 February 2005 15:38, Michel Bouissou wrote:
> And here comes the Big Hideous Ugly Regexp ;-)
>
> The attached patch makes SQLgrey's smart decisions much smarter in deciding
> if C-Class or complete IP should be used for a given client.

Oops, sorry, there was a little issue with the way the regexps were split into several lines.

The attached patch replaces the previous one and fixes the issue.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Lionel B. <lio...@bo...> - 2005-02-15 15:32:00
|
Michel Bouissou wrote the following on 15.02.2005 15:51 :
>On Tuesday 15 February 2005 15:38, Michel Bouissou wrote:
>
>>And here comes the Big Hideous Ugly Regexp ;-)
>>
>>The attached patch makes SQLgrey's smart decisions much smarter in deciding
>>if C-Class or complete IP should be used for a given client.
>
>Oops, sorry, there was a little issue with the way the regexps were split into
>several lines.
>The attached patch replaces the previous one and fixes the issue.

Thanks, I'm worried about the size of the regexp though. There are two things on my mind:
- is it maintainable?
- how much processing time is needed for these regexps?

I'd like to add this as a separate algorithm and put the regexps in external files that can be reloaded (i.e. like the whitelists, being updated from a repository). Being a switch, the second potential problem wouldn't actually be one. I don't have access to enough mail logs to test these regexps on each update. Are you willing to maintain them? This will solve the first potential problem for me :-)

I'll probably start the 1.5.x branch for this new algorithm.

Lionel. |
From: Michel B. <mi...@bo...> - 2005-02-15 15:39:57
|
On Tuesday 15 February 2005 15:38, Michel Bouissou wrote:
> It's heuristic... So imperfect, but my tests show it gives much more
> accurate results compared to the original simpler algorithm.

Here is an example of a series of hostnames/addresses that the original SQLgrey would take as "Class C" (because they don't have the end of their IP address in their hostname), and that my patch will consider "dynamic / end-user" machines, and thus use the full IP address:

0x503ead5c.bynxx8.adsl-dhcp.tele.dk[80.62.173.92] 0xd5aae2a2.dhcp.kabelnettet.dk[213.170.226.162] ACB021F3.ipt.aol.com[172.176.33.243] ACB296F3.ipt.aol.com[172.178.150.243] ACB59FAD.ipt.aol.com[172.181.159.173] ACB7FAB2.ipt.aol.com[172.183.250.178] ACC8EBB5.ipt.aol.com[172.200.235.181] adsl21118.estpak.ee[80.235.8.154] adsl-dc-36cd0.adsl.wanadoo.nl[83.118.10.208] asd-z-efb1.adsl.wanadoo.nl[81.69.13.177] c8a65c72.bhz.virtua.com.br[200.166.92.114] c8fbc881.bhz.virtua.com.br[200.251.200.129] c906651b.virtua.com.br[201.6.101.27] c906712d.virtua.com.br[201.6.113.45] c90688a5.virtua.com.br[201.6.136.165] c906e92a.virtua.com.br[201.6.233.42] c9110cb5.rjo.virtua.com.br[201.17.12.181] catv-5062ae15.catv.broadband.hu[80.98.174.21] catv-506315e6.catv.broadband.hu[80.99.21.230] catv-d5de9650.catv.broadband.hu[213.222.150.80] cbl-il8-48.casscabletv.com[12.163.48.49] cc550873-a.hnglo1.ov.home.nl[217.122.248.216] cp346637-a.tilbu1.nb.home.nl[84.24.101.229] cp427353-a.tilbu1.nb.home.nl[84.24.100.163] cp644250-a.venlo1.lb.home.nl[84.29.41.12] CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com[69.197.247.61] CPE00062930c118-CM014090206357.cpe.net.cable.rogers.com[24.100.251.82] CPE0008a12a42eb-CM400047235173.cpe.net.cable.rogers.com[24.100.193.230] CPE000ae6a33a8c-CM000a735f750d.cpe.net.cable.rogers.com[69.198.32.123] CPE000cf1727e77-CM0012c90feac2.cpe.net.cable.rogers.com[69.193.9.232] CPE0010dc418a71-CM012059934437.cpe.net.cable.rogers.com[24.156.89.186]
CPE00402b4b28ed-CM00080d53844c.cpe.net.cable.rogers.com[69.194.52.136] CPE0080c6eaa3d6-CM013359900259.cpe.net.cable.rogers.com[24.156.43.203] CPE0080c8b37441-CM0012250232ca.cpe.net.cable.rogers.com[69.197.38.226] dial-369.lodz.dialog.net.pl[62.87.196.113] dsl81-214-11370.adsl.ttnet.net.tr[81.214.44.106] dsl81-214-11652.adsl.ttnet.net.tr[81.214.45.132] dsl81-214-11805.adsl.ttnet.net.tr[81.214.46.29] dsl81-214-12018.adsl.ttnet.net.tr[81.214.46.242] dsl81-214-29753.adsl.ttnet.net.tr[81.214.116.57] dsl81-214-39833.adsl.ttnet.net.tr[81.214.155.153] dsl81-215-21703.adsl.ttnet.net.tr[81.215.84.199] dsl81-215-24770.adsl.ttnet.net.tr[81.215.96.194] dsl81-215-29615.adsl.ttnet.net.tr[81.215.115.175] dsl81-215-30444.adsl.ttnet.net.tr[81.215.118.236] dsl81-215-30614.adsl.ttnet.net.tr[81.215.119.150] dsl81-215-30905.adsl.ttnet.net.tr[81.215.120.185] dsl81-215-4517.adsl.ttnet.net.tr[81.215.17.165] dsl81-215-53937.adsl.ttnet.net.tr[81.215.210.177] dsl81-215-54155.adsl.ttnet.net.tr[81.215.211.139] dsl81-215-54618.adsl.ttnet.net.tr[81.215.213.90] dsl81-215-55123.adsl.ttnet.net.tr[81.215.215.83] dsl81-215-62680.adsl.ttnet.net.tr[81.215.244.216] dsl-hmlgw1he3.dial.inet.fi[80.220.227.227] dsl-mm224.ez-net.com[65.172.188.109] gv-vb-3c9f.adsl.wanadoo.nl[212.129.188.159] h00061bda6f73.ne.client2.attbi.com[24.218.168.78] h00095b733a11.ne.client2.attbi.com[65.96.237.21] h00095b733a11.ne.client2.attbi.com[65.96.239.10] h00096b197238.ne.client2.attbi.com[24.61.250.89] h000ea69e7af4.ne.client2.attbi.com[24.128.63.191] h0011110f7ab3.ne.client2.attbi.com[24.128.58.142] h00118018fafb.ne.client2.attbi.com[24.128.215.26] h0040ca40da7b.ne.client2.attbi.com[24.91.82.249] h00e06fbe43f5.ne.client2.attbi.com[66.31.5.211] kf-sdm-tg06-0727.dial.kabelfoon.nl[62.45.194.216] lbc9-d9ba927f.pool.mediaWays.net[217.186.146.127] lbc9-d9ba9291.pool.mediaWays.net[217.186.146.145] lbc9-d9ba92ac.pool.mediaWays.net[217.186.146.172] lbck-d9b886b3.pool.mediaWays.net[217.184.134.179] 
lls-c-1e8a1.adsl.wanadoo.nl[81.70.6.161] mstr195175-28763.dial-in.ttnet.net.tr[195.175.192.92] mstr195175-28807.dial-in.ttnet.net.tr[195.175.192.136] mstr195175-29523.dial-in.ttnet.net.tr[195.175.195.84] mstr195175-29524.dial-in.ttnet.net.tr[195.175.195.85] mstr195175-29617.dial-in.ttnet.net.tr[195.175.195.178] mstr195175-30261.dial-in.ttnet.net.tr[195.175.198.54] mstr195175-30277.dial-in.ttnet.net.tr[195.175.198.70] mstr195175-30294.dial-in.ttnet.net.tr[195.175.198.87] mstr195175-30405.dial-in.ttnet.net.tr[195.175.198.198] mstr195175-30425.dial-in.ttnet.net.tr[195.175.198.218] Ottawa-HSE-ppp4085231.sympatico.ca[70.49.34.242] oxfo-dhcp-ws-186.dsl.maqs.net[66.187.40.187] oxford-dsl-26.swnebr.net[69.2.6.155] p3E9E4915.dip.t-dialin.net[62.158.73.21] p3EE0A92D.dip.t-dialin.net[62.224.169.45] p50823AF8.dip.t-dialin.net[80.130.58.248] p5082557F.dip0.t-ipconnect.de[80.130.85.127] p50837D68.dip0.t-ipconnect.de[80.131.125.104] p50837F3A.dip0.t-ipconnect.de[80.131.127.58] p50923A15.dip.t-dialin.net[80.146.58.21] p54856CB1.dip.t-dialin.net[84.133.108.177] p54857A2B.dip.t-dialin.net[84.133.122.43] p5485901E.dip0.t-ipconnect.de[84.133.144.30] p548C8DE6.dip.t-dialin.net[84.140.141.230] pcp0010439227pcs.parads01.nm.comcast.net[68.35.123.64] pcp0010846152pcs.essex01.md.comcast.net[68.48.130.57] pcp0011134165pcs.neave01.pa.comcast.net[69.248.43.94] pcp0011537562pcs.aboit01.in.comcast.net[69.245.133.123] pcp01934410pcs.nhaven01.ct.comcast.net[68.63.87.156] pcp02171061pcs.brghtn01.mi.comcast.net[68.43.207.125] pcp02861817pcs.flrnc01.al.comcast.net[68.62.232.163] pcp03267103pcs.waldlk01.mi.comcast.net[68.60.178.140] pcp03766529pcs.montvl01.pa.comcast.net[68.34.242.208] pcp03822180pcs.clintn01.ct.comcast.net[68.46.210.153] pcp04591547pcs.harimn01.tn.comcast.net[68.47.189.215] pcp08020054pcs.dalect01.va.comcast.net[68.48.155.156] pcp08774927pcs.mtlrel01.nj.comcast.net[68.36.36.37] pcp08935357pcs.trentn01.nj.comcast.net[69.141.145.106] pcp09003316pcs.spedwy01.in.comcast.net[68.58.13.174] 
pcp09946743pcs.hyatsv01.md.comcast.net[69.140.15.206] pD9503DE4.dip.t-dialin.net[217.80.61.228] pD9521AC6.dip.t-dialin.net[217.82.26.198] pD952496A.dip.t-dialin.net[217.82.73.106] pD95256B2.dip.t-dialin.net[217.82.86.178] pD95314D1.dip.t-dialin.net[217.83.20.209] pD9542A7A.dip.t-dialin.net[217.84.42.122] pD954ADEB.dip.t-dialin.net[217.84.173.235] pD955E999.dip.t-dialin.net[217.85.233.153] pD95731A6.dip0.t-ipconnect.de[217.87.49.166] pD958902E.dip.t-dialin.net[217.88.144.46] pD9589A3B.dip.t-dialin.net[217.88.154.59] pD95EEF12.dip.t-dialin.net[217.94.239.18] pD95F43DE.dip0.t-ipconnect.de[217.95.67.222] pD95F46CE.dip0.t-ipconnect.de[217.95.70.206] pD95F4828.dip0.t-ipconnect.de[217.95.72.40] pD95F484B.dip0.t-ipconnect.de[217.95.72.75] pD9E182A1.dip.t-dialin.net[217.225.130.161] pD9E44064.dip.t-dialin.net[217.228.64.100] pD9E488B2.dip.t-dialin.net[217.228.136.178] pD9E59AA3.dip.t-dialin.net[217.229.154.163] pD9FE2D54.dip0.t-ipconnect.de[217.254.45.84] poctnt-1-235.dialup.enter.net[216.193.169.15] ppp2582.hakata01.bbiq.jp[210.203.194.42] rt-z-23c40.adsl.wanadoo.nl[81.70.90.64] S01060000b4921e35.cg.shawcable.net[68.145.237.24] S01060001023fd7dc.cg.shawcable.net[68.144.193.49] S01060004e20311de.ed.shawcable.net[68.149.226.216] S010600055d29d6f0.vc.shawcable.net[24.85.71.78] S010600065b1cf9d8.vc.shawcable.net[24.86.104.115] S0106000795aeb64d.vc.shawcable.net[24.80.152.113] S01060008a10ccf19.vf.shawcable.net[70.68.209.121] S01060008a10ccf19.vf.shawcable.net[70.68.244.221] S01060008a11e94cc.ed.shawcable.net[68.149.249.248] S0106000b6a93aadb.vc.shawcable.net[24.84.40.203] S0106000bdb0e2be7.ok.shawcable.net[24.70.174.225] S0106000c7615dd58.wp.shawcable.net[24.77.235.208] S01060010a4991948.cg.shawcable.net[68.146.151.175] S010600112f46d19f.vc.shawcable.net[24.83.7.214] S01060040ca4003c4.wp.shawcable.net[24.77.99.183] S01060060b0a3cd95.ed.shawcable.net[68.150.64.140] S01060080c6ee43f8.vs.shawcable.net[24.84.102.153] S01060080c876a2df.vc.shawcable.net[24.83.21.154] 
S01060080c8e2e5a7.ed.shawcable.net[68.149.0.81] S01060090f52ad732.ed.shawcable.net[68.150.31.145] S010600d0b7c4df4a.ed.shawcable.net[68.151.5.108] user-0c6t3il.cable.mindspring.com[24.110.142.85] user-0c8htoa.cable.mindspring.com[24.136.247.10] user-0ccstki.cable.mindspring.com[24.206.118.146] user-0cdveau.cable.mindspring.com[24.223.185.94] user-0cetrmf.cable.mindspring.com[24.238.238.207] user-10lf4ta.cable.mindspring.com[65.87.147.170] user-10lf97p.cable.mindspring.com[65.87.164.249] user-12hc73a.cable.mindspring.com[69.22.28.106] user-12hcc0d.cable.mindspring.com[69.22.48.13] user-12hcqi5.cable.mindspring.com[69.22.106.69] user-12hcqph.cable.mindspring.com[69.22.107.49] user-12l2ttt.cable.mindspring.com[69.81.119.189] user-12ldaom.cable.mindspring.com[69.86.171.22] user-12lm16t.cable.mindspring.com[69.91.4.221] user-12lm33c.cable.mindspring.com[69.91.12.108] user-12lmp25.cable.mindspring.com[69.91.100.69] xdsl-2836.walbrzych.dialog.net.pl[84.40.186.20] xdsl-4689.zgora.dialog.net.pl[84.40.166.81] xdsl-4875.wroclaw.dialog.net.pl[84.40.128.11] xdsl-7183.wroclaw.dialog.net.pl[84.40.137.15]

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
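Several names in the list above (the aol.com, virtua.com.br and t-dialin.net ones, for instance) carry the client's IPv4 address written as eight hexadecimal digits, which a decimal "last bytes in the hostname" test cannot see. The following is only a minimal sketch of how such a case could be detected; `has_hex_encoded_ip` is a hypothetical helper made up for illustration and is not taken from SQLgrey or from the attached patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not SQLgrey code): true when the hostname contains
# the IPv4 address written as 8 hex digits, e.g. 172.176.33.243 -> acb021f3.
sub has_hex_encoded_ip {
    my ($fqdn, $addr) = @_;
    my @bytes = split /\./, $addr;
    return 0 unless @bytes == 4;
    my $hex = sprintf '%02x%02x%02x%02x', @bytes;
    # Hostnames use either case, so match case-insensitively.
    return $fqdn =~ /\Q$hex\E/i ? 1 : 0;
}

# Two hostnames from the list above, and one that should not match.
print has_hex_encoded_ip('ACB021F3.ipt.aol.com', '172.176.33.243'), "\n";
print has_hex_encoded_ip('p3E9E4915.dip.t-dialin.net', '62.158.73.21'), "\n";
print has_hex_encoded_ip('mail.example.com', '192.0.2.25'), "\n";
```

This covers only one naming scheme; names such as the rogers.com or comcast.net ones in the list encode no address at all and need keyword-style patterns instead.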
From: Lionel B. <lio...@bo...> - 2005-02-15 16:18:34
|
Michel Bouissou wrote the following on 15.02.2005 16:39 :
>On Tuesday 15 February 2005 15:38, Michel Bouissou wrote:
>
>>It's heuristic... So imperfect, but my tests show it gives much more
>>accurate results compared to the original simpler algorithm.
>
>Here is an example of a series of hostnames/addresses that the original
>SQLgrey would take as "Class C" (for they don't have the end of their IP
>address in their hostname), and my patch will consider "dynamic / end-user"
>machines, and thus use the full IP address :

For comparison: on the same sample, how many addresses aren't recognized as "dynamic / end-user" by the regexps but are by the smartc algo? What's the total recognized by either of them? This way we'll have an idea of the % of improvement. |
From: Michel B. <mi...@bo...> - 2005-02-15 16:31:15
|
On Tuesday 15 February 2005 17:18, Lionel Bouton wrote:
> >Here is an example of a series of hostnames/addresses that the original
> >SQLgrey would take as "Class C" (for they don't have the end of their IP
> >address in their hostname), and my patch will consider "dynamic /
> > end-user" machines, and thus use the full IP address :
>
> For comparison : on the same sample how many addresses aren't recognized
> as "dynamic / end-user" by the regexps but are by the smartc algo ?
> What's the total recognized by one of them. This way we'll have an idea
> of the % of improvement.

I don't have total figures and percentages on hand, but I can say that:

1/ All the entries that are recognized by the original smartc algo are also recognized by the regexps, except in situations where the original algo could make mistakes for mailservers that have part of their IP in their name, and that the regexps would properly recognize as mailservers (Class C). I've already seen such cases with some mailserver pools that put the IP of the server in its name. An example would be:

    mxpool10-123.231.bigisp.com [10.10.123.231]

Here the original code would get it wrong, but not my regexp series (which tries to identify mailservers first).

2/ The original code misses real "big players" end-user networks, such as AOL (example: ACB296F3.ipt.aol.com[172.178.150.243]), cable.rogers.com (example: CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com[69.197.247.61]), AT&T (example: h00095b733a11.ne.client2.attbi.com[65.96.239.10]), etc.

These big players' end-user networks are *huge* sources of viruses and spam, so if we can improve the code to identify them properly, I guess it is a valuable improvement -- even though I don't have precise figures and no time to do statistics ;-))

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Michel B. <mi...@bo...> - 2005-02-15 20:15:36
|
On Tuesday 15 February 2005 17:18, Lionel Bouton wrote:
> For comparison : on the same sample how many addresses aren't recognized
> as "dynamic / end-user" by the regexps but are by the smartc algo ?

By the way, the current code is flawed, as it performs (only) the following test:

    my @bytes = split(/\./, $addr);
    [...]
    # if last bytes are in fqdn, assume home-user address
    return $addr if $fqdn =~ /$bytes[3]/ and $fqdn =~ /$bytes[2]/;

It doesn't use any delimiters around the numbers, so, for example:

    mta213.somedomain.com [192.168.3.21] => Match !
    server18.net127.domain.org [172.16.18.12] => Match !
    mx25.isp.net [10.10.25.2] => Match !

All these would be treated with their full IP address, and not as C-Class, which is probably not what is desired...

The regexp solution I proposed fixes these flaws as well.

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
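To make the flaw described above concrete, here is a small sketch that reproduces the undelimited test and compares it with a variant that refuses matches glued to other digits. Both function names are hypothetical, and the delimited variant is only one possible fix assumed for illustration; it is not the regexp series from the patch.

```perl
use strict;
use warnings;

# Mirrors the test quoted above: plain substring matches on the last two bytes.
sub naive_match {
    my ($fqdn, $addr) = @_;
    my ($third, $fourth) = (split /\./, $addr)[2, 3];
    return ($fqdn =~ /\Q$fourth\E/ and $fqdn =~ /\Q$third\E/) ? 1 : 0;
}

# Hypothetical variant: each number must not touch another digit.
sub delimited_match {
    my ($fqdn, $addr) = @_;
    my ($third, $fourth) = (split /\./, $addr)[2, 3];
    for my $n ($fourth, $third) {
        return 0 unless $fqdn =~ /(?<!\d)\Q$n\E(?!\d)/;
    }
    return 1;
}

# The three false positives from the message, then a made-up dynamic name.
for my $case (
    [ 'mta213.somedomain.com',      '192.168.3.21' ],
    [ 'server18.net127.domain.org', '172.16.18.12' ],
    [ 'mx25.isp.net',               '10.10.25.2'   ],
    [ 'adsl-92-173.example.net',    '80.62.173.92' ],
) {
    my ($fqdn, $addr) = @$case;
    printf "%-28s naive=%d delimited=%d\n",
        $fqdn, naive_match($fqdn, $addr), delimited_match($fqdn, $addr);
}
```

Delimiters alone do not settle the mxpool10-123.231.bigisp.com case mentioned earlier in the thread, since both numbers appear there as separate tokens; that is why trying to identify mailserver names first still matters.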
From: Michel B. <mi...@bo...> - 2005-02-15 15:48:50
|
On Tuesday 15 February 2005 16:31, Lionel Bouton wrote:
> Thanks, I'm worried about the size of the regexp though. There are two
> things on my mind :
> - is it maintainable ?

I don't think it will need much maintenance. It's based on a (yet more complex ;-) regexp I have built over years, and that very seldom needs changes -- and the changes are improvements that are not strictly speaking necessary nor urgent.

Maintaining such a regexp is not that complex if you are careful ;-) especially about line breaks if you split it into several lines (it seems that an escaped line break should NOT be put after a ")", a "}" or a "?", or the regexp won't work). I limit myself to splitting after "regular characters" and before a "|".

> - how much processing time is needed for these regexp ?

Given that we just process a short hostname and not a long file, and given that Perl will compile the regexp only once, except for the one that contains part of the IP as a variable, I believe the processing time should be negligible (compared to the database accesses etc.)

> I'd like to add this as a separate algorithm and put the regexp in
> external files that can be reloaded

I would hardcode this. I expect very few changes to this, if any. Loading the regexps from external files would make this still more complex and subject to errors...

> I've not access on enough maillog to test
> these regexps on each update. Are you willing to maintain them ?

No problem, but as I said, expect very little change unless I discover a major boooog ;-)

> I'll probably start the 1.5.x branch for this new algorithm.

Meanwhile, you can test it on your own system. I don't think you'll notice any performance impact, but it will probably be more accurate than the basic IP address test (see my last post with some examples...)

Cheers.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Lionel B. <lio...@bo...> - 2005-02-15 16:35:26
|
Michel Bouissou wrote the following on 15.02.2005 16:48 :
>On Tuesday 15 February 2005 16:31, Lionel Bouton wrote:
>
>>Thanks, I'm worried about the size of the regexp though. There are two
>>things on my mind :
>>- is it maintainable ?
>
>I don't think it will need much maintenance. It's based on a (yet more
>complex ;-)

Even more!

> regexp I have built over years, and that very seldom needs
>changes -- and the changes are improvements that are not strictly speaking
>necessary nor urgent.
>
>Maintaining such a regexp is not that complex if you are careful ;-)
>especially about line breaks if you split it into several lines (it seems
>that an escaped line break should NOT be put after a ) or } or ? or the
>regexp won't work. I limit myself to splitting after "regular characters" and
>before a "|".

I see.

>>- how much processing time is needed for these regexp ?
>
>Given that we just process a short hostname and not a long file, and given
>that Perl will compile the regexp only once except for the one that contains
>part of the IP as a variable, I believe the processing time should be
>negligible (compared to the database accesses etc.)

Regexps can be both really quick and really slow. I don't yet have enough experience with Perl regexps to know from a quick look at one whether Perl would handle hundreds of thousands of matches per second or just hundreds per second.

>>I'd like to add this as a separate algorithm and put the regexp in
>>external files that can be reloaded
>
>I would hardcode this. I expect very little changes to this, if any. Loading
>the regexps from external files would make this still more complex and
>subject to errors...

I'd prefer to have

    if ($fqdn =~ $known_server_pattern) ...

and so on, rather than the full regexp in the code! An accidental keypress in the middle of the regexp could have unforeseen consequences and would be hard to spot without a cvs diff, whereas a keypress in the middle of a variable name is an instant blocker with an obvious error message, leading to a painless resolution. Editing the regexp file would be less error-prone in my opinion.

Loading regexps from a file isn't really so complex.

> [...]
>
>>I'll probably start the 1.5.x branch for this new algorithm.
>
>Meanwhile, you can test it on your own system, I don't think you'll notice any
>performance impact, but it will probably be more accurate than the basic IP
>address test (see my last post with some examples...)

I won't notice any perf difference. Installations handling more than a million mails per day are worrying me though.

I'll bench the code to see how many lines per second these regexps can handle on my systems; hard numbers are usually more convincing to me with things as complex as regexps.

Lionel. |
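A rough way to get the kind of hard numbers mentioned above is to time a precompiled pattern over a handful of hostnames. The sketch below uses the core Time::HiRes module; the pattern is a hypothetical stand-in (the real regexps are in the attached patch and are far larger), so the rate it prints says nothing about the patch itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in pattern, NOT the regexp series from the patch.
my $dynamic_re = qr/(?:^|[.\-])(?:adsl|dsl|dial|dhcp|cable|pool|ppp)(?:[.\-\d]|\z)/i;

# Hostnames taken from messages in this thread.
my @hosts = (
    '0x503ead5c.bynxx8.adsl-dhcp.tele.dk',
    'CPE00055df38a0c-CM00407b87707e.cpe.net.cable.rogers.com',
    'lbc9-d9ba927f.pool.mediaWays.net',
    'ACB021F3.ipt.aol.com',
    'mx25.isp.net',
);

# Count how many hostnames the compiled pattern matches in one pass.
sub count_matches {
    my ($re, $hosts) = @_;
    my $n = 0;
    for my $h (@$hosts) { $n++ if $h =~ $re; }
    return $n;
}

my $passes  = 20_000;
my $matched = 0;
my $start   = time;
$matched = count_matches($dynamic_re, \@hosts) for 1 .. $passes;
my $elapsed = time - $start;
$elapsed = 1e-6 if $elapsed <= 0;    # guard against a zero timer reading

printf "%d of %d sample hosts matched\n", $matched, scalar @hosts;
printf "about %.0f match attempts per second\n", ($passes * @hosts) / $elapsed;
```

To measure the patch itself, the stand-in pattern would have to be replaced by the actual regexps and the sample by a real mail log extract.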
From: Michel B. <mi...@bo...> - 2005-02-15 17:41:48
|
On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> I'd prefer to have
> if ($fqdn =~ $known_server_pattern) ...
>
> and so on.
> than the full regexp in the code ! The accidental keypress in the middle
> of the regexp could have unforeseen consequences and would be hard to
> spot without a cvs diff

Well, "accidental keypresses" in the middle of computer code usually have unpleasant consequences ;-)

IMHO, these regexps that are part of the "smart" routine are fixed "code" and should be considered as such, not parameters or whatever. There's no specific reason to extract them out of the code, and I don't see why somebody would be more prone to put "accidental keypresses" in there rather than elsewhere... So unless you want to put all the code in an external table... ;-)

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Michel B. <mi...@bo...> - 2005-02-16 09:40:12
Attachments:
sqlgrey-1.4.4-smartsmart2.patch
|
On Tuesday 15 February 2005 18:41, Michel Bouissou wrote:
> IMHO, these regexp that are part of the "smart" routine are fixed "code"
> and should be considered as such, not parameters or whatever.

The regexps themselves are OK, but I find that the "line continuations" still cause problems: the part of the regexp immediately preceding a line continuation doesn't work.

It seems that line continuations in regexps, which work perfectly in bash, don't work properly in Perl.

I've abandoned the line continuations. Even though I don't like very long lines, they are better than introducing bugs with line continuations that don't work as expected.

Please find attached a regexp patch (it should be applied after the first one) that puts each regexp on a single line.

Thinking again about your proposal to move these regexps to separate files: even though I don't see a real benefit in doing this, if you still want to do it, I suggest these files be put into /usr/lib/something, and not in /etc/sqlgrey. They shouldn't be considered as "user editable configuration files", but as fixed external code.

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
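For reference, Perl's usual way of spreading a long pattern over several lines is the /x modifier, which makes the regexp engine ignore unescaped whitespace and "#" comments inside the pattern, so no backslash continuation is needed. The sketch below assumes a hypothetical, heavily shortened pattern; it only shows the layout technique and is not the regexp from the patch.

```perl
use strict;
use warnings;

# Hypothetical shortened pattern written with /x: whitespace and comments
# inside the pattern are ignored, so it can span several lines safely.
my $dynamic_re = qr{
    (?: ^ | [.\-] )        # start of the name, or a separator
    (?: adsl | dsl         # access-type keywords, one group per line
      | dial | dhcp
      | cable | pool | ppp
    )
    (?: [.\-\d] | \z )     # then a separator, a digit, or the end
}xi;

for my $fqdn ('adsl-dc-36cd0.adsl.wanadoo.nl',
              'Ottawa-HSE-ppp4085231.sympatico.ca',
              'mail.example.com') {
    printf "%-36s %s\n", $fqdn, ($fqdn =~ $dynamic_re ? 'dynamic' : 'no match');
}
```

One caveat with /x: a literal space or "#" that must be part of the match has to be escaped or put inside a character class.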
From: Michel B. <mi...@bo...> - 2005-02-15 16:46:49
|
On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> I'd prefer to have
>
> if ($fqdn =~ $known_server_pattern)

If you put the "big regexp" in a variable and not a constant, it will have to be recompiled each time it is called, and not only once... This can cause a major performance cost.

I have to leave for now ;-)

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |
From: Henrik C. G. <hc...@b-...> - 2005-02-15 17:20:55
|
On Tue, 15 Feb 2005 at 17:46 +0100, Michel Bouissou wrote:
> On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
> > I'd prefer to have
> >
> > if ($fqdn =~ $known_server_pattern)
>
> If you put the "big regexp" as a variable and not a constant, it will have to
> be recompiled each time it is called, and not only once... This can cause a
> major performance cost.

Yes, but it can be avoided with something like

    ($fqdn =~ /$known_server_pattern/o)

--
Henrik Christian Grove <hc...@b-...> B-one |
From: Lionel B. <lio...@bo...> - 2005-02-15 17:21:43
|
Michel Bouissou wrote the following on 15.02.2005 17:46 :
>On Tuesday 15 February 2005 17:35, Lionel Bouton wrote:
>
>>I'd prefer to have
>>
>>if ($fqdn =~ $known_server_pattern)
>
>If you put the "big regexp" as a variable and not a constant, it will have to
>be recompiled each time it is called, and not only once... This can cause a
>major performance cost.

Don't you know the "my $regexp = qr/value_read_from_file/;" syntax? It takes care of the compilation once and for all. |
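The qr// approach mentioned above can be sketched as follows: each pattern is read from a file, compiled once with qr//, and the compiled objects are reused for every lookup. The file format (one regexp per line, "#" comments), the function names and the two sample patterns are all assumptions made for this illustration; none of them come from SQLgrey or from the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical loader: one regexp per line; blank lines and '#' comments skipped.
sub load_patterns {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @compiled;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|\z)/;
        my $re = eval { qr/$line/i };            # compiled once, here
        die "Bad regexp at $path line " . $. . ": $@" unless defined $re;
        push @compiled, $re;
    }
    close $fh;
    return @compiled;
}

# Demo pattern file with two made-up end-user patterns.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "# hypothetical end-user patterns\n";
print {$tmp_fh} '\.dip0?\.t-(?:dialin|ipconnect)\.' . "\n";
print {$tmp_fh} '\.cable\.' . "\n";
close $tmp_fh;

my @dynamic = load_patterns($tmp_path);

# Matching against a qr// object does not recompile the pattern.
sub is_dynamic {
    my ($fqdn) = @_;
    for my $re (@dynamic) { return 1 if $fqdn =~ $re; }
    return 0;
}

print is_dynamic('pD9503DE4.dip.t-dialin.net'), "\n";
print is_dynamic('mail.example.com'), "\n";
```

Compared with the /o modifier suggested earlier in the thread, qr// keeps the compiled pattern in an ordinary variable, so reloading the file simply means calling the loader again and replacing the list.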
From: Michel B. <mi...@bo...> - 2005-02-15 17:37:37
|
On Tuesday 15 February 2005 18:21, Lionel Bouton wrote:
> Don't you know the "my $regexp = qr/value_read_from_file/;" syntax ?

Not that much. I don't know much about Perl ;-)

--
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E |