From: Michel B. <mi...@bo...> - 2005-01-13 13:46:57
On Thursday 13 January 2005 11:41, Lionel Bouton wrote:
>
> >This is also an option, but the decision whether or not to use the "full"
> >algorithm is not perfect.
>
> Quite true, this is only a heuristic.
>
> > I've seen many cases where SQLgrey uses the "class
> >C" algorithm for end-user DSL addresses. The way different ISPs name their
> >end-user pools can vary quite a lot...
>
> Could you report them to me? If possible I'd like to make SQLgrey aware
> of them.

Depending on how you isolate the last byte of the IP address, such a list of
misses could include hostnames like these (taken from my own mail server logs):

1Cust166.tnt34.rtm1.nld.da.uu.net[213.116.162.166]
1Cust34.tnt10.ber2.deu.da.uu.net[149.225.214.34]
1Cust63.tnt2.ber2.deu.da.uu.net[149.225.54.63]
40322032.ptr.dia.nextlink.net[64.50.32.50]
ACB01AF0.ipt.aol.com[172.176.26.240]
ACB2B019.ipt.aol.com[172.178.176.25]
ACB4C175.ipt.aol.com[172.180.193.117]
ACB59C5D.ipt.aol.com[172.181.156.93]
ACB6068A.ipt.aol.com[172.182.6.138]
ACB8796A.ipt.aol.com[172.184.121.106]
ACBBCA2F.ipt.aol.com[172.187.202.47]
anba-c34712aa.pool.mediaWays.net[195.71.18.170]
asd-blm-2066b.adsl.wanadoo.nl[81.70.36.107]
bxy114.neoplus.adsl.tpnet.pl[83.30.18.114]
c8a65a38.bhz.virtua.com.br[200.166.90.56]
c9069c83.virtua.com.br[201.6.156.131]
c9113d37.rjo.virtua.com.br[201.17.61.55]
cable66a249.usuarios.retecal.es[213.254.66.249]
cable73a151.usuarios.retecal.es[213.254.73.151]
cam29.neoplus.adsl.tpnet.pl[83.30.84.29]
catv-5063672e.catv.broadband.hu[80.99.103.46]
cc84041-a.hnglo1.ov.home.nl[212.204.159.14]
cpc2-darl2-5-1-cust168.midd.cable.ntl.com[82.6.207.168]
cpc2-darl2-5-1-cust246.midd.cable.ntl.com[82.6.207.246]
CPE000c4189a793-CM014500105533.cpe.net.cable.rogers.com[24.112.207.118]
dial-1159.lubin.dialog.net.pl[62.87.209.135]
dialup111.sofia.spnet.net[213.169.32.111]
dialup117.sofia.spnet.net[213.169.32.117]
dialup13.sofia.spnet.net[213.169.32.13]
dialup63.nss.ltk.is.com.fj[202.62.120.110]
dsl81-214-756.adsl.ttnet.net.tr[81.214.2.244]
dsl81-215-11081.adsl.ttnet.net.tr[81.215.43.73]
dsl81-215-12127.adsl.ttnet.net.tr[81.215.47.95]
dsl81-215-12621.adsl.ttnet.net.tr[81.215.49.77]
dsl81-215-43162.adsl.ttnet.net.tr[81.215.168.154]
dsl81-215-5152.adsl.ttnet.net.tr[81.215.20.32]
dsl81-215-5910.adsl.ttnet.net.tr[81.215.23.22]
dsl81-215-6900.adsl.ttnet.net.tr[81.215.26.244]
h000d56113fb3.ne.client2.attbi.com[24.131.134.86]
h0010b568f3a3.ne.client2.attbi.com[24.61.154.36]
h0040cab53019.ne.client2.attbi.com[24.91.135.81]
jangce-1174.adsl.datanet.hu[195.56.12.158]
M339P015.dipool.highway.telekom.at[62.46.32.79]
modemcable171.52-130-66.mc.videotron.ca[66.130.52.171]
modemcable179.240-203-24.mc.videotron.ca[24.203.240.179]
modemcable225.184-201-24.mc.videotron.ca[24.201.184.225]
modemcable227.134-203-24.mc.videotron.ca[24.203.134.227]
modemcable245.107-70-69.mc.videotron.ca[69.70.107.245]
modemcable254.86-201-24.mc.videotron.ca[24.201.86.254]
mstr195175-29437.dial-in.ttnet.net.tr[195.175.194.254]
mstr195175-30267.dial-in.ttnet.net.tr[195.175.198.60]
n4z78l145.broadband.ctm.net[202.175.78.145]
p3E9E8E7C.dip.t-dialin.net[62.158.142.124]
p508360E9.dip0.t-ipconnect.de[80.131.96.233]
p50836131.dip0.t-ipconnect.de[80.131.97.49]
p50837496.dip0.t-ipconnect.de[80.131.116.150]
p50837EAD.dip0.t-ipconnect.de[80.131.126.173]
p5084BAE5.dip.t-dialin.net[80.132.186.229]
p508A9316.dip0.t-ipconnect.de[80.138.147.22]
p508CC432.dip0.t-ipconnect.de[80.140.196.50]
p509134B9.dip.t-dialin.net[80.145.52.185]
p5480DED9.dip.t-dialin.net[84.128.222.217]
p548115DE.dip.t-dialin.net[84.129.21.222]
p54878AEC.dip.t-dialin.net[84.135.138.236]
pcp02171061pcs.brghtn01.mi.comcast.net[68.43.207.125]
pcp02587825pcs.shlb1201.mi.comcast.net[68.84.168.99]
pcp08413319pcs.savana01.ga.comcast.net[68.51.166.139]
pD9523D7B.dip.t-dialin.net[217.82.61.123]
pD953A06D.dip.t-dialin.net[217.83.160.109]
pD95B65B3.dip0.t-ipconnect.de[217.91.101.179]
pD9E578A1.dip.t-dialin.net[217.229.120.161]
pD9EC4616.dip0.t-ipconnect.de[217.236.70.22]
pD9EC50D1.dip0.t-ipconnect.de[217.236.80.209]
pD9EC52FB.dip0.t-ipconnect.de[217.236.82.251]
pD9FA8F9F.dip.t-dialin.net[217.250.143.159]
rt-z-23c40.adsl.wanadoo.nl[81.70.90.64]
S010600055d07eddd.gv.shawcable.net[24.108.127.244]
S010600055dff39b9.vc.shawcable.net[24.87.42.137]
S01060007e91f3b26.vc.shawcable.net[24.80.147.116]
S01060010dca27adf.vn.shawcable.net[24.85.211.61]
S01060050049395bf.gv.shawcable.net[24.108.153.242]
S01060050229c08e8.vs.shawcable.net[24.81.90.251]
S01060050bab21b9b.cg.shawcable.net[68.144.198.200]
S01060050bf78aeb5.rd.shawcable.net[70.65.89.118]
S01060050bfacf890.ok.shawcable.net[24.71.140.130]
S01060080c6f85ba7.vf.shawcable.net[70.68.194.161]
S010600e029961f94.gv.shawcable.net[24.68.6.189]
user-0cej1dr.cable.mindspring.com[24.233.133.187]
user-0cetn71.cable.mindspring.com[24.238.220.225]
user-0cevf7e.cable.mindspring.com[24.239.188.238]
user-12hc133.cable.mindspring.com[69.22.4.99]
user242.res.openband.net[65.246.82.242]

In some of my administrative bash scripts I use a very complex extended
grep regexp (not a Perl regexp, sorry). It is also a heuristic, but it makes
very few mistakes. It works on the hostname[ip_address] form found in a
Postfix log.

Here is the regexp; maybe you can get some ideas from it, or it could be of
some use to somebody?

egrep -i "(^|[0-9.x_-])(((c|cm|h|host|m)?0*([1-9]{1,3}[0-9]{0,2})[._-].*\[[.0-9]+\.\5\])|(abo|broadband|(hk)?cablep?|catv|d?client2?|cust(omer)?s?|dhcp|dial?(in|up)?|dip|[asx]?dsl|dyn(amic)?|home|in-addr|modem(cable)?|(di)?pool|ppp|ptr|rev|static|user|YahooBB[0-9]{12}|c[[:alnum:]]{6,}(\.[a-z]{3})?\.virtua|[1-9]Cust[0-9]+|ACB[0-9A-F]{5}\.ipt|pcp[0-9]{8}pcs|S0106[[:alnum:]]{12,}\.[a-z]{2})[0-9.x_-]|unknown\[)"

(All on one line ;-)

You can test it against your own server logs by copying and pasting (on a
single line) a command like the following, for example:

[root@totor sqlgrey]# zcat /var/log/mail/info.1 | egrep "postfix/smtpd\[[0-9]+\]: connect from " | cut -f4- -d: | cut -f4- -d" " | sort -u | egrep -i "(^|[0-9.x_-])(((c|cm|h|host|m)?0*([1-9]{1,3}[0-9]{0,2})[._-].*\[[.0-9]+\.\5\])|(abo|broadband|(hk)?cablep?|catv|d?client2?|cust(omer)?s?|dhcp|dial?(in|up)?|dip|[asx]?dsl|dyn(amic)?|home|in-addr|modem(cable)?|(di)?pool|ppp|ptr|rev|static|user|YahooBB[0-9]{12}|c[[:alnum:]]{6,}(\.[a-z]{3})?\.virtua|[1-9]Cust[0-9]+|ACB[0-9A-F]{5}\.ipt|pcp[0-9]{8}pcs|S0106[[:alnum:]]{12,}\.[a-z]{2})[0-9.x_-]|unknown\[)" | less

Then you'll see everything that it catches.

Try the same, but with "egrep -iv" instead of "egrep -i", to check what it
does NOT catch.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
Going with the wind is the ambition of a dead leaf ...or of a wet fart.
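
As a rough illustration of the "isolate the last byte of the IP address" idea discussed above, here is a minimal shell sketch. It is neither SQLgrey's algorithm nor the regexp from this message: the function name last_octet_in_name and the matching rule are assumptions made up for this example. It takes one hostname[ip_address] entry, extracts the last decimal octet, and checks whether that number appears in the hostname as a run of digits of its own.

```shell
# Hypothetical helper (name invented for this sketch, not part of SQLgrey).
# Succeeds when the last decimal octet of the IP appears in the hostname
# as a number of its own, i.e. not embedded in a longer run of digits.
last_octet_in_name() {
    entry=$1                    # form: hostname[a.b.c.d], as in a Postfix log
    host=${entry%%\[*}          # text before the first '['
    ip=${entry#*\[}             # text after the first '['
    ip=${ip%\]}                 # ...without the trailing ']'
    octet=${ip##*.}             # last dot-separated field of the address
    # Leading zeros are allowed; a digit on either side is not.
    printf '%s\n' "$host" | grep -Eq "(^|[^0-9])0*${octet}([^0-9]|\$)"
}

# Three entries taken from the list in this message.
for entry in \
    'dialup111.sofia.spnet.net[213.169.32.111]' \
    'cable66a249.usuarios.retecal.es[213.254.66.249]' \
    'ACB01AF0.ipt.aol.com[172.176.26.240]'
do
    if last_octet_in_name "$entry"; then
        echo "hit:  $entry"
    else
        echo "miss: $entry"
    fi
done
```

With this particular rule the first two entries are hits and the AOL one is a miss, because ACB01AF0 carries the address in hexadecimal (AC B0 1A F0 is 172.176.26.240). A decimal-only check cannot see that, which is why the regexp above falls back on per-ISP patterns such as ACB[0-9A-F]{5}\.ipt.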