From: Lionel B. <lio...@bo...> - 2005-02-07 17:46:07
|
Michel Bouissou wrote the following on 02/07/2005 04:27 PM:

> <<<
> The query planner can use a multicolumn index for queries that involve the
> leftmost column in the index definition plus any number of columns listed to
> the right of it, without a gap. For example, an index on (a, b, c) can be
> used in queries involving all of a, b, and c, or in queries involving both a
> and b, or in queries involving only a, but not in other combinations

Ouch! I didn't remember that. It seems that on the next db layout change I'll
at least reorder this PRIMARY KEY...
From: Lionel B. <lio...@bo...> - 2005-02-07 17:42:33
|
Michel Bouissou wrote the following on 02/07/2005 04:42 PM:

> On Monday 07 February 2005 15:20, Lionel Bouton wrote:
>
>> For recipients I'm more than OK with it (this is the opt-in and opt-out
>> TODO entry).
>> For senders, as I already said, I see it as a big hole in the
>> greylisting process.
>
> If the default is to come with an empty sender whitelist, IMHO it doesn't
> "open a hole", but it gives "flexibility to the user" as it lets him manage
> the tool he uses according to his own needs and specific server
> configuration.
>
> The default empty sender whitelist can even come with all the necessary
> warnings stating that using it is a bad idea ;-))
>
> It's commonplace to say that most of the time developers of tools don't have
> exactly the same needs and views about the tools they develop as some
> users may have among a broad userbase.
>
> Now the philosophical debate is about whether a developer's goal should be to
> keep the tool fitting exactly to his own personal view (you get qmail and
> most of DJB's production ;-)

You hit below the belt :-) At least I did release SQLgrey under the GPL
(in fact I had no choice: although there isn't much code left from it,
SQLgrey is a postgrey fork). Someone else can fork SQLgrey itself and
not have to look behind (managing lists of patches like the ones for
djbdns).

That said, my reluctance to add some things to SQLgrey is based on
different things:
- usefulness to the users,
- impact on the code complexity and performance,
- time needed to code.

For example, you were quite close to convincing me to add the counter in
one previous mail by presenting a use case. My problem with it is only
that I think this data won't be meaningful enough: when you start to
make stats, you want more than a simple counter.

> or if the developer should try to bring
> features that his users would like to see as long as they are not
> incompatible with the tool...

Resistance to new features is a very important part of project
management. For the counter, it didn't yet pass the "usefulness" test
for the reasons I gave. For the sender whitelists, it's not that far:
presented like you did (with a default conf file with appropriate
warnings) I'm not really against it. It's not on the top of my list
though (but patches are welcome...).

Best regards,

Lionel.
From: Michel B. <mi...@bo...> - 2005-02-07 16:20:33
|
On Monday 07 February 2005 16:59, HaJo Schatz wrote:
> error: Bad exit status from /var/tmp/rpm-tmp.95341 (%install)

No better luck here:

RPM build error: File not
found: /var/tmp/sqlgrey-1.4.3-build/usr/share/man/man1/sqlgrey.1.gz

But this one is easy: Mandrake builds manpages as .bz2 and not .gz, so it's
normal that the .gz file can't be found.

I'll give a shot at the .spec when I have time, maybe tomorrow. If I produce a
working RPM, I'll give a URL here.

For the moment I stick with 1.4.2 ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-07 16:18:10
|
On Monday 07 February 2005 17:12, Rene Joergensen wrote:
> On Mon, Feb 07, 2005 at 11:59:18PM +0800, HaJo Schatz wrote:
> > PS: Just recogn'd -- what does that blurry piano labeled "sqlgrey" at
> > the top-left of sqlgrey.sf.net mean?
>
> I think that the piano is blurred because it's falling on "SPAM" :-)

Is it a piano or some kind of plugin / connector with 3 stings ?

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Rene J. <rg...@ba...> - 2005-02-07 16:12:55
|
On Mon, Feb 07, 2005 at 11:59:18PM +0800, HaJo Schatz wrote:
> PS: Just recogn'd -- what does that blurry piano labeled "sqlgrey" at
> the top-left of sqlgrey.sf.net mean?

I think that the piano is blurred because it's falling on "SPAM" :-)

-- 
-René
From: HaJo S. <ha...@ha...> - 2005-02-07 15:59:34
|
On Sun, 2005-02-06 at 02:13 +0100, Lionel Bouton wrote:
> Meanwhile you can "rpmbuild -ta sqlgrey-1.4.3.tar.bz2".

Hmm, not really:

[...]
install -m 644 sqlgrey.1 /var/tmp/sqlgrey-1.4.3-build/usr/share/man/man1
install init/sqlgrey /var/tmp/sqlgrey-1.4.3-build/etc/init.d
+ /usr/lib/rpm/find-debuginfo.sh /home/users/hajo/install/rpmbuild/BUILD/sqlgrey-1.4.3
0 blocks
find: /var/tmp/sqlgrey-1.4.3-build/usr/lib/debug: No such file or directory
+ /usr/lib/rpm/check-rpaths /usr/lib/rpm/check-buildroot
/var/tmp/rpm-tmp.95341: line 33: /usr/lib/rpm/check-rpaths: No such file or directory
error: Bad exit status from /var/tmp/rpm-tmp.95341 (%install)

Never looked into rolling my own RPMs, back to the good-old manual
installation then.

HaJo

PS: Just recogn'd -- what does that blurry piano labeled "sqlgrey" at
the top-left of sqlgrey.sf.net mean?

-- 
HaJo Schatz <ha...@ha...>
http://www.HaJo.Net
PGP-Key: http://www.hajo.net/hajonet/keys/pgpkey_hajo.txt
From: Michel B. <mi...@bo...> - 2005-02-07 15:53:24
|
On Monday 07 February 2005 16:27, Michel Bouissou wrote:
> > > Without this index, PostgreSQL has to read the whole table sequentially,
> >
> > Under normal circumstances it shouldn't do a sequential scan : the
> > primary key index should be used.
>
> Not when the SQL query involves only (sender_domain, host_ip) as does
> SQLgrey 1.4.2 lines 646 and 692-693.
>
> These queries don't use the leftmost column of the primary key
> «from_awl_pkey» primary key, btree (sender_name, sender_domain, host_ip),
> and, per file:/usr/share/doc/postgresql-docs-7.4.5/indexes-multicolumn.html
> :
> <<<
> The query planner can use a multicolumn index for queries that involve the
> leftmost column in the index definition plus any number of columns listed
> to the right of it, without a gap. For example, an index on (a, b, c) can
> be used in queries involving all of a, b, and c, or in queries involving
> both a and b, or in queries involving only a, but not in other combinations
> >>>

But of course, if you decided that you prefer to modify the order of the
primary key, and turn it into (host_ip, sender_domain, sender_name), then it
could be used both for requests involving all of the 3 fields, or for
requests involving only (host_ip, sender_domain).

The global organization of SQLgrey makes me feel that (host_ip, sender_domain,
sender_name) would probably be the best order for the indexes on all tables
(except for domain_awl that has no sender_name of course).

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
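
The reordering proposed above can be tried out in a small sketch. It uses Python's sqlite3 module rather than PostgreSQL, so the planner and its output differ from what the thread discusses; the column types and the sample values are assumptions made for illustration only, and the table is a reduced stand-in for from_awl, not the real SQLgrey schema.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Assumed, simplified layout: only the key order matters here.
conn.execute("""
    CREATE TABLE from_awl (
        sender_name   TEXT NOT NULL,
        sender_domain TEXT NOT NULL,
        host_ip       TEXT NOT NULL,
        last_seen     TEXT NOT NULL,
        PRIMARY KEY (host_ip, sender_domain, sender_name)
    )
""")

# Query on the two leftmost key columns only.
prefix_plan = plan_details(
    conn,
    "SELECT count(*) FROM from_awl "
    "WHERE sender_domain = 'example.org' AND host_ip = '192.0.2.1'",
)
# Query on all three key columns.
full_plan = plan_details(
    conn,
    "SELECT last_seen FROM from_awl "
    "WHERE sender_name = 'alice' AND sender_domain = 'example.org' "
    "AND host_ip = '192.0.2.1'",
)
print(prefix_plan)
print(full_plan)
```

With this key order both statements should be reported as index searches on the primary key, which is the behaviour the message argues for. Whether PostgreSQL, MySQL or another SQLite version picks the same plan on real data would still need to be checked there.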
From: Michel B. <mi...@bo...> - 2005-02-07 15:43:05
|
On Monday 07 February 2005 15:20, Lionel Bouton wrote:
> For recipients I'm more than OK with it (this is the opt-in and opt-out
> TODO entry).
> For senders, as I already said, I see it as a big hole in the
> greylisting process.

If the default is to come with an empty sender whitelist, IMHO it doesn't
"open a hole", but it gives "flexibility to the user" as it lets him manage
the tool he uses according to his own needs and specific server
configuration.

The default empty sender whitelist can even come with all the necessary
warnings stating that using it is a bad idea ;-))

It's commonplace to say that most of the time developers of tools don't have
exactly the same needs and views about the tools they develop as some
users may have among a broad userbase.

Now the philosophical debate is about whether a developer's goal should be to
keep the tool fitting exactly to his own personal view (you get qmail and
most of DJB's production ;-) or if the developer should try to bring
features that his users would like to see as long as they are not
incompatible with the tool...

But that's not technique anymore, that's philosophy ;-))

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-07 15:28:54
|
On Monday 07 February 2005 15:20, Lionel Bouton wrote:
> > The counter would allow, for example, to easily extract the ratio of
> > sender that have been seen only once compared to the ratio of "repeating"
> > senders present in the database. For analyzing the database, this is
> > useful (and easy to get), and a log parsing tool won't give this
> > information.
>
> Now, that's more an argument I can understand for storing this
> information. But won't someone prefer a "previously_seen"

Not for me, thanks ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-07 15:27:35
|
On Monday 07 February 2005 15:29, Lionel Bouton wrote:
> Michel Bouissou wrote the following on 02/07/2005 02:52 PM :
> > I tried to figure out what kind of indexes could be useful in SQLgrey, and
> > I found one that my PostgreSQL seems to like on from_awl:
> >
> > «from_awl_sender_domain_host_ip» btree (sender_domain, host_ip)
> >
> > For each time a new entry is to be added in from_awl, SQLgrey will look
> > for the count(*) of addresses with the same domain and IP, to determine
> > whether this couple should be moved to domain_awl.
> >
> > Without this index, PostgreSQL has to read the whole table sequentially,
>
> Under normal circumstances it shouldn't do a sequential scan : the
> primary key index should be used.

Not when the SQL query involves only (sender_domain, host_ip) as does SQLgrey
1.4.2 lines 646 and 692-693.

These queries don't use the leftmost column of the primary key «from_awl_pkey»
primary key, btree (sender_name, sender_domain, host_ip), and, per
file:/usr/share/doc/postgresql-docs-7.4.5/indexes-multicolumn.html :

<<<
The query planner can use a multicolumn index for queries that involve the
leftmost column in the index definition plus any number of columns listed to
the right of it, without a gap. For example, an index on (a, b, c) can be
used in queries involving all of a, b, and c, or in queries involving both a
and b, or in queries involving only a, but not in other combinations
>>>

> However it is very well possible that the index you advise would be
> scanned faster than the primary key index as it is the ideal index for
> the task unlike the PKEY.

Not only, but mainly because the primary key can't be used...

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
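
The leftmost-column rule quoted above can be observed in a minimal sketch. It runs against SQLite through Python's sqlite3 module, not against PostgreSQL 7.4, so it only illustrates the general behaviour of multicolumn btree indexes; the column types and sample values are assumptions, while the index name and key order are the ones given in the thread.

```python
import sqlite3

QUERY = ("SELECT count(*) FROM from_awl "
         "WHERE sender_domain = 'example.org' AND host_ip = '192.0.2.1'")

def plan_details(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Key order as described in the thread: sender_name is the leftmost column.
conn.execute("""
    CREATE TABLE from_awl (
        sender_name   TEXT NOT NULL,
        sender_domain TEXT NOT NULL,
        host_ip       TEXT NOT NULL,
        last_seen     TEXT NOT NULL,
        PRIMARY KEY (sender_name, sender_domain, host_ip)
    )
""")

# The query skips the leftmost key column, so the key cannot narrow it down.
before = plan_details(conn, QUERY)

# The dedicated index proposed in the thread.
conn.execute("CREATE INDEX from_awl_sender_domain_host_ip "
             "ON from_awl (sender_domain, host_ip)")
after = plan_details(conn, QUERY)

print("before:", before)
print("after: ", after)
```

Before the index exists the plan is a full scan (of the table or of the primary key index); afterwards it becomes a search on from_awl_sender_domain_host_ip. The exact wording of the plan lines varies between SQLite versions.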
From: Lionel B. <lio...@bo...> - 2005-02-07 14:29:29
|
Michel Bouissou wrote the following on 02/07/2005 02:52 PM:

> I tried to figure out what kind of indexes could be useful in SQLgrey, and I
> found one that my PostgreSQL seems to like on from_awl:
>
> «from_awl_sender_domain_host_ip» btree (sender_domain, host_ip)
>
> For each time a new entry is to be added in from_awl, SQLgrey will look for
> the count(*) of addresses with the same domain and IP, to determine whether
> this couple should be moved to domain_awl.
>
> Without this index, PostgreSQL has to read the whole table sequentially,

Under normal circumstances it shouldn't do a sequential scan: the
primary key index should be used.
However it is very well possible that the index you advise would be
scanned faster than the primary key index as it is the ideal index for
the task unlike the PKEY.

I'm really interested in numbers on this: could you run the SELECT
count(*) with and without this index (after an ANALYZE to make sure the
optimizer doesn't fall back to sequential scan because of borked
statistics)?

Could anyone do the same for SQLite and MySQL?

> and
> the from_awl is a table that grows big...

Yep

Lionel.
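
One way to collect the numbers asked for above is a small timing harness. The sketch below targets SQLite only, through Python's sqlite3 module; the row count, the synthetic data and the helper name time_count are all made up for illustration, and no particular timing result is claimed.

```python
import sqlite3
import time

def time_count(conn, repeats=50):
    """Run the count(*) query `repeats` times; return (count, elapsed seconds)."""
    sql = ("SELECT count(*) FROM from_awl "
           "WHERE sender_domain = ? AND host_ip = ?")
    start = time.perf_counter()
    for _ in range(repeats):
        (count,) = conn.execute(sql, ("example.org", "192.0.2.1")).fetchone()
    return count, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE from_awl (
        sender_name TEXT, sender_domain TEXT, host_ip TEXT, last_seen TEXT,
        PRIMARY KEY (sender_name, sender_domain, host_ip)
    )
""")
# Synthetic filler rows plus three rows that match the query.
rows = [(f"user{i}", f"domain{i % 500}.example", f"192.0.2.{i % 250}", "2005-02-07")
        for i in range(20000)]
rows += [(f"alice{i}", "example.org", "192.0.2.1", "2005-02-07") for i in range(3)]
conn.executemany("INSERT INTO from_awl VALUES (?, ?, ?, ?)", rows)

conn.execute("ANALYZE")  # refresh planner statistics before measuring
count_without, secs_without = time_count(conn)

conn.execute("CREATE INDEX from_awl_sender_domain_host_ip "
             "ON from_awl (sender_domain, host_ip)")
conn.execute("ANALYZE")
count_with, secs_with = time_count(conn)

print(f"without index: {count_without} rows, {secs_without:.4f}s")
print(f"with index:    {count_with} rows, {secs_with:.4f}s")
```

Both runs must return the same count; only the elapsed time is expected to differ, and by how much depends on the table size and the database engine.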
From: Lionel B. <lio...@bo...> - 2005-02-07 14:20:45
|
Michel Bouissou wrote the following on 02/07/2005 02:31 PM:

> On Sunday 06 February 2005 16:13, Lionel Bouton wrote:
>
>> I'm not inclined to add stuff just because it isn't a big deal, especially
>> in the database schema which is the kind of thing I learned to change with
>> caution.
>
> Sure, but the database schema will have to change anyway (to include
> first_seen and rename an IP address field). So it would be the good moment to
> add one more field that costs little. Changing important fields in a database
> schema must be done with caution, I agree, but adding a purely informative
> field (that won't be used as a key or calculation base or whatever) has no
> consequences...

This makes sense to me. But there are so many purely informative fields.
For example it just occurred to me that you *may* want to have a
"previously_seen" field in order to do queries like this:

SELECT sender_domain, host_ip, last_seen - previously_seen FROM
domain_awl ORDER BY (last_seen - previously_seen) LIMIT 50;

I'd even argue that this will be more useful than a counter field... but
still less useful than a log parsing tool.

>> Look at the TODO, there are already several things with a clear need...
>
> Yes. About the todo, a couple of remarks :
>
> 1/ I object against integrating SPF in any way in SQLgrey. SPF and greylisting
> are completely different systems, with different goals and approaches. SPF is
> implemented in separate patches (I use a Postfix patch) or policy servers. I
> don't see the interest of integrating a goat and a cow together ;-) and using
> SPF to determine whether or not greylisting should be applied would surely
> be an easy way for spammers to defeat greylisting...

It may be, this entry is only a reminder for me. I know for sure that
blindly trusting SPF is a no-no; the "experiment" only means that I'm
wondering if SQLgrey rejecting SPF invalid senders instead of
greylisting them may be useful (the question is merely to find out if
there's a point combining both pieces of information outside Postfix in the
policy server or not) or if relying on a separate policy server is the
way to go (and document this in the HOWTO). Don't pay too much attention
to this TODO entry.

> 2/ I still would love to get sender and recipient based whitelisting in
> SQLgrey. Using Postfix tables for this purpose is not a satisfactory
> solution, for one can have a whole series of tests in Postfix, and different
> exceptions for each kind of test. One may want to skip greylisting for some
> sender (i.e. somebody@somedomain), but for example still want to perform SPF
> tests on somedomain. Using a Postfix table with "somebody@somedomain => OK"
> would cause *all* subsequent tests to be skipped for this message, not only
> greylisting. And it makes it a headache in ordering tests if using different
> Postfix tables for this...

I don't find test ordering in Postfix the most intuitive thing either :-)

> It would sound logical and easier to me that each "policy server" embarks its
> own independent whitelisting for conditions under which this given test
> should be performed or not...

For recipients I'm more than OK with it (this is the opt-in and opt-out
TODO entry).
For senders, as I already said, I see it as a big hole in the
greylisting process.

>> In my opinion a separate log parsing tool would bring far more useful
>> stats.
>
> Sure, a log parsing tool is most useful, and probably most mail admins have
> something like this. But a counter gives *different* information that can be
> seen in the database at a glimpse, i.e. "is this sender an usual, frequent
> correspondent, or did he send only once" ? (as some spammers or viruses do,
> and yes, sometimes, they can pass thru greylisting)...
>
> The counter would allow, for example, to easily extract the ratio of sender
> that have been seen only once compared to the ratio of "repeating" senders
> present in the database. For analyzing the database, this is useful (and easy
> to get), and a log parsing tool won't give this information.

Now, that's more an argument I can understand for storing this
information. But won't someone prefer a "previously_seen" (which by the
way is slightly more complex to implement) ? If the entry can't be found
more than once in the logs covering the awl ttl period, you'll have
nearly the same information...

Lionel.
From: Michel B. <mi...@bo...> - 2005-02-07 13:52:34
|
I tried to figure out what kind of indexes could be useful in SQLgrey, and I
found one that my PostgreSQL seems to like on from_awl:

«from_awl_sender_domain_host_ip» btree (sender_domain, host_ip)

For each time a new entry is to be added in from_awl, SQLgrey will look for
the count(*) of addresses with the same domain and IP, to determine whether
this couple should be moved to domain_awl.

Without this index, PostgreSQL has to read the whole table sequentially, and
the from_awl is a table that grows big...

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-07 13:31:33
|
On Sunday 06 February 2005 16:13, Lionel Bouton wrote:
> > I would say that IMHO the way to consider the counter is that:
> > 1/ Some people would seem to like it (including me, for non-rational
> > reasons ;-)
> > 2/ It should be a very easy addition
> > 3/ It doesn't hurt performance
>
> Sorry but I need more than that :-)

Sniff ;-(

So what does it take ? ;-)

> I'm not inclined to add stuff just because it isn't a big deal, especially
> in the database schema which is the kind of thing I learned to change with
> caution.

Sure, but the database schema will have to change anyway (to include
first_seen and rename an IP address field). So it would be the good moment to
add one more field that costs little. Changing important fields in a database
schema must be done with caution, I agree, but adding a purely informative
field (that won't be used as a key or calculation base or whatever) has no
consequences...

> Look at the TODO, there are already several things with a clear need...

Yes. About the todo, a couple of remarks :

1/ I object against integrating SPF in any way in SQLgrey. SPF and greylisting
are completely different systems, with different goals and approaches. SPF is
implemented in separate patches (I use a Postfix patch) or policy servers. I
don't see the interest of integrating a goat and a cow together ;-) and using
SPF to determine whether or not greylisting should be applied would surely
be an easy way for spammers to defeat greylisting...

2/ I still would love to get sender and recipient based whitelisting in
SQLgrey. Using Postfix tables for this purpose is not a satisfactory
solution, for one can have a whole series of tests in Postfix, and different
exceptions for each kind of test. One may want to skip greylisting for some
sender (i.e. somebody@somedomain), but for example still want to perform SPF
tests on somedomain. Using a Postfix table with "somebody@somedomain => OK"
would cause *all* subsequent tests to be skipped for this message, not only
greylisting. And it makes it a headache in ordering tests if using different
Postfix tables for this...

It would sound logical and easier to me that each "policy server" embarks its
own independent whitelisting for conditions under which this given test
should be performed or not...

> In my opinion a separate log parsing tool would bring far more useful
> stats.

Sure, a log parsing tool is most useful, and probably most mail admins have
something like this. But a counter gives *different* information that can be
seen in the database at a glimpse, i.e. "is this sender an usual, frequent
correspondent, or did he send only once" ? (as some spammers or viruses do,
and yes, sometimes, they can pass thru greylisting)...

The counter would allow, for example, to easily extract the ratio of senders
that have been seen only once compared to the ratio of "repeating" senders
present in the database. For analyzing the database, this is useful (and easy
to get), and a log parsing tool won't give this information.

Cheers.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E

Where there is no answer, there is no question.
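
The "seen once versus repeating" ratio argued for above could be pulled out with a single query if such a counter existed. The sketch below assumes a hypothetical integer column named hits on from_awl; no such column exists in the schema discussed in this thread, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `hits` is a hypothetical counter column, not part of the schema in the thread.
conn.execute("""
    CREATE TABLE from_awl (
        sender_name TEXT, sender_domain TEXT, host_ip TEXT, last_seen TEXT,
        hits INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (sender_name, sender_domain, host_ip)
    )
""")
conn.executemany(
    "INSERT INTO from_awl VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "example.org", "192.0.2.1", "2005-02-07", 12),
        ("bob",   "example.org", "192.0.2.1", "2005-02-07", 1),
        ("carol", "example.net", "192.0.2.9", "2005-02-06", 7),
        ("dave",  "example.com", "192.0.2.5", "2005-02-05", 1),
    ],
)

# Count senders seen exactly once and senders seen more than once.
once, repeating = conn.execute("""
    SELECT SUM(CASE WHEN hits = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN hits > 1 THEN 1 ELSE 0 END)
    FROM from_awl
""").fetchone()

ratio_once = once / (once + repeating)
print(once, repeating, ratio_once)
```

The same aggregate should work unchanged on PostgreSQL and MySQL, since it relies only on standard CASE expressions.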
From: Lionel B. <lio...@bo...> - 2005-02-06 15:14:52
|
Michel Bouissou wrote the following on 02/06/05 10:25:

> On Saturday 05 February 2005 13:49, Lionel Bouton wrote:
>
>> For first_seen, my plan is to make the update process set them to
>> last_seen.
>
> Sounds logical
>
>> If counters are really needed, they will be set to 0.
>
> "1" would probably be better. If we have an entry in an awl, we have seen it
> at least once ;-)

I could make that 2 for the domain_awl if the aggregation level is > 1
;-) But I'm not seeing counters in the schema yet...
From: Lionel B. <lio...@bo...> - 2005-02-06 15:13:37
|
Michel Bouissou wrote the following on 02/06/05 10:24:

> On Saturday 05 February 2005 15:07, Lionel Bouton wrote:
>
>>> I don't think that updating last_seen + counter would result in any
>>> noticeable performance difference compared to updating last_seen alone...
>>
>> Of course you are right. But the need for counters isn't clear for me
>> yet...
>
> I would say that IMHO the way to consider the counter is that:
> 1/ Some people would seem to like it (including me, for non-rational
> reasons ;-)
> 2/ It should be a very easy addition
> 3/ It doesn't hurt performance
>
> So...

Sorry but I need more than that :-) I'm not inclined to add stuff just
because it isn't a big deal, especially in the database schema which is
the kind of thing I learned to change with caution.
Look at the TODO, there are already several things with a clear need...

In my opinion a separate log parsing tool would bring far more useful stats.

Lionel.
From: Lionel B. <lio...@bo...> - 2005-02-06 15:04:41
|
Michel Bouissou wrote the following on 02/06/05 10:22:

> On Saturday 05 February 2005 21:19, Lionel Bouton wrote:
>
>>> I suggest that we should replace by # any series of possibly HEX numbers
>>> [0-9A-Fa-f] separated from the rest of the address by one of [._-]
>>
>> Something like s/([._-])[0-9A-Fa-f]*([._-])/$1#$2/g (not tested,
>> probably flawed, but you get the idea) ?
>
> I would say s/([._-])[0-9A-Fa-f]+([._-])/$1#$2/g

Of course. I'll test this regexp -> TODO for 1.4.4.

>> Why the '.' ? I don't see it in your examples.
>
> I don't know all the VERP sender models out there. "." could probably be used
> as a separator by some, just as "-" and "_" are...
> Possibly "=", too...

Yep, but I'd better check SRS before them if '=' is to be used...

> SRS0=#=#=somedomain.org=s...@bo...
> [...]
> SRS1=#=bouissou.net==#=#=somedomain.org=somebody@bouissou.com
>
> Hope this helps.

Yep, it confirms that I correctly understood SRS :-)
I'll give it a shot in 1.4.4.

Lionel
From: Michel B. <mi...@bo...> - 2005-02-06 09:25:15
|
On Saturday 05 February 2005 13:49, Lionel Bouton wrote:
> For first_seen, my plan is to make the update process set them to
> last_seen.

Sounds logical

> If counters are really needed, they will be set to 0.

"1" would probably be better. If we have an entry in an awl, we have seen it
at least once ;-)

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
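
The upgrade step discussed above (first_seen initialised from last_seen, counters starting at 1) might look roughly like the following. This is a sketch against SQLite via Python's sqlite3 module, not SQLgrey's actual upgrade code; the column name hits and the text timestamps are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pre-upgrade table: no first_seen, no counter.
conn.execute("""
    CREATE TABLE from_awl (
        sender_name TEXT, sender_domain TEXT, host_ip TEXT,
        last_seen TEXT NOT NULL,
        PRIMARY KEY (sender_name, sender_domain, host_ip)
    )
""")
conn.executemany(
    "INSERT INTO from_awl VALUES (?, ?, ?, ?)",
    [
        ("alice", "example.org", "192.0.2.1", "2005-02-05 13:49:00"),
        ("bob",   "example.net", "192.0.2.7", "2005-02-06 09:25:15"),
    ],
)

# Add the new columns; existing rows get hits = 1 ("seen at least once").
conn.execute("ALTER TABLE from_awl ADD COLUMN first_seen TEXT")
conn.execute("ALTER TABLE from_awl ADD COLUMN hits INTEGER NOT NULL DEFAULT 1")
# The best available guess for first_seen on old rows is last_seen.
conn.execute("UPDATE from_awl SET first_seen = last_seen WHERE first_seen IS NULL")

migrated = conn.execute(
    "SELECT sender_name, first_seen, last_seen, hits "
    "FROM from_awl ORDER BY sender_name"
).fetchall()
print(migrated)
```

Starting the counter at 1 rather than 0 follows the argument in the message: an entry that is present in an auto-whitelist has been seen at least once.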
From: Michel B. <mi...@bo...> - 2005-02-06 09:24:09
|
On Saturday 05 February 2005 15:07, Lionel Bouton wrote:
> > I don't think that updating last_seen + counter would result in any
> > noticeable performance difference compared to updating last_seen alone...
>
> Of course you are right. But the need for counters isn't clear for me
> yet...

I would say that IMHO the way to consider the counter is that:
1/ Some people would seem to like it (including me, for non-rational
reasons ;-)
2/ It should be a very easy addition
3/ It doesn't hurt performance

So...

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
From: Michel B. <mi...@bo...> - 2005-02-06 09:22:31
|
On Saturday 05 February 2005 21:19, Lionel Bouton wrote:
> > I suggest that we should replace by # any series of possibly HEX numbers
> > [0-9A-Fa-f] separated from the rest of the address by one of [._-]
>
> Something like s/([._-])[0-9A-Fa-f]*([._-])/$1#$2/g (not tested,
> probably flawed, but you get the idea) ?

I would say s/([._-])[0-9A-Fa-f]+([._-])/$1#$2/g

> Why the '.' ? I don't see it in your examples.

I don't know all the VERP sender models out there. "." could probably be used
as a separator by some, just as "-" and "_" are...
Possibly "=", too...

> > Also, I believe that sqlgrey has no provision to recognize single-use
> > sender addresses including hashes such as what systems like SRS, SES or
> > BATV can produce...
>
> Given they all generate a per mail local-part, the from_awl won't be
> used. But the domain_awl will be.

Yes.

> Are SES and BATV popular ? on www.libsrs2.org, they don't have kind
> words for SES for example...

I don't know. All these are rather experimental, and each of these methods
often criticizes the others...

I would personally prefer SRS, but I'm still waiting for a working Postfix
implementation, and it seems that it hasn't evolved at all since last July.
There's a Postfix patch, which is broken, and nobody has yet started fixing
it :-/

> For SRS, I need to look more into it to tune proper regexps, meanwhile
> domain_awl should help.

All SRS addresses usually begin with "SRS0=" or "SRS1=", which should make
them easy to identify.

The most common form is that an address such as "som...@so..."
processed on the "bouissou.net" forwarder would be transformed into:

SRS0=ydTPp65P=QU=somedomain.org=s...@bo...

= is the separator between fields
SRS0 identifies an SRS type 0 address
ydTPp65P is the hash (length depending on setup, will usually be between 4 and
8 chars)
QU is a timestamp, always 2 chars
somedomain.org is the original domain
somebody is the original localpart
@bouissou.net is the forwarding domain

For storing this, SQLgrey should probably ignore the hash and timestamp, so it
should turn such an address into:

SRS0=#=#=somedomain.org=s...@bo...

If such an SRS0 address is reprocessed by a 2nd SRS forwarder, it would turn
into an SRS1 address of the form:

SRS1=5uVcjerW=bouissou.net==ydTPp65P=QU=somedomain.org=somebo...@bo...

(SRS1=<hash>=<1st_forwarder>==<hash>=<timestamp>=<original_domain>=<original_localpart>@<forwarder_domain>)

So SQLgrey should probably store this as:

SRS1=#=bouissou.net==#=#=somedomain.org=somebody@bouissou.com

Hope this helps.

-- 
Michel Bouissou <mi...@bo...> OpenPGP ID 0xDDE8AC6E
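
The SRS normalisation described above can be sketched as follows. This is an illustration in Python, not code from SQLgrey (which is written in Perl); the function name normalize_srs and the example addresses are hypothetical, and the field layout is taken solely from the description in this message.

```python
import re

# SRS0=<hash>=<timestamp>=<original_domain>=<original_localpart>
SRS0_RE = re.compile(r"^SRS0=[^=]+=[^=]+=(.+)$")
# SRS1=<hash>=<1st_forwarder>==<hash>=<timestamp>=<original_domain>=<original_localpart>
SRS1_RE = re.compile(r"^SRS1=[^=]+=([^=]+)==[^=]+=[^=]+=(.+)$")

def normalize_srs(address):
    """Replace the per-mail hash and timestamp fields of an SRS address by '#'.

    Addresses that do not look like SRS0/SRS1 are returned unchanged.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        return address
    match = SRS0_RE.match(local)
    if match:
        return f"SRS0=#=#={match.group(1)}@{domain}"
    match = SRS1_RE.match(local)
    if match:
        return f"SRS1=#={match.group(1)}==#=#={match.group(2)}@{domain}"
    return address

# Hypothetical addresses, for illustration only.
srs0 = normalize_srs("SRS0=ab12=QU=example.org=alice@forwarder.example")
srs1 = normalize_srs(
    "SRS1=zz99=forwarder.example==ab12=QU=example.org=alice@second.example")
plain = normalize_srs("alice@example.org")
print(srs0)
print(srs1)
print(plain)
```

Two mails relayed through the same forwarder for the same original sender then map to the same stored address, whatever hash and timestamp each one carries.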
From: Lionel B. <lio...@bo...> - 2005-02-06 01:14:05
|
Hi,

1.4.3 is on sourceforge.

# What's new

update_sqlgrey_whitelists. The name says it all...
This is a small bash script relying on wget, md5sum and diff (to show
the changes applied).
Adding this to a daily cron (with a random offset in the day to avoid
spikes on the poor central webserver) is advised.

As requested, logging to stdout is back when not in daemon mode (this is
for runit and daemontools users).

# No RPM

I didn't post rpms this time as I don't have any RPM-based distro at
hand. If there's a need I'll set up a Fedora Core with User-mode Linux
to make RPMs.
Meanwhile you can "rpmbuild -ta sqlgrey-1.4.3.tar.bz2".

Best regards,

Lionel
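
The compare-then-show-changes idea behind such an update script can be sketched without any network access. The following is not the shipped bash script: it is a Python illustration in which the function name update_whitelist, the label "whitelist" and the sample entries are all assumptions, with hashlib and difflib standing in for md5sum and diff.

```python
import difflib
import hashlib

def update_whitelist(installed_text, fetched_text, name="whitelist"):
    """Return (text_to_install, diff_lines).

    diff_lines is empty when the fetched copy has the same MD5 digest as
    the installed one, in which case nothing needs to be replaced.
    """
    installed_md5 = hashlib.md5(installed_text.encode("utf-8")).hexdigest()
    fetched_md5 = hashlib.md5(fetched_text.encode("utf-8")).hexdigest()
    if installed_md5 == fetched_md5:
        return installed_text, []
    diff = list(difflib.unified_diff(
        installed_text.splitlines(),
        fetched_text.splitlines(),
        fromfile=name + " (installed)",
        tofile=name + " (fetched)",
        lineterm="",
    ))
    return fetched_text, diff

# Hypothetical whitelist contents.
installed = "192.0.2.1\n"
fetched = "192.0.2.1\n192.0.2.2\n"

same_text, no_diff = update_whitelist(installed, installed)
new_text, changes = update_whitelist(installed, fetched)
print("\n".join(changes))
```

A real updater would also have to download the file and signal the running daemon to reload its lists; those steps are deliberately left out here.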
From: Michael S. <Mic...@lr...> - 2005-02-05 21:27:43
|
Before I go deeper into the discussion of greylisting, I want to introduce
the Leibniz-Rechenzentrum, where Max and I are working. I think this will
be necessary to understand some of my ideas or the way I am looking at
greylisting and email in general.

The Leibniz-Rechenzentrum is part of the Bavarian Academy of Science. We
are the ISP/ASP for the scientific organizations in the Munich area, for
example 10 universities (general, technical, applied sciences). We are
responsible for the Munich Scientific Network with more than 50.000
computers. In addition we are the high-performance computing center for all
the other universities in Bavaria and one of the three Supercomputer
Centers of Germany. For more information go to
http://www.lrz-muenchen.de/wir/intro/en/.

Going back to email. At the moment we house about 65.000 mailboxes under
180 different domains. We process about 1.2 million emails per day. Our
software comes from Syntegra, a part of British Telecom (formerly Control
Data Systems, and before that Control Data Corporation = CDC) and is a
little bit similar to Postfix.

Max programmed the glue software to interface the receiving daemon
(SMTPserver) with the policy daemon sqlgrey. And I will hopefully be able
to discuss new features as far as my time allows it.

Having said this, you will understand that we will have different
opinions, because we do not use Postfix on the one side and maybe process
more emails than others on this list on the other side. This means we
maybe need more complex algorithms than others, but are able to throw in
more hardware. For example we will have a dedicated highly available MySQL
server (at least I hope that we will have it :-) and a dedicated server
for the policy server itself.

I hope all of us on this list will be able to get our needs fulfilled with
sqlgrey; otherwise we must develop our own branch of sqlgrey, because we
do not want to force our opinions onto others. But beginning our own
branch would be a pity.

Regards,
Michael Storz
-------------------------------------------------
Leibniz-Rechenzentrum   ! <mailto:St...@lr...>
Barer Str. 21           ! Fax: +49 89 2809460
80333 Muenchen, Germany ! Tel: +49 89 289-28840
From: Lionel B. <lio...@bo...> - 2005-02-05 20:19:19
|
Michel Bouissou wrote the following on 02/05/05 10:35 :

>Hello,
>
>Here are some samples of VERP-style single-use sender address local parts
>that sqlgrey currently doesn't recognize, and that cause every mail from the
>same origin to get greylisted again.
>
>Some may cause the sending domain to end up in the "domain_awl" if the sample
>gets big enough, but some, being weekly or monthly newsletters, may never
>produce the required number, and get greylisted forever...
>
>bounce-#-d03ca8744c532662291bfe53e62dddb3eab6aa94-#
>
>photoservice-xtc-9x9-0lac-dd-c2t6m
>photoservice-xtc-9x9-zj8o-dd-c2t6m
>
>fr2_2743_html-2743_4
>fr_2760_html-2760_1098
>fr_2785_html-2785_1098
>fr_2811_html-2811_1137
>fr_2833_html-2833_1137
>
>I suggest that we should replace by # any series of possibly HEX numbers
>[0-9A-Fa-f] separated from the rest of the address by one of [._-]

Something like s/([._-])[0-9A-Fa-f]*([._-])/$1#$2/g (not tested, probably flawed, but you get the idea) ? Why the '.' ? I don't see it in your examples.

>The "photoservice" samples would probably be harder to recognize.

The whitelists are your friends...

>Also, I believe that sqlgrey has no provision to recognize single-use sender
>addresses including hashes such as what systems like SRS, SES or BATV can
>produce...

Given they all generate a per-mail local-part, the from_awl won't be used. But the domain_awl will be. Are SES and BATV popular ? On www.libsrs2.org, they don't have kind words for SES for example...

For SRS, I need to look more into it to tune proper regexps; meanwhile domain_awl should help.

Lionel. |
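A minimal sketch of the substitution discussed in this message, written in Python rather than in sqlgrey's own Perl. The pattern and the function name `normalize_local_part` are assumptions for illustration, not sqlgrey code. Unlike the untested one-liner above, it requires at least one hex digit and uses lookarounds so that the separators are not consumed, which lets adjacent runs such as `-2760_1098` both be rewritten, including a run at the very end of the local part.

```python
import re

# Hypothetical pattern: a run of hex digits that directly follows one of
# the separators . _ - and is followed by another separator or by the end
# of the local part. The lookbehind/lookahead leave the separators in place.
HEX_RUN = re.compile(r"(?<=[._-])[0-9A-Fa-f]+(?=[._-]|$)")


def normalize_local_part(local_part: str) -> str:
    """Replace every separator-delimited hex run with '#'."""
    return HEX_RUN.sub("#", local_part)


if __name__ == "__main__":
    for sample in (
        "fr_2760_html-2760_1098",
        "fr_2785_html-2785_1098",
        "photoservice-xtc-9x9-0lac-dd-c2t6m",
        "photoservice-xtc-9x9-zj8o-dd-c2t6m",
    ):
        print(sample, "->", normalize_local_part(sample))
```

With this pattern the two `fr_..._html` samples collapse to the same string, while the two "photoservice" samples still differ because `0lac` and `zj8o` are not pure hex, which matches the remark that they are harder to recognize. A pattern this loose also rewrites short ordinary words made only of hex letters (such as `dd`), so it may be too aggressive for some senders.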
From: Michael S. <Mic...@lr...> - 2005-02-05 18:59:53
|
On Sat, 5 Feb 2005, Lionel Bouton wrote:

> I see the need for a first_seen : logs are mostly useless for this
> information.

Ok, first_seen is the one I need most. With this field added, I can test new algorithms via a cron job without changing the code of sqlgrey.

> counters : what for ? As I previously said, I'm not sure they will be
> useful as logs already hold more relevant information. It will hurt the
> performance of people willing to use them too (if I make them optional
> as requested) as this will need an update to the database on each and
> every mail SQLgrey will see (db updates are slow...).
> If someone tells me what it will be used for that can't be done with a
> simple log parsing tool, I'd be more inclined to put it in my TODO list.
> In the other case, I'll add a logparsing tool in the TODO...
> ...
> >Not only "client_name" would be meaningless for all the class-C records, but
> >also a reverse DNS entry is something that can change over time. If we put a
> >client_name column, we should update it with the client_name Postfix gives
> >every time we update a record. (note that we could very well put there the
> >last client_name seen even when we use a class-C entry type, as this would
> >still give an indication about the calling client).
> >But I'm not sure that having this in the tables would be very useful.

For the fields counter and client_name I must admit that I have no direct use to enhance greylisting at the moment. The reason I want these fields is that I think greylisting as it is implemented at the moment is a sharp sword, but over time spammers will try to circumvent it. My experience over the years has shown me that the human brain is very efficient at finding patterns and ordering chaos. What it needs for this is a lot of information. Digging around in the SQL database with select statements is much easier and faster than writing a log parser. That's the reason why I would like to have these fields in the database, but as I said before I have no hard facts to justify their inclusion.

About performance: I must admit that I have nearly no knowledge about databases. When I studied computer science a long time ago, I somehow managed to avoid databases. The knowledge I have is about X.500 and LDAP. In these applications there would be no performance loss if you add additional attributes, even with big values. The reason is that this addition has no influence on the hashing (indexing) of the other attributes.

Michael Storz ------------------------------------------------- Leibniz-Rechenzentrum ! <mailto:St...@lr...> Barer Str. 21 ! Fax: +49 89 2809460 80333 Muenchen, Germany ! Tel: +49 89 289-28840 |
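To illustrate the kind of ad hoc select statement this message has in mind, here is a small Python sketch against an in-memory SQLite database. The table name `from_awl` appears in this thread, but the column layout, the `hits` counter, the sample rows and the function `long_lived_entries` are all assumptions made for the example; the real sqlgrey schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: first_seen and hits are the fields requested in
# the thread, the other columns are guesses for illustration only.
conn.execute(
    "CREATE TABLE from_awl ("
    " sender_name TEXT, sender_domain TEXT, src TEXT,"
    " first_seen INTEGER, last_seen INTEGER, hits INTEGER)"
)
conn.executemany(
    "INSERT INTO from_awl VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("news", "example.org", "192.0.2", 1000, 9000, 12),
        ("bob", "example.org", "192.0.2", 2000, 2500, 1),
        ("alice", "example.net", "198.51.100", 3000, 8000, 7),
    ],
)


def long_lived_entries(conn, min_age, min_hits):
    """Return (sender_name, sender_domain) pairs that have been known for
    at least min_age seconds and were seen at least min_hits times."""
    cur = conn.execute(
        "SELECT sender_name, sender_domain FROM from_awl"
        " WHERE last_seen - first_seen >= ? AND hits >= ?"
        " ORDER BY sender_domain, sender_name",
        (min_age, min_hits),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(long_lived_entries(conn, 4000, 5))
```

A query like this could be run from a cron job without touching the policy daemon itself; the same question answered from logs would need a parser that reconstructs first and last sightings per sender.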
From: Lionel B. <lio...@bo...> - 2005-02-05 14:07:19
|
Michel Bouissou wrote the following on 02/05/05 13:58 :

>On Saturday 05 February 2005 13:49, Lionel Bouton wrote:
>
>>counters : what for ? As I previously said, I'm not sure they will be
>>useful as logs already hold more relevant information. It will hurt the
>>performance of people willing to use them too (if I make them optional
>>as requested) as this will need an update to the database on each and
>>every mail SQLgrey will see (db updates are slow...).
>
>There's already a database update on each and every mail that SQLgrey sees:
>The last_seen entry gets updated.
>
>I don't think that updating last_seen + counter would result in any noticeable
>performance difference compared to updating last_seen alone...

Of course you are right. But the need for counters isn't clear for me yet... |
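The point about updating last_seen and a counter together can be sketched as a single UPDATE statement. This is an illustration in Python with SQLite, not sqlgrey's actual code: the table `awl`, its columns and the function `touch` are hypothetical names.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

# Hypothetical whitelist table with a hit counter next to last_seen.
conn.execute(
    "CREATE TABLE awl ("
    " sender TEXT, src TEXT, last_seen INTEGER,"
    " hits INTEGER DEFAULT 0, PRIMARY KEY (sender, src))"
)
conn.execute(
    "INSERT INTO awl (sender, src, last_seen) VALUES (?, ?, ?)",
    ("news@example.org", "192.0.2", 1000),
)


def touch(conn, sender, src, now=None):
    """Refresh last_seen and increment the counter in one statement.

    Returns the number of rows changed (0 if the entry is unknown)."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE awl SET last_seen = ?, hits = hits + 1"
        " WHERE sender = ? AND src = ?",
        (now, sender, src),
    )
    return cur.rowcount


if __name__ == "__main__":
    print(touch(conn, "news@example.org", "192.0.2"))
    print(conn.execute("SELECT last_seen, hits FROM awl").fetchone())
```

Because the counter is changed in the same statement that already rewrites the row, no additional query is issued per mail; whether the difference is measurable on a given database backend would still have to be tested.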