From: Lionel B. <lio...@bo...> - 2005-02-04 20:18:57
Michael Storz wrote the following on 02/04/05 17:25:

>Hi Lionel,
>
>to be a little bit more detailed than Max :-)
>

I must admit I had to read Max's mail twice...

>We just started to use greylisting and the first day shows a reduction of
>spam by about a factor of 15, that's really great. However, looking
>around in the logfiles and the mysql database, I am missing some
>information, to help me see what actually happens.
>
>Therefore I would like three additions to the tables in the database.
>
>1. Addition: first_seen
>
>Extra field first_seen also for tables from_awl and domain_awl. With this
>addition you are able to see which new entries have been entered into
>the database like it is possible now with table connect:
>
>select * from connect where first_seen > now() - interval 5 minute;
>
>With the from_awl and domain_awl you can only find out which entries have
>been added OR have been updated.
>

Seems to be a good thing to have, especially since it will be a small
column that shouldn't put much stress on the database. Added to my TODO.

>2. Addition: client_name
>
>Extra field client_name in all 3 tables. This would help a human to see
>from where a connection came. Otherwise, you must always use nslookup or
>dig to find the name.
>

I'm afraid this won't be so easy:
- in the default 'smart' mode, most of the entries aren't IP addresses
  but class C networks.
- a VARCHAR column with potentially very long names is not a welcome
  addition: it could hurt performance badly.

>3. Addition: usage_count
>
>Every update of an entry in from_awl and domain_awl should increment a
>usage_count.
>

I like the concept of storing usage data somewhere, but will a
usage_count be enough? For example, I'd like to know the top 10
domain_awl entries used last week, but usage_count won't give them to me.
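
To make the limitation concrete, here is a minimal Python sketch. It assumes a hypothetical list of (domain, timestamp) usage events, which SQLgrey does not store; the names `top_domains` and `events` and the sample domains are invented for the illustration. A single usage_count per entry collapses all such events into one lifetime total, so a question bounded in time cannot be answered from it.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_domains(events, since, limit=10):
    """Return the `limit` most used domains among events at or after `since`.

    `events` is a hypothetical iterable of (domain, timestamp) pairs; nothing
    like it exists in the SQLgrey tables discussed in this thread.
    """
    counts = Counter(domain for domain, seen in events if seen >= since)
    return counts.most_common(limit)


now = datetime(2005, 2, 4, 20, 0, 0)
events = [
    ("example.org", now - timedelta(days=30)),
    ("example.org", now - timedelta(days=29)),
    ("example.org", now - timedelta(days=28)),
    ("example.net", now - timedelta(days=2)),
    ("example.net", now - timedelta(days=1)),
    ("example.com", now - timedelta(hours=3)),
]

# What a plain usage_count would hold: lifetime totals only.
lifetime = Counter(domain for domain, _ in events)

# What the "last week" question needs: counts restricted to a time window.
last_week = top_domains(events, since=now - timedelta(days=7))
```

In this made-up data the lifetime leader (example.org) does not appear in the last-week ranking at all, which is exactly the information a bare counter loses.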
I'm wondering if what you need isn't a separate log file parser that
would get you various statistics (most frequent spam sources, most used
domains, AWL efficiency, ...). The log file should already have
everything you need to compute these stats.

>The processing of these fields by sqlgrey should be triggered by
>configuration options. For people, who do not need the information and do
>not want to waste storage, they would disable these features.
>
>4. Consistent naming
>
>In table connect ip_addr is used whereas host_ip in from_awl and
>domain_awl. Since it depends on the greylisting mode if the IP address is
>a full host address or a class C network, you should use ip_addr for all
>three tables. Now, if you try to find out what information is in every
>table about an IP address, you can't just change the tablename in the
>select, but you have to change the fieldname too.
>

Agreed. Added to my TODO.

Lionel.
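
As a footnote to point 4, here is a rough sketch of what a shared column name makes possible: one query template reused for all three tables. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, and the one-column schema is a simplified assumption, not the real SQLgrey layout. The helper name `lookup_everywhere` is invented for the example.

```python
import sqlite3

TABLES = ("connect", "from_awl", "domain_awl")


def lookup_everywhere(conn, ip_addr):
    """Return {table: row count} for `ip_addr` across all three tables.

    Works only because every table exposes the same column name, ip_addr.
    """
    result = {}
    for table in TABLES:  # fixed tuple, so interpolating the name is safe
        query = f'SELECT COUNT(*) FROM "{table}" WHERE ip_addr = ?'
        result[table] = conn.execute(query, (ip_addr,)).fetchone()[0]
    return result


conn = sqlite3.connect(":memory:")
for table in TABLES:
    # Simplified, hypothetical schema: one column is enough for the demo.
    conn.execute(f'CREATE TABLE "{table}" (ip_addr TEXT NOT NULL)')

# Sample rows using a documentation address prefix as a class C network.
conn.execute('INSERT INTO "connect" VALUES (?)', ("192.0.2",))
conn.execute('INSERT INTO "domain_awl" VALUES (?)', ("192.0.2",))

matches = lookup_everywhere(conn, "192.0.2")
```

With the current mix of ip_addr and host_ip, the loop would need a per-table column mapping instead of a single template.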