Re: [Ixhash-users] iXhash reputation/infamy rating feature - your thoughts wanted
From: Dirk B. <di...@bo...> - 2008-10-19 21:26:43
Per Jessen wrote:
> Dirk Bonengel wrote:
>> There are also a few things to be considered - your input wanted..
>> - Will there be any data available in the first place (This goes out
>> to those that maintain their own lists)
>
> I think it's possible, we can certainly collect the data.
>
>> - Benchmarking - is there a need to store the returned IP results
>> (carrying the reputation) somewhere internal to SpamAssassin to avoid
>> multiple DNS lookups - or should this be externalized (rely on DNS
>> caching)
>
> I think it is best done externally. No need to reinvent the wheel again.

Hmm. Is starting a new DNS lookup quicker than looking up a meta in $pms??

>
>> - The overall number of hashes and details of how they have been
>> collected will play a role - 50 submissions in a list fed by small
>> (local?) spamtrap employing black- and greylisting mechanisms should
>> be scored differently from a list processing millions of mails daily
>> with no filtering mechanisms in place.
>
> I'm not quite sure I understand this. I see the "score" being made up
> of:
>
> 1. Hit or no hit. (current situation)
> 2. Which list was hit. (generic, hosteurope, ctyme etc.)
> 3. existing hit count.
>
> If there is a difference in how/where the data is being collected, it
> should be on different lists, such that the score differentiation can
> be done in 2).
>

Let me try to explain. The problem is the hit count...

Currently, a spam that is fed into my iXhash system (i.e. server side)
leads to all three hashes being generated, as some minor variations
inside the mail's body may lead to different hashes per algorithm.

Then again, you might (on the client side) run into the following
scenario: Take three spams that clearly belong to the same spam run but
vary in some manner from each other.

- Spam #1 hits on hash #1 but not on hash #2
  -> You would get a reputation score according to the number of
     submissions for hash #1
- Spam #2 would _not_ score on hash #1 but on hash #2
  -> You would get a reputation score according to the number of
     submissions for hash #2
- Spam #3 would score on both hash #1 and hash #2
  -> What reputation score should be applied now?? (Both submission
     counts added? The maximum value?)

IIRC I had this phenomenon when doing a MySQL based plugin (i.e.
querying a database instead of DNS). As it is now, it also shows in the
beta code doing ranges (which you do not know yet): a mail scores on both
<hash#1>.generic.ixhash.net and <hash#2>.generic.ixhash.net. This would
be some variation on the 'addition' scheme.

>
> /Per

Hope I came across
Dirk
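
To make the Spam #3 question concrete, here is a minimal sketch of the two candidate schemes (adding the submission counts versus taking the maximum). Everything in it is an assumption for illustration: the function name `combine_reputation`, the `scheme` parameter, and the submission counts are hypothetical and not part of iXhash or SpamAssassin.

```python
def combine_reputation(hit_counts, scheme="max"):
    """Combine per-hash submission counts into one reputation value.

    hit_counts: submission counts of the hashes that matched (hypothetical data).
    scheme: "sum" adds all counts, "max" keeps only the largest one.
    """
    if not hit_counts:
        return 0  # no hash matched, so there is no reputation to report
    if scheme == "sum":
        return sum(hit_counts)
    if scheme == "max":
        return max(hit_counts)
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical submission counts as a list server might report them.
submissions = {"hash1": 40, "hash2": 25}

spam1 = [submissions["hash1"]]                        # hits hash #1 only
spam2 = [submissions["hash2"]]                        # hits hash #2 only
spam3 = [submissions["hash1"], submissions["hash2"]]  # hits both hashes

for name, hits in (("spam1", spam1), ("spam2", spam2), ("spam3", spam3)):
    print(name, combine_reputation(hits, "sum"), combine_reputation(hits, "max"))
```

With "sum", a mail that matches several hashes of the same spam run is counted once per hash, which may overstate how often that run was submitted; "max" avoids this but ignores the extra match. Which one fits better depends on how the list server counts submissions.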