Re: [Dspam-user] Mysql connections in daemon mode


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 29/01/2010 02:05, Stevan Bajić wrote:
> On Thu, 28 Jan 2010 14:07:31 +0100
> "to...@st..." <to...@st...> wrote:
>
>> Yes, if the email's total score is above a fixed threshold AND dspam
>> doesn't agree with this, then the email is retrained into dspam.
>> It's the same mechanism as SA autolearn and, AFAIK, the crm114 plugin
>> uses this to retrain crm.
>>
>> I know that this will introduce some possible mistakes, but I think
>> the balance between good and right will be OK.
>> What's your opinion on this? Should I try to do this, or is it better
>> to let DSPAM learn by itself and manually retrain it on error?
>>
> This all depends on your needs. Do your users train DSPAM, CRM114
> and/or SA?
>
> Or to put it the other way around: if you trust SA so much, then why
> use CRM114 and DSPAM at all? What is the point? SA has a Bayes engine
> itself, so there is not much benefit in using DSPAM and/or CRM114
> (with its default setup/configuration).
>
> Why do you use 3 anti-spam engines when each of them depends on the
> others and one of them can drag the accuracy of all the others down?
> What is the reason that you use a heuristic engine like SA and two
> statistical ones like CRM114 and DSPAM?
>
> Could you write a little bit about how your users are using the
> anti-spam system? Do they train the engine? Are they able/allowed to
> train? Does every user have his own data set or do they all share the
> same data for ham/spam? What MTA do you use?
>
Our users are able to train dspam, crm114 and SA.
They share the same dataset.
We use postfix as the global MTA, but we don't use it for retraining
(no special alias).
To retrain FPs, our customers can move email into 2 IMAP folders in
their mailbox, one for spam learning, the other for ham learning.
These feed 2 special folders on one centralized server, to which we
apply learning scripts.
The script runs sa-learn for SA; for DSPAM, it checks the email headers
and, if dspam does not agree with the classification, the email is
retrained with the command:
/usr/bin/dspam --client --user amavis --class=spam --source=error
(or --class=innocent for ham, of course)
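
The header check described above could look roughly like the following
Python sketch. It is not the actual script: the function names and the
"spam"/"ham" folder labels are hypothetical, and it assumes that DSPAM
has stamped each message with an X-DSPAM-Result header ("Spam" or
"Innocent") and that the dspam class name for ham is "innocent".
Messages without that header are skipped, since there is no DSPAM
verdict to disagree with.

```python
import email
import subprocess

# Hypothetical mapping from the learning folder a user chose to the
# DSPAM class name (assumption: DSPAM calls ham "innocent").
FOLDER_TO_CLASS = {"spam": "spam", "ham": "innocent"}


def dspam_verdict(raw_message):
    """Return DSPAM's recorded verdict in lowercase, or None if absent."""
    msg = email.message_from_string(raw_message)
    value = msg.get("X-DSPAM-Result")
    return value.strip().lower() if value else None


def retrain_command(raw_message, folder):
    """Build the dspam command if DSPAM disagreed with the user, else None."""
    wanted = FOLDER_TO_CLASS[folder]
    verdict = dspam_verdict(raw_message)
    if verdict is None or verdict == wanted:
        # No DSPAM verdict to correct, or DSPAM already agrees.
        return None
    return [
        "/usr/bin/dspam", "--client", "--user", "amavis",
        "--class=" + wanted, "--source=error",
    ]


def retrain(raw_message, folder):
    """Pipe the message to dspam when retraining is needed."""
    cmd = retrain_command(raw_message, folder)
    if cmd is None:
        return False
    subprocess.run(cmd, input=raw_message.encode("utf-8"), check=True)
    return True
```

Keeping the decision (retrain_command) separate from the call to dspam
makes it possible to dry-run the script over a folder before letting it
touch the shared dataset.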

This retraining greatly increases the accuracy of the 3 engines.

Autolearning is trickier because it relies heavily on the heuristic
engine (main scoring) to adjust the statistical engines (SA Bayes,
CRM) on the fly.
But I agree with you: what's the point of using the 3 statistical
engines this way?
For SA, it's OK, but for CRM114 and DSPAM, I wonder if it's really
clever.
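
For reference, the autolearn rule under discussion (retrain when the
total score crosses a fixed threshold and dspam disagrees) can be
written as a small Python sketch. The function name and both threshold
values are hypothetical, and the ham-side threshold is an assumption
added by analogy with SA autolearn; the thread itself only describes
the spam side.

```python
def autolearn_action(total_score, verdict,
                     spam_threshold=10.0, ham_threshold=-1.0):
    """Return the class to retrain DSPAM with, or None to leave it alone.

    total_score: overall score from the heuristic engine.
    verdict: DSPAM's own result for the message, "spam" or "innocent".
    Both thresholds are illustrative values, not recommendations.
    """
    if total_score >= spam_threshold and verdict != "spam":
        # Heuristics are confident it is spam, DSPAM disagreed.
        return "spam"
    if total_score <= ham_threshold and verdict != "innocent":
        # Assumed symmetric rule for confident ham.
        return "innocent"
    return None
```

The risk mentioned above is visible here: every retraining decision is
driven by total_score, so a wrong heuristic score is copied straight
into the statistical engine.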

So I think I will let DSPAM do its job, and continue to use its scoring
to balance the others.
That's the way it works currently, and I'm really satisfied: accuracy
is great and FPs are very low.

And maybe I will do the same with CRM114.

So I will give the dspam plugin at
http://eric.lubow.org/projects/dspam-spamassassin-plugin/ a try because,
if I understand correctly, it can be used to balance scoring more
precisely.

Thanks for your help on this.
Regards,
Tonio

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktijbMACgkQ8FtMlUNHQIOSOwCaAqfbx+fcmBAUy7mCFFzjb4Ys
wdcAn2433ELBLnRGYiuSQnLjCy8LFz7z
=05gq
-----END PGP SIGNATURE-----