Re: [Crm114-general] misclassification avalanche

   From: cr...@ne...

   I'm using version 20040627-BlameSeifkes with procmail and nmh/mh-e on my
   own Linux box.  I recently had a big burst of misclassified mail;
   accuracy declined dramatically, though with manual (TOE?) retraining it
   seems to be improving.  This leads me to a few questions:

   1) Any ideas what might have caused this problem?  I'm wondering if I
      perhaps manually misclassified a single message; would that
      dramatically decrease accuracy?

Possibly.  It depends on the prior background.

But you may also be seeing what I call an "error storm": after weeks
to months of perfect performance, I get an error.  I train the error,
and then for a week I get greatly decreased accuracy, which slowly
recovers, and then I'm back to months of perfection.

Then the process repeats.

I *think* it's due to a relaxation effect in the classifiers.  I am
NOT sure about this, but I can say that the first through third error
storms are the worst and after a dozen of them, they seem to have 
stopped.

   2) Can I improve my accuracy by switching to a later (and presumably
      scarier) version of CRM?

Not really... unless you want to really go bleeding-edge and give 
Fidelis' patch for OSBF classification a try.

THAT might get you quite an accuracy boost, but be warned: you MUST
use thickness-based training with the Fidelis OSBF classifier.  Basically,
if any message (good _or_ spam) scores within +/- 20 pR points of 0, you
force-train the message anyway, even if it was classified correctly.

Mailfilter.crm is not set up to do this yet.
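
To make the thickness rule concrete, here is a minimal sketch of the
train/don't-train decision in Python.  It is not CRM114 code; the function
name is made up for illustration, and it assumes the convention that a
positive pR means "good" and a negative pR means "spam".

```python
# Hypothetical helper, not part of CRM114 or mailfilter.crm.
THICKNESS = 20.0  # the +/- 20 pR margin mentioned above


def needs_training(pr, is_spam, thickness=THICKNESS):
    """Decide whether a message should be trained.

    pr       -- the pR score the classifier gave the message
                (assumed: positive means good, negative means spam)
    is_spam  -- the true class, as judged by a human
    """
    classified_spam = pr < 0
    misclassified = classified_spam != is_spam
    too_thin = abs(pr) <= thickness  # right answer, but not by enough
    return misclassified or too_thin
```

So a good message scoring pR 5 gets trained even though it was classified
correctly, while a good message scoring pR 150 is left alone.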

   3) Should I be doing some sort of periodic bulk retraining every evening
      on known spam and known ham?  If so, does anyone have any suggestions
      as how best to do this?

That's called "TUNE" training; it will increase your accuracy if you
want to try it.
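
As I understand it, TUNE stands for "Train Until No Errors": you sweep a
corpus of known spam and known ham, train every message that is
misclassified, and repeat until a full pass is clean.  A rough Python
sketch follows.  The classify and train callbacks are placeholders for
whatever invokes your classifier, and the pass limit is my own addition
so the loop can't run forever.

```python
def tune(corpus, classify, train, max_passes=10):
    """Repeat train-on-error passes until one pass has no errors.

    corpus    -- list of (text, is_spam) pairs with known labels
    classify  -- placeholder callback: classify(text) -> True if spam
    train     -- placeholder callback: train(text, is_spam)
    Returns (passes_run, converged).
    """
    for passes in range(1, max_passes + 1):
        errors = 0
        for text, is_spam in corpus:
            if classify(text) != is_spam:
                train(text, is_spam)  # train only the mistakes
                errors += 1
        if errors == 0:
            return passes, True
    return max_passes, False
```

A nightly cron job could run something like this over your saved spam and
ham folders.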

What's your goal?  Are you just looking for usable email, or are
you a "numbers runner", trying to beat four-nines performance?  :-)

    -Bill Yerazunis