[Classifier4j-devel] Bayesian Results 0.99 for Everything?

How does the BayesianClassifier differ from a program like
"SpamBayes" (which has an Outlook plugin that uses a Bayesian
algorithm to classify emails)?  I "taught" the Bayesian classifier
with some spam emails I have.  Now it returns a 0.99 classification
result for practically EVERYTHING.

A little background:

I exported about 6400 spam emails from Outlook to an mbox-ish format
using Outport (outport.sourceforge.net).  I then read the subject and
body of each email and called
BayesianClassifier.teachMatch("spam", "...") on it.

I pass it a string consisting of all the words in the subject and
body of the message (separated by spaces).  This ended up creating
about 60,000 rows in the word_probability database.
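
For illustration, here is a minimal sketch of how the training string
described above might be built.  The class and method names
(TrainingText, toTrainingText) are hypothetical and are not part of
Classifier4J.  The sketch only shows the "all words separated by a
space" step and does not call the classifier itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Classifier4J: builds the single
// space-separated string of words taken from a message's subject and body.
class TrainingText {

    static String toTrainingText(String subject, String body) {
        List<String> words = new ArrayList<>();
        for (String part : new String[] { subject, body }) {
            if (part == null) {
                continue; // tolerate a missing subject or body
            }
            // Split on any run of whitespace (spaces, tabs, newlines).
            for (String word : part.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String text = toTrainingText("Cheap  meds", "Buy now\nlimited offer");
        System.out.println(text); // prints: Cheap meds Buy now limited offer
        // With Classifier4J on the classpath, this string would be the
        // second argument of the teachMatch("spam", ...) call quoted above.
    }
}
```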

I wrote a BayesianMatcher class for James (james.apache.org).
Basically, James (an SMTP/POP3 server) uses a FetchPop class to pull
down emails from my POP3 account and route them to a local email
user.  During this step I pass each email through a matcher that uses
BayesianClassifier.classify() to test whether it gets a 90% or better
classification.  If so, the message is filed into "deadletters" and I
never see it in Outlook.  If it's less than 90%, the matcher leaves
the email untouched and it is delivered.
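
The routing decision described above can be sketched roughly as
follows.  This is not the actual BayesianMatcher and leaves out the
James matcher API entirely.  SpamRouter, SPAM_THRESHOLD and the
scorer function are hypothetical stand-ins, with the scorer taking
the place of the BayesianClassifier.classify() call.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the 90% threshold decision; the scorer stands in
// for the classifier call and is not the Classifier4J API.
class SpamRouter {

    static final double SPAM_THRESHOLD = 0.90;

    private final ToDoubleFunction<String> scorer;

    SpamRouter(ToDoubleFunction<String> scorer) {
        this.scorer = scorer;
    }

    // Returns "deadletters" for a score of 0.90 or higher, otherwise "deliver".
    String route(String messageText) {
        double score = scorer.applyAsDouble(messageText);
        return score >= SPAM_THRESHOLD ? "deadletters" : "deliver";
    }

    public static void main(String[] args) {
        // Toy scorer for illustration only: a fixed keyword check.
        SpamRouter router =
            new SpamRouter(text -> text.contains("viagra") ? 0.99 : 0.50);
        System.out.println(router.route("cheap viagra"));     // deadletters
        System.out.println(router.route("lunch on friday?")); // deliver
    }
}
```

With a scorer that returns 0.99 for every input, every message is
routed to "deadletters", which matches the behaviour I am seeing.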

The problem is that everything is getting sent to deadletters, because
everything it receives gets a 0.99 match.

Is this not the proper way to use the classifier?  Is there a different
way I should use it to get the results I'm looking for?

Any help/suggestions would be appreciated!

Thanks,

- Brent