[Classifier4j-devel] Bayesian Results 0.99 for Everything?
From: Brent L J. <br...@bj...> - 2004-01-02 22:14:39
How does the BayesianClassifier differ from a program like "SpamBayes" (which has an Outlook plugin that uses the Bayesian algorithm to classify emails)? I "taught" the Bayesian classifier using some spam emails I have. Now it returns a 0.99 classification result for practically EVERYTHING.

A little background: I exported about 6400 spam emails from Outlook to an mbox-ish format using Outport (outport.sourceforge.net). I then read the subject and body of each email and called BayesianClassifier.teachMatch("spam", "..."), passing it a string consisting of all the words in the subject and body of the message (separated by spaces). This ended up creating about 60,000 rows in the word_probablity database.

I wrote a BayesianMatcher class for James (james.apache.org). Basically, James (an SMTP/POP3 server) uses a FetchPop class to pull down emails from my POP3 account and route them to a local email user. Along the way, I pass each email through a matcher that uses BayesianClassifier.classify() to test whether it gets a classification of 90% or better. If so, the message is filed into "deadletters" and I never see it in Outlook; if it's less than 90%, the matcher leaves the email untouched and delivers it.

The problem is that everything is getting sent to deadletters because of a 0.99 match on everything received.

Is this not the proper way to use the classifier? Is there a different way I should use it to get the results I'm looking for? Any help or suggestions would be appreciated!

Thanks,
- Brent
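
For anyone reading this in the archive, here is a minimal sketch of why a setup like the one described (spam-only training, 90% threshold) can saturate. It is a toy Graham-style word scorer, not Classifier4J's actual code: the class OneSidedTrainingDemo, its methods, the 0.5 score for unseen words, and the 0.01/0.99 clamps are all assumptions made for illustration. The point is that any word seen only in taught matches is pinned at the upper clamp, so ordinary mail sharing common words with the spam scores near 0.99 until non-matching mail is taught as well.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy word-probability scorer (hypothetical; NOT the Classifier4J implementation). */
class OneSidedTrainingDemo {
    // word -> {times seen in matches, times seen in non-matches}
    private final Map<String, int[]> counts = new HashMap<>();

    public void teachMatch(String text)    { teach(text, 0); }
    public void teachNonMatch(String text) { teach(text, 1); }

    private void teach(String text, int index) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            counts.computeIfAbsent(word, k -> new int[2])[index]++;
        }
    }

    /** Per-word match probability, clamped to [0.01, 0.99] (assumed bounds). */
    public double wordProbability(String word) {
        int[] c = counts.get(word);
        if (c == null) return 0.5; // unseen word: treated as neutral (assumption)
        double p = (double) c[0] / (c[0] + c[1]);
        return Math.max(0.01, Math.min(0.99, p));
    }

    /**
     * Combines the per-word probabilities as prod(p) / (prod(p) + prod(1 - p)).
     * Fine for short demo strings; very long texts would underflow and need log-space math.
     */
    public double classify(String text) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            double p = wordProbability(word);
            match *= p;
            nonMatch *= (1.0 - p);
        }
        return match / (match + nonMatch);
    }

    public static void main(String[] args) {
        OneSidedTrainingDemo demo = new OneSidedTrainingDemo();
        String legitimate = "are you free for the meeting";

        demo.teachMatch("buy the cheap pills you want"); // spam only, as in the post
        System.out.println("spam-only training:  " + demo.classify(legitimate));

        demo.teachNonMatch(legitimate); // now teach an example of wanted mail too
        System.out.println("after non-match too: " + demo.classify(legitimate));
    }
}
```

In this toy model the legitimate message clears the 90% threshold after spam-only training simply because "the" and "you" were only ever seen in spam, and it falls well below the threshold once a non-matching example is taught. Whether the same mechanism is what produces the 0.99 results above would need to be confirmed against the real classifier.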