Re: [Classifier4j-devel] Bayesian Results 0.99 for Everything?
From: Nick L. <ni...@ma...> - 2004-01-03 11:30:01
Are you teaching it non-matches as well as matches? It needs something to
compare the probability against.

Nick

----- Original Message -----
From: "Brent L Johnson" <br...@bj...>
To: <cla...@li...>
Sent: Saturday, January 03, 2004 8:44 AM
Subject: [Classifier4j-devel] Bayesian Results 0.99 for Everything?

> How does the BayesianClassifier differ from a program similar
> to "SpamBayes" (which has an Outlook plugin that uses the Bayesian
> algorithm to classify emails)? I "taught" the Bayesian classifier
> with some spam emails I have. Now it returns a 0.99 classify
> result for practically EVERYTHING.
>
> A little background..
>
> I exported about 6400 spam emails from Outlook to an mbox-ish format
> using Outport (outport.sourceforge.net). I then read the subject and
> body of each email and ran BayesianClassifier.teachMatch("spam", "...");
>
> I pass it a string consisting of all the words in the subject and
> body of the message (separated by a space). This ended up creating
> about 60,000 rows in the word_probablity database.
>
> I wrote a BayesianMatcher class for James (james.apache.org).
> Basically, James (smtp/pop3 server) uses a FetchPop class to pull
> down emails from my pop3 account and route those to a local email
> user. During this time I pass the email through to a matcher that
> uses BayesianClassifier.classify() to test whether it gets a 90% or
> better classification.. if so, the letter is filed into "deadletters"
> and I never see it in Outlook.. if it's less than 90%, it leaves the
> email untouched and delivers it.
>
> Problem is.. everything is getting sent to deadletters because of a
> 0.99 match on everything it received.
>
> Is this not the proper way to use the classifier? Is there a different
> way I should use it to get the results I'm looking for?
>
> Any help/suggestions would be appreciated!
>
> Thanks,
>
> - Brent
>
> _______________________________________________
> Classifier4j-devel mailing list
> Cla...@li...
> https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
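
To illustrate Nick's point about non-matches, here is a minimal, self-contained sketch of a word-count Bayesian scorer. It is not Classifier4J's code: the class name ToyBayes, the clamping bounds 0.01 and 0.99, and the neutral score 0.5 for unseen words are all assumptions chosen for illustration. It shows that when only matches are taught, every known word looks fully "spammy", so any message sharing a few words with the spam corpus scores at the upper bound.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy word-count Bayesian scorer for illustration; NOT Classifier4J's implementation. */
class ToyBayes {
    // Assumed bounds and neutral value (hypothetical, picked to mirror the 0.99 in the report).
    static final double LOWER = 0.01;
    static final double UPPER = 0.99;
    static final double NEUTRAL = 0.5;

    private final Map<String, Integer> matches = new HashMap<>();
    private final Map<String, Integer> nonMatches = new HashMap<>();

    void teachMatch(String text) { count(matches, text); }

    void teachNonMatch(String text) { count(nonMatches, text); }

    // Increment the per-word counter for every whitespace-separated token.
    private static void count(Map<String, Integer> counts, String text) {
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    // Per-word probability: matches / (matches + nonMatches), clamped to [LOWER, UPPER].
    double wordProbability(String word) {
        int m = matches.getOrDefault(word, 0);
        int n = nonMatches.getOrDefault(word, 0);
        if (m + n == 0) {
            return NEUTRAL; // never seen in training
        }
        double p = (double) m / (m + n);
        return Math.max(LOWER, Math.min(UPPER, p));
    }

    // Combine word probabilities as prod(p) / (prod(p) + prod(1 - p)),
    // computed in log-odds form so long messages do not underflow.
    double classify(String text) {
        double logOdds = 0.0;
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) {
                continue;
            }
            double p = wordProbability(w);
            logOdds += Math.log(p) - Math.log(1.0 - p);
        }
        double score = 1.0 / (1.0 + Math.exp(-logOdds));
        return Math.max(LOWER, Math.min(UPPER, score));
    }

    public static void main(String[] args) {
        String ham = "meeting notes buy now";

        ToyBayes matchOnly = new ToyBayes();
        matchOnly.teachMatch("cheap pills buy now");
        System.out.println("match-only training:  " + matchOnly.classify(ham));

        ToyBayes both = new ToyBayes();
        both.teachMatch("cheap pills buy now");
        both.teachNonMatch("meeting notes for the project buy tickets now");
        System.out.println("with non-matches too: " + both.classify(ham));
    }
}
```

With match-only training, a word such as "now" has a zero non-match count, so its probability is pinned at the upper bound and an ordinary message is pushed over a 90% threshold. Once ordinary mail is taught as well, shared words drift toward neutral and ham-only words pull the score down. In the setup described in the thread, that would mean also feeding a corpus of legitimate mail through the non-match counterpart of teachMatch (check the Classifier4J Javadoc for the exact signature).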