RE: [Classifier4j-devel] Bayesian Results 0.99 for Everything?
From: Brent L J. <br...@bj...> - 2004-01-03 17:34:42
Aaahhhh - that must be the problem then. I can export some emails from my
inbox and let it classify those as matches and see if that helps.

And what about categories? Is it OK to teachMatch and teachNonMatch all the
emails with the same category name?

Sorry for the simple questions.. but I want to make sure I'm using it right
so I get the best possible probabilities for spam matching at the server :)

Thanks,

- Brent

> -----Original Message-----
> From: cla...@li... [mailto:cla...@li...] On Behalf Of Nick Lothian
> Sent: Saturday, January 03, 2004 6:30 AM
> To: cla...@li...
> Subject: Re: [Classifier4j-devel] Bayesian Results 0.99 for Everything?
>
> Are you teaching it non-matches as well as matches?
>
> It needs something to compare the probability against.
>
> Nick
>
> ----- Original Message -----
> From: "Brent L Johnson" <br...@bj...>
> To: <cla...@li...>
> Sent: Saturday, January 03, 2004 8:44 AM
> Subject: [Classifier4j-devel] Bayesian Results 0.99 for Everything?
>
> > How does the BayesianClassifier differ from a program similar to
> > "SpamBayes" (has an Outlook plugin that uses the Bayesian algorithm to
> > classify emails)? I "taught" the Bayesian classifier by teaching
> > it with some spam emails I have. Now it returns a 0.99 classify
> > result for practically EVERYTHING.
> >
> > A little background..
> >
> > I exported about 6400 spam emails from Outlook to an mbox-ish format
> > using Outport (outport.sourceforge.net). I then read the subject and
> > body of each email and ran a BayesianClassifier.teachMatch("spam",
> > "...");
> >
> > I pass it a string consisting of all the words in the subject and body
> > of the message (separated by a space). This ended up creating about
> > 60,000 rows in the word_probablity database.
> >
> > I wrote a BayesianMatcher class for James (james.apache.org).
> > Basically, James (smtp/pop3 server) uses a FetchPop class to pull down
> > emails from my pop3 account and route those to a local email user.
> > During this time I pass the email through to a matcher that uses
> > BayesianClassifier.classify() to test whether it gets a 90% or better
> > classification.. if so the letter is filed into "deadletters" and I
> > never see it in Outlook.. if it's less than 90% it leaves the email
> > untouched and delivers it.
> >
> > Problem is.. everything is getting sent to deadletters because of a
> > 0.99 match on everything it received.
> >
> > Is this not the proper way to use the classifier? Is there a
> > different way I should use it to get the results I'm looking for?
> >
> > Any help/suggestions would be appreciated!
> >
> > Thanks,
> >
> > - Brent
> >
> > _______________________________________________
> > Classifier4j-devel mailing list
> > Cla...@li...
> > https://lists.sourceforge.net/lists/listinfo/classifier4j-devel
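
To illustrate Nick's point that the classifier "needs something to compare the probability against", here is a minimal, self-contained sketch of a word-count Bayesian scorer. It is not Classifier4J's code and has not been checked against its source: the class name ToyBayes, the 0.01/0.99 clamping bounds, the 0.5 score for unseen words, and the product-style combination are all assumptions chosen for illustration, and the sketch has no notion of categories. It only shows why a classifier trained on matches alone tends to score everything at the upper bound.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy word-count classifier for illustration only; NOT Classifier4J's implementation. */
final class ToyBayes {
    // Assumed bounds and neutral value, picked for this sketch.
    static final double LOWER = 0.01, UPPER = 0.99, NEUTRAL = 0.5;

    // word -> {times seen in matches, times seen in non-matches}
    private final Map<String, int[]> counts = new HashMap<>();

    void teachMatch(String text)    { teach(text, 0); }
    void teachNonMatch(String text) { teach(text, 1); }

    private void teach(String text, int slot) {
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) counts.computeIfAbsent(w, k -> new int[2])[slot]++;
        }
    }

    private static double clamp(double p) {
        return Math.max(LOWER, Math.min(UPPER, p));
    }

    // Per-word score: share of sightings that were matches, clamped to [LOWER, UPPER].
    private double wordProbability(String w) {
        int[] c = counts.get(w);
        if (c == null) return NEUTRAL;              // never seen: no evidence either way
        return clamp((double) c[0] / (c[0] + c[1]));
    }

    // Combine per-word scores as prod(p) / (prod(p) + prod(1 - p)).
    // Very long inputs could underflow; a real implementation would guard against that.
    double classify(String text) {
        double match = 1.0, nonMatch = 1.0;
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) continue;
            double p = wordProbability(w);
            match *= p;
            nonMatch *= (1.0 - p);
        }
        return clamp(match / (match + nonMatch));
    }

    public static void main(String[] args) {
        ToyBayes c = new ToyBayes();
        c.teachMatch("buy cheap pills now");
        c.teachMatch("cheap pills for you now");
        // Only matches taught: every known word sits at the upper bound.
        System.out.println("ham, spam-only training: " + c.classify("notes for you"));

        c.teachNonMatch("meeting notes for you");
        c.teachNonMatch("notes for the team");
        // With non-matches taught, ordinary words now pull the score down.
        System.out.println("ham, after non-matches:  " + c.classify("notes for you"));
        System.out.println("spam, after non-matches: " + c.classify("buy cheap pills"));
    }
}
```

In this sketch, any word that has only ever been seen in taught matches scores at the upper bound, so a legitimate message sharing a few common words with the spam corpus clears a 0.9 threshold. Once some non-matches are taught, the same message falls well below it while the spam-like message stays high.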