Thread: [Classifier4j-devel] Bayesian Results 0.99 for Everything?
From: Brent L J. <br...@bj...> - 2004-01-02 22:14:39
How does the BayesianClassifier differ from a program similar to "SpamBayes" (has an Outlook plugin that uses the Bayesian algorithm to classify emails)? I "taught" the Bayesian classifier by teaching it with some spam emails I have. Now it returns a 0.99 classify result for practically EVERYTHING.

A little background..

I exported about 6400 spam emails from Outlook to an mbox-ish format using Outport (outport.sourceforge.net). I then read the subject and body of each email and ran a BayesianClassifier.teachMatch("spam", "...");

I pass it a string consisting of all the words in the subject and body of the message (separated by a space). This ended up creating about 60,000 rows in the word_probablity database.

I wrote a BayesianMatcher class for James (james.apache.org). Basically, James (smtp/pop3 server) uses a FetchPop class to pull down emails from my pop3 account and route those to a local email user. During this time I pass the email through to a matcher that uses BayesianClassifier.classify() to test whether it gets a 90% or better classification.. if so, the letter is filed into "deadletters" and I never see it in Outlook.. if it's less than 90%, it leaves the email untouched and delivers it.

Problem is.. everything is getting sent to deadletters because of a 0.99 match on everything it received.

Is this not the proper way to use the classifier? Is there a different way I should use it to get the results I'm looking for?

Any help/suggestions would be appreciated!

Thanks,

- Brent
From: Nick L. <ni...@ma...> - 2004-01-03 11:30:01
Are you teaching it non-matches as well as matches?

It needs something to compare the probability against.

Nick
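To illustrate the point about non-matches, here is a minimal sketch of a per-word Bayesian scorer. It is a toy model written for this thread, not Classifier4J's actual code: the class name `ToyBayes`, the 0.01/0.99 clamping bounds, and the 0.5 score for unknown words are all assumptions chosen for illustration. It shows that when only matches have been taught, every known word scores at the upper bound, so any text made of known words comes out at 0.99.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy per-word Bayesian scorer; an illustration only, not Classifier4J's code. */
class ToyBayes {
    // Assumed bounds: scores are clamped so no word is ever a certain 0 or 1.
    static final double LOWER = 0.01, UPPER = 0.99, NEUTRAL = 0.5;

    // word -> {times seen in matches, times seen in non-matches}
    private final Map<String, int[]> counts = new HashMap<>();

    void teachMatch(String text)    { teach(text, 0); }
    void teachNonMatch(String text) { teach(text, 1); }

    private void teach(String text, int slot) {
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) continue;
            counts.computeIfAbsent(w, k -> new int[2])[slot]++;
        }
    }

    /** Fraction of sightings that were matches, clamped; unknown words are neutral. */
    double wordProbability(String word) {
        int[] c = counts.get(word);
        if (c == null) return NEUTRAL;
        double p = (double) c[0] / (c[0] + c[1]);
        return Math.max(LOWER, Math.min(UPPER, p));
    }

    /** Combines word scores as prod(p) / (prod(p) + prod(1 - p)). Short inputs only. */
    double classify(String text) {
        double match = 1.0, nonMatch = 1.0;
        for (String w : text.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) continue;
            double p = wordProbability(w);
            match *= p;
            nonMatch *= 1.0 - p;
        }
        return Math.max(LOWER, Math.min(UPPER, match / (match + nonMatch)));
    }

    public static void main(String[] args) {
        ToyBayes t = new ToyBayes();
        t.teachMatch("cheap pills meeting tomorrow");           // spam only
        System.out.println(t.classify("meeting tomorrow"));     // stuck at the upper bound
        t.teachNonMatch("meeting tomorrow agenda");             // now add legitimate mail
        System.out.println(t.classify("meeting tomorrow"));     // drops to neutral
        System.out.println(t.classify("cheap pills"));          // still looks like spam
    }
}
```

With only spam taught, words that also appear in ordinary mail have no non-match count to pull their score down, which is consistent with the 0.99 results described above.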
From: Brent L J. <br...@bj...> - 2004-01-03 17:34:42
Aaahhhh - that must be the problem then. I can export some emails from my inbox and teach it those as non-matches and see if that helps.

And what about categories? Is it OK to teachMatch and teachNonMatch all the emails with the same category name?

Sorry for the simple questions.. but I want to make sure I'm using it right so I get the best possible probabilities for spam matching at the server :)

Thanks,

- Brent
From: Nick L. <ni...@ma...> - 2004-01-04 08:45:04
Yes, passing the same name will be fine - or else you can pass no category name and it will just use a "default" category.

Nick
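The category advice can be sketched roughly as follows. `TeachLog` is a hypothetical in-memory stand-in that only mirrors the method names used in this thread (`teachMatch`, `teachNonMatch`); it is not the library's class, and the `"default"` constant is an assumed placeholder for whatever name the library uses internally. The sketch shows matches and non-matches being filed under the same category name, with the no-category overloads falling back to a default, and a check that a category has seen both sides before it is relied on.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in that only counts what was taught per category. */
class TeachLog {
    // Assumed placeholder; the library's real default category name may differ.
    static final String DEFAULT_CATEGORY = "default";

    // category -> {match examples taught, non-match examples taught}
    private final Map<String, int[]> taught = new HashMap<>();

    // The text is ignored here; a real classifier would update word counts.
    void teachMatch(String category, String text)    { record(category, 0); }
    void teachNonMatch(String category, String text) { record(category, 1); }

    // Overloads without a category fall back to the default one.
    void teachMatch(String text)    { teachMatch(DEFAULT_CATEGORY, text); }
    void teachNonMatch(String text) { teachNonMatch(DEFAULT_CATEGORY, text); }

    private void record(String category, int slot) {
        taught.computeIfAbsent(category, k -> new int[2])[slot]++;
    }

    /** True only when the category has seen both matches and non-matches. */
    boolean hasBothSides(String category) {
        int[] c = taught.get(category);
        return c != null && c[0] > 0 && c[1] > 0;
    }

    public static void main(String[] args) {
        TeachLog log = new TeachLog();
        log.teachMatch("spam", "cheap pills now");
        System.out.println(log.hasBothSides("spam"));   // false: spam only so far
        log.teachNonMatch("spam", "meeting agenda attached");
        System.out.println(log.hasBothSides("spam"));   // true: same category, both sides
    }
}
```

A check like `hasBothSides` is one possible guard to run before a server-side matcher starts filing mail on the classifier's score.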