[Classifier4j-devel] RE: Fwd: calculateOverallProbability Questions
From: David S. <dav...@ya...> - 2004-09-09 18:58:48
I just stumbled across this thread from last year:

http://sourceforge.net/mailarchive/forum.php?thread_id=3483166&forum_id=34026

I'm having similar "troubles" in the sense that classify() is returning 0.99 too often, and it's because some of the words have either zero matches or zero non-matches.

The question is, does it make sense to ignore words that don't have at least 1 match and at least 1 non-match? It's easy enough to extend BayesianClassifier and override calculateOverallProbability(), and in my experiment it seems to work "better", though I guess you could argue it's not fair to ignore such words: maybe a given word will always be a match or a non-match, so it should be considered somehow.

Anyway, the code mod I did is at the bottom of this fragment; I just added 2 lines:

    protected double calculateOverallProbability(WordProbability[] wps) {
        if (wps == null || wps.length == 0) {
            return IClassifier.NEUTRAL_PROBABILITY;
        } else {
            // we need to calculate xy/(xy + z)
            // where z = (1-x)(1-y)

            // firstly, calculate z and xy
            double z = 0d;
            double xy = 0d;
            for (int i = 0; i < wps.length; i++) {
                // dss begin
                if (wps[i].getMatchingCount() == 0) continue;
                if (wps[i].getNonMatchingCount() == 0) continue;
                // dss end
                ...
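To make the effect of skipping one-sided words concrete, here is a minimal standalone sketch of the xy/(xy + z) combination with and without the skip. It does not use Classifier4J: the WordStat class, the combine() method, and the way a per-word probability is derived from the counts (match fraction clamped to [0.01, 0.99], neutral value 0.5) are all assumptions made for illustration, not the library's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class SkipOneSidedWords {

    /** Hypothetical stand-in for per-word counts; NOT Classifier4J's WordProbability. */
    public static final class WordStat {
        final long matches;
        final long nonMatches;

        public WordStat(long matches, long nonMatches) {
            this.matches = matches;
            this.nonMatches = nonMatches;
        }

        /** Assumed estimate: match fraction, clamped so it never reaches 0 or 1. */
        double probability() {
            long total = matches + nonMatches;
            if (total == 0) {
                return 0.5; // no evidence: assumed neutral value
            }
            double p = (double) matches / total;
            return Math.max(0.01, Math.min(0.99, p));
        }
    }

    /**
     * Combines per-word probabilities as xy / (xy + z), where xy is the product
     * of the probabilities and z is the product of their complements.
     * When skipOneSided is true, words with zero matches or zero non-matches
     * are ignored, as proposed in the message above.
     */
    public static double combine(List<WordStat> stats, boolean skipOneSided) {
        double xy = 1.0;
        double z = 1.0;
        int used = 0;
        for (WordStat s : stats) {
            if (skipOneSided && (s.matches == 0 || s.nonMatches == 0)) {
                continue;
            }
            double p = s.probability();
            xy *= p;
            z *= (1.0 - p);
            used++;
        }
        // If every word was skipped, fall back to the assumed neutral value.
        return used == 0 ? 0.5 : xy / (xy + z);
    }

    public static void main(String[] args) {
        List<WordStat> stats = Arrays.asList(
                new WordStat(3, 0),  // only ever seen as a match
                new WordStat(2, 2),
                new WordStat(1, 3));
        System.out.println("all words:        " + combine(stats, false));
        System.out.println("one-sided skipped: " + combine(stats, true));
    }
}
```

In this toy data a single word that has only ever matched is enough to pull the combined score close to the upper bound, while skipping it lets the two mixed-evidence words decide. The sketch also shows one consequence of the skip: if every word is one-sided, nothing is left to combine, so some fallback value is needed.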