Re: [Classifier4j-devel] Fwd: calculateOverallProbability Questions
From: Matt C. <MCo...@my...> - 2003-11-18 03:10:34
WordProbability.calculateProbability includes the following:

    if (matchingCount == 0) {
        if (nonMatchingCount == 0) {
            result = IClassifier.NEUTRAL_PROBABILITY;
        } else {
            result = IClassifier.LOWER_BOUND;
        }
    } else {
        result = BayesianClassifier.normaliseSignificance(
            (double) matchingCount / (double) (matchingCount + nonMatchingCount));
    }

In my word_probability database, I currently have no "nonMatchingCount"s. Therefore all my word probabilities are turning out as LOWER_BOUND, NEUTRAL_PROBABILITY, or .99, since effectively matchingCount / matchingCount = 1. BayesianClassifier.normaliseSignificance() presumably adjusts this outcome from 1 to .99.

This, I believe, represents a major difference between the current method and my understanding of POPFile's method. At this point, POPFile is calculating:

    occurrences of word A in category XYZ / total occurrences of ALL words in category XYZ

In other words:

    match_count of A where Category=XYZ / sum(match_count) from category XYZ

This is my interpretation of the method discussed at:
http://sourceforge.net/docman/display_doc.php?docid=13334&group_id=63137

Have I overlooked something, or is this just a difference between the two calculations?

Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN

-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: "Classifier4J" <cla...@li...>
Date: Mon, 17 Nov 2003 15:33:19 -0600
Subject: [Classifier4j-devel] Fwd: calculateOverallProbability Questions

> Can someone explain to me what is happening in calculateOverallProbability?
>
> The "probability" for each word drawn into this method via
> calcWordsProbability
> is .99 if at least one occurrence of the word exists in the database in the given
> category and .5 (Neutral) if the word does not occur in the given category.
>
> This does not seem right to me.
>
> I am not sure when, where, how, and why the probability on the words is
> getting assigned as described.
>
> Another thing that is confusing me is that several times during the course of
> this method, the variable "z" goes to 0 (zero) and the process continues.
> Attached is the tail end of a log of this method. If z goes to zero over and
> over, what is the point of performing this calculation? It seems the
> calculation would only take into account those words that are processed after
> the very last time z goes to zero.
>
> I simply added:
>   System.out.println("Z : [" + z + "] Word : [" + wps[i].getWord() + "]
>   Probability : [" + wps[i].getProbability() + "]");
>
> after each assignment of z in BayesianClassifier.calculateOverallProbability()
>
> Also, z is recalculated on each occurrence of a particular word. Is this
> proper?
>
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
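
The two per-word estimates contrasted in the message, and the way a running product of (1 - p) can reach zero, can be sketched in Java as follows. This is an illustrative sketch, not Classifier4J or POPFile code. The class ProbabilitySketch and all of its method names are hypothetical. The LOWER_BOUND value of 0.01 is an assumption (only .5 and .99 appear in the message), normaliseSignificance is assumed to clamp into [LOWER_BOUND, UPPER_BOUND], and z is assumed to be a running product of (1 - probability).

```java
// Illustrative sketch only: names and constants are assumptions, not library code.
final class ProbabilitySketch {
    static final double NEUTRAL_PROBABILITY = 0.5; // ".5 (Neutral)" in the message
    static final double LOWER_BOUND = 0.01;        // assumed value
    static final double UPPER_BOUND = 0.99;        // ".99" in the message

    // Assumed behaviour of normaliseSignificance: clamp into [LOWER_BOUND, UPPER_BOUND].
    static double clamp(double p) {
        return Math.max(LOWER_BOUND, Math.min(UPPER_BOUND, p));
    }

    // Mirrors the branch structure quoted from WordProbability.calculateProbability.
    static double matchRatio(long matchingCount, long nonMatchingCount) {
        if (matchingCount == 0) {
            return nonMatchingCount == 0 ? NEUTRAL_PROBABILITY : LOWER_BOUND;
        }
        return clamp((double) matchingCount / (double) (matchingCount + nonMatchingCount));
    }

    // The POPFile-style estimate as the message describes it:
    // occurrences of one word in a category / occurrences of all words in that category.
    static double shareOfCategory(long wordCount, long totalWordsInCategory) {
        if (totalWordsInCategory <= 0) {
            throw new IllegalArgumentException("category has no words");
        }
        return (double) wordCount / (double) totalWordsInCategory;
    }

    // Running product of (1 - p) over n words that all have probability p.
    static double runningProduct(double p, int n) {
        double z = 1.0;
        for (int i = 0; i < n; i++) {
            z *= (1.0 - p);
        }
        return z;
    }

    // Same quantity kept as a sum of logarithms, which does not underflow here.
    static double logRunningProduct(double p, int n) {
        double logZ = 0.0;
        for (int i = 0; i < n; i++) {
            logZ += Math.log(1.0 - p);
        }
        return logZ;
    }

    public static void main(String[] args) {
        System.out.println(matchRatio(7, 0));             // ratio 1.0 clamped to the upper bound
        System.out.println(shareOfCategory(3, 12));       // 3 of 12 words in the category
        System.out.println(runningProduct(0.99, 200));    // underflows in double precision
        System.out.println(logRunningProduct(0.99, 200)); // stays finite
    }
}
```

With no nonMatchingCount data, matchRatio can only return the neutral value or the upper bound, whereas shareOfCategory varies with how often the word appears. If z really is multiplied by roughly 0.01 for every word scored at .99, it underflows to exactly 0.0 after about 160 words in double precision, which might be one reason the log shows z reaching zero. Summing logarithms is a common way to avoid that, though whether it suits Classifier4J is a separate question.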