Re: [Classifier4j-devel] Fwd: calculateOverallProbability Questions
From: Matt C. <MCo...@my...> - 2003-11-18 03:10:34
WordProbability.calculateProbability includes the following:
    if (matchingCount == 0) {
        if (nonMatchingCount == 0) {
            result = IClassifier.NEUTRAL_PROBABILITY;
        } else {
            result = IClassifier.LOWER_BOUND;
        }
    } else {
        result = BayesianClassifier.normaliseSignificance((double) matchingCount /
                (double) (matchingCount + nonMatchingCount));
    }
In my word_probability database, I currently have no nonMatchingCount entries.
Therefore all my word probabilities are coming out as LOWER_BOUND,
NEUTRAL_PROBABILITY, or .99, since for any word with matchingCount > 0 the
ratio is effectively matchingCount / matchingCount = 1.
BayesianClassifier.normaliseSignificance() presumably clamps this 1 down to .99.
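The branch logic above can be exercised on its own. Below is a minimal Java sketch, not Classifier4J's code: the class name WordProbabilityDemo is hypothetical, the UPPER_BOUND name and the LOWER_BOUND value of 0.01 are assumptions, and normaliseSignificance() is assumed to clamp its argument into [LOWER_BOUND, UPPER_BOUND]. The .5 and .99 values are the ones reported in this thread.

```java
// Sketch of the quoted branch logic; constant values and clamping are assumptions.
class WordProbabilityDemo {
    static final double LOWER_BOUND = 0.01;        // assumed value
    static final double UPPER_BOUND = 0.99;        // the .99 observed in this thread
    static final double NEUTRAL_PROBABILITY = 0.5; // the .5 observed in this thread

    // Assumed behaviour of normaliseSignificance(): clamp into [LOWER_BOUND, UPPER_BOUND].
    static double normaliseSignificance(double sig) {
        if (sig > UPPER_BOUND) return UPPER_BOUND;
        if (sig < LOWER_BOUND) return LOWER_BOUND;
        return sig;
    }

    // Same branch structure as the calculateProbability excerpt quoted above.
    static double calculateProbability(long matchingCount, long nonMatchingCount) {
        if (matchingCount == 0) {
            return nonMatchingCount == 0 ? NEUTRAL_PROBABILITY : LOWER_BOUND;
        }
        return normaliseSignificance(
                (double) matchingCount / (double) (matchingCount + nonMatchingCount));
    }

    public static void main(String[] args) {
        // nonMatchingCount is always 0, as in the database described above.
        for (long matching : new long[] {0, 1, 5, 1000}) {
            System.out.println(matching + " -> " + calculateProbability(matching, 0));
        }
    }
}
```

Running it prints 0.5 for the word with no matches and 0.99 for every other word. With nonMatchingCount fixed at 0, the LOWER_BOUND branch is never reached, so a word's probability is either .5 or .99.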
I believe this represents a major difference between the current method and my
understanding of POPFile's method.
At this step, POPFile calculates:
    Occurrences of word A in category XYZ / Total occurrences of ALL words in
    category XYZ
In other words:
    match_count of A where category = XYZ / sum(match_count) over category XYZ
This is my interpretation of the method discussed at:
http://sourceforge.net/docman/display_doc.php?docid=13334&group_id=63137
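The POPFile-style ratio described above can be sketched as follows. This illustrates the formula as stated in this message, not POPFile's or Classifier4J's code: the class CategoryWordFrequency and the in-memory map standing in for the word_probability table are hypothetical, and unseen words simply get 0.0 here, whereas a practical classifier would need some handling for them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: P(word | category) = match_count(word, category) / sum(match_count in category).
class CategoryWordFrequency {
    // category -> (word -> match_count); a stand-in for the word_probability table
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void addOccurrences(String category, String word, int n) {
        counts.computeIfAbsent(category, c -> new HashMap<>()).merge(word, n, Integer::sum);
    }

    double probability(String category, String word) {
        Map<String, Integer> words = counts.getOrDefault(category, Map.of());
        int total = 0;
        for (int c : words.values()) total += c;  // sum(match_count) for the category
        if (total == 0) return 0.0;               // no data at all for this category
        return words.getOrDefault(word, 0) / (double) total;
    }

    public static void main(String[] args) {
        CategoryWordFrequency f = new CategoryWordFrequency();
        f.addOccurrences("XYZ", "A", 3);
        f.addOccurrences("XYZ", "B", 1);
        System.out.println(f.probability("XYZ", "A")); // 3 / (3 + 1) = 0.75
    }
}
```

Unlike the matchingCount / (matchingCount + nonMatchingCount) ratio, this denominator depends on every word in the category, so it does not collapse to 1 when there is no data for other categories.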
Have I overlooked something, or is this just a difference between the two
calculations?
Matt Collier
RemoteIT
mco...@my...
877-4-NEW-LAN
-----Original Message-----
From: "Matt Collier" <MCo...@my...>
To: "Classifier4J" <cla...@li...>
Date: Mon, 17 Nov 2003 15:33:19 -0600
Subject: [Classifier4j-devel] Fwd: calculateOverallProbability Questions
> Can someone explain to me what is happening in calculateOverallProbability?
>
> The "probability" for each word passed into this method via
> calcWordsProbability is .99 if at least one occurrence of the word exists in
> the database for the given category, and .5 (neutral) if the word does not
> occur in the given category.
>
> This does not seem right to me.
>
> I am not sure when, where, how, and why the probabilities are being assigned
> to the words as described.
>
> Another thing that confuses me is that several times during the course of
> this method, the variable "z" goes to 0 (zero) and the process continues.
> Attached is the tail end of a log of this method. If z goes to zero over and
> over, what is the point of performing this calculation? It seems the
> calculation would only take into account the words that are processed after
> the very last time z goes to zero.
>
> I simply added:
>
>     System.out.println("Z : [" + z + "] Word : [" + wps[i].getWord()
>             + "] Probability : [" + wps[i].getProbability() + "]");
>
> after each assignment of z in BayesianClassifier.calculateOverallProbability().
>
> Also, z is recalculated on each occurrence of a particular word. Is this
> proper?
>
>
> Matt Collier
> RemoteIT
> mco...@my...
> 877-4-NEW-LAN
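On the question of z repeatedly reaching 0: one possible explanation is floating-point underflow. This assumes calculateOverallProbability combines the word probabilities as xy / (xy + z), with xy the product of the probabilities p and z the product of (1 - p); the method body is not shown in this thread, so treat that as an assumption. A product of a couple of hundred factors of 0.01 lies far below the smallest positive double (about 4.9e-324) and rounds to 0.0. The sketch below (hypothetical class OverallProbabilityDemo) compares a plain running product with the same formula evaluated as sums of logarithms.

```java
// Assumed combining formula: xy / (xy + z), xy = prod(p), z = prod(1 - p).
class OverallProbabilityDemo {
    // Plain running products; both can underflow to 0.0 for long word lists.
    static double naive(double[] ps) {
        double xy = 1.0, z = 1.0;
        for (double p : ps) {
            xy *= p;
            z *= 1.0 - p;
        }
        return xy / (xy + z); // 0.0 / 0.0 is NaN once both products underflow
    }

    // Same formula in log space: xy / (xy + z) = 1 / (1 + exp(log z - log xy)).
    static double logSpace(double[] ps) {
        double logXy = 0.0, logZ = 0.0;
        for (double p : ps) {
            logXy += Math.log(p);
            logZ += Math.log1p(-p); // log(1 - p), accurate for small p
        }
        return 1.0 / (1.0 + Math.exp(logZ - logXy));
    }

    public static void main(String[] args) {
        // 200 words at .99 followed by 200 words at .01: perfectly balanced evidence.
        double[] ps = new double[400];
        for (int i = 0; i < ps.length; i++) ps[i] = i < 200 ? 0.99 : 0.01;
        System.out.println("naive:    " + naive(ps));
        System.out.println("logSpace: " + logSpace(ps));
    }
}
```

Here naive() returns NaN because both products underflow, while logSpace() returns a value within rounding error of 0.5. Keeping probabilities clamped to [.01, .99] also keeps every logarithm finite.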
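On whether z should be updated once per occurrence of a word: the thread leaves this open, and it is a design decision. If the intent is for each distinct word to contribute only once, one option is to deduplicate the word list before looking up probabilities. A minimal sketch; the helper distinctWords is hypothetical and not part of Classifier4J:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctWordsDemo {
    // Keeps the first occurrence of each word, preserving input order.
    static List<String> distinctWords(String[] words) {
        return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(words)));
    }

    public static void main(String[] args) {
        String[] words = {"cheap", "pills", "cheap", "cheap", "meeting"};
        System.out.println(distinctWords(words)); // [cheap, pills, meeting]
    }
}
```

LinkedHashSet is used rather than HashSet so that the words are still processed in the order they first appear, which keeps per-word debug output like the println above easy to follow.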