RE: [Classifier4j-devel] surprising classify score 0.01
From: Nick L. <nl...@es...> - 2004-01-07 22:29:34
> I am enjoying experimenting with your Classifier4J 0.5,
> but I ran across a result I did not expect. I have
> trained a BayesianClassifier with 22 positive examples
> and 1600 negative examples. Many of the positive
> examples contain the word "http". None of the
> negative examples contain this word.
>
> The surprising result is that the score of a sentence
> with "http" is 0.01. Can you help me to understand
> why?
>
> Here is the sentence and the WordProbability
> probabilities for each of the words in the sentence
> that were in the training data:
>
> score = 0.01 for "Mozilla/4.0 (compatible;
> grub-client-1.3.7; Crawl your own stuff with
> http://grub.org)"
>
> 0.11822660098522167 Mozilla
> 0.020618556701030927 4
> 0.07223476297968397 0
> 0.029239766081871343 compatible
> 0.10619469026548672 1
> 0.5454545454545454 3
> 0.01 7
> 0.99 http

That sounds right to me.

The math goes like this (and I'm going to round those numbers off, because I can't be bothered typing them into my calculator):

score = ((0.11)(0.02)(0.07)(0.02)(0.11)(0.55)(0.01)(0.99))
        / ((0.11)(0.02)(0.07)(0.02)(0.11)(0.55)(0.01)(0.99)
           + (1 - 0.11)(1 - 0.02)(1 - 0.07)(1 - 0.02)(1 - 0.11)(1 - 0.55)(1 - 0.01)(1 - 0.99))
      = 0.000000001844766
        / (0.000000001844766 + (0.89)(0.98)(0.93)(0.98)(0.89)(0.45)(0.99)(0.01))
      = 0.000000001844766 / (0.000000001844766 + 0.003151830266046)
      = 0.000000001844766 / 0.003151832110812
      = 0.00000058529957660871

In other words, the single 0.99 for "http" is outweighed by the seven other words, most of which have probabilities well below 0.5. Classifier4J has a cut-off system where anything under 0.01 gets 0.01, which is why the reported score is exactly 0.01.

Does that help explain things?

The code for this is in net.sf.classifier4J.bayesian.BayesianClassifier (see <http://classifier4j.sourceforge.net/xref/net/sf/classifier4J/bayesian/BayesianClassifier.html>)

Nick Lothian
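
As a rough illustration of the arithmetic above, here is a minimal Java sketch that combines the rounded word probabilities and then applies the 0.01 lower cut-off. It is not the Classifier4J source: the class and method names (ScoreSketch, combine, applyCutoff) are made up for this example, and it models only the lower cut-off mentioned in this message.

```java
// Illustrative sketch only; NOT the Classifier4J implementation.
// ScoreSketch, combine and applyCutoff are hypothetical names.
class ScoreSketch {
    // Lower cut-off described in the message: anything under 0.01 gets 0.01.
    static final double LOWER_CUTOFF = 0.01;

    // Combine per-word probabilities as in the worked example:
    // product(p) / (product(p) + product(1 - p)).
    static double combine(double[] probs) {
        double product = 1.0;
        double inverseProduct = 1.0;
        for (double p : probs) {
            product *= p;
            inverseProduct *= (1.0 - p);
        }
        return product / (product + inverseProduct);
    }

    // Raise any score below the cut-off up to the cut-off.
    static double applyCutoff(double score) {
        return Math.max(score, LOWER_CUTOFF);
    }

    public static void main(String[] args) {
        // Rounded word probabilities used in the worked example above.
        double[] probs = {0.11, 0.02, 0.07, 0.02, 0.11, 0.55, 0.01, 0.99};
        double raw = combine(probs);
        System.out.println("raw score      = " + raw);              // roughly 5.85e-7
        System.out.println("reported score = " + applyCutoff(raw)); // raised to the cut-off
    }
}
```

Because every factor is multiplied in, one strongly positive word cannot rescue a sentence whose other words mostly score low; the raw result ends up far below the cut-off and is reported as 0.01.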