From: Glenn M. <gle...@gm...> - 2010-12-13 19:17:50
Hi,
I'm looking into JBoost to do text classification. I've generated
Java output code (Predict.java) with "demo/$ java -Xmx100M
jboost.controller.Controller -p 2 -S spambase -j spambase.java", and
run it, and I have some questions.
Incidentally, to compile with "javac -cp ../dist/jboost.jar
Predict.java" from demo/ I had to change the paths of some classes in
the main() method.
I ran Predict ("java -cp .:../dist/jboost.jar Predict <
spambase.data") against the original data. I got two columns of
output that looked like
5.00073612523801 -5.00073612523801
11.864681207163063 -11.864681207163063
8.780744089260097 -8.780744089260097
...
Why are there two columns with the same magnitudes? I'm guessing that
these are "spam" and "not spam" scores, but they seem redundant.
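For what it's worth, the observation that the second column is just the negation of the first can be checked mechanically. The following is only a rough sketch: the class name ColumnCheck and the method isNegatedPair are made up for illustration and are not part of JBoost, and the three rows are simply the ones quoted above.

```java
public class ColumnCheck {
    // Hypothetical helper (not part of JBoost): returns true when the line
    // holds exactly two whitespace-separated numbers and the second is the
    // exact negation of the first.
    public static boolean isNegatedPair(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            return false;
        }
        double first = Double.parseDouble(parts[0]);
        double second = Double.parseDouble(parts[1]);
        return first == -second;
    }

    public static void main(String[] args) {
        // The three output rows quoted in this message.
        String[] rows = {
            "5.00073612523801 -5.00073612523801",
            "11.864681207163063 -11.864681207163063",
            "8.780744089260097 -8.780744089260097"
        };
        for (String row : rows) {
            System.out.println(row + " -> " + isNegatedPair(row));
        }
    }
}
```

To run the same check over the full Predict output, main() could read lines from standard input instead of the hard-coded array.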
I expected that changing a value in the first line of spambase.data
would change the first classification score above, but it doesn't. I
changed the first value in
0,0.64,0.64,0,0.32,0,0,0,0,0,0,0.64,0,0,0,0.32,0,1.29,1.93,0,0.96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.778,0,0,3.756,61,278,+1;
from 0 to other values, but the first classification score
(5.00073612523801) didn't change. Why is that?
Thanks,
Glenn