From: Glenn M. <gle...@gm...> - 2010-12-13 22:16:38
Hi Aaron,

Thanks for the prompt reply. Your comment on using several weak
thresholding classifiers makes sense. Changing all the values (0 ->
1) in the first data line did indeed change the classification score.
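To illustrate the point about thresholding weak classifiers, here is a minimal sketch in Java. It is not JBoost code: the class name, feature indices, thresholds, and vote weights are all invented for the example. It only shows why a score built from threshold tests stays the same until a feature that the classifier actually uses crosses its threshold.

```java
// Illustrative only: NOT JBoost code. The feature indices, thresholds, and
// vote weights below are hypothetical values chosen for the example.
final class StumpScoreSketch {

    /** One thresholding weak classifier over a single feature. */
    static final class Stump {
        final int feature;      // index of the feature this stump looks at
        final double threshold; // decision threshold for that feature
        final double weight;    // size of the vote

        Stump(int feature, double threshold, double weight) {
            this.feature = feature;
            this.threshold = threshold;
            this.weight = weight;
        }

        /** Votes +weight if the feature is above the threshold, else -weight. */
        double vote(double[] x) {
            return x[feature] > threshold ? weight : -weight;
        }
    }

    // Hypothetical ensemble that uses only features 1, 2, and 4.
    static final Stump[] STUMPS = {
        new Stump(1, 0.5, 1.5),
        new Stump(2, 0.25, 1.0),
        new Stump(4, 2.0, 0.5),
    };

    /** The score is the sum of the weak classifiers' votes. */
    static double score(double[] x) {
        double sum = 0.0;
        for (Stump s : STUMPS) {
            sum += s.vote(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] original       = {0, 0.64, 0.64, 0, 0.32};
        double[] unusedChanged  = {7, 0.64, 0.64, 0, 0.32}; // feature 0 is never tested
        double[] sameSide       = {0, 0.90, 0.64, 0, 0.32}; // feature 1 stays above 0.5
        double[] crossThreshold = {0, 0.10, 0.64, 0, 0.32}; // feature 1 drops below 0.5

        System.out.println("original:        " + score(original));
        System.out.println("unused changed:  " + score(unusedChanged));
        System.out.println("same side:       " + score(sameSide));
        System.out.println("cross threshold: " + score(crossThreshold));
    }
}
```

In this sketch the first three inputs get the same score, and only the last one differs, because it is the only one that moves a tested feature to the other side of its threshold.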

I don't think you're right about the two columns, though. Since they
always have the same magnitude, I looked into the code and saw that it
is in fact printing out {p, -p}, where p appears to be the prediction.
It turns out that margin information can be generated when the tree
is created.
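The relationship between the {p, -p} pair and a margin can be sketched as follows. This is not JBoost's implementation: the class and method names are hypothetical, and the margin is assumed to be the usual boosting definition, label times score. The value of p is the first score from the Predict output, and the +1 label is the one on the first spambase.data line.

```java
// Illustrative only: NOT JBoost's implementation. Names are hypothetical and
// the margin is assumed to be the usual boosting definition, label * score.
final class MarginSketch {

    /** Mirrors the observed output shape: one score per class, {p, -p}. */
    static double[] scorePair(double p) {
        return new double[] { p, -p };
    }

    /** Margin of one example: its label (+1 or -1) times its real-valued score. */
    static double margin(int label, double p) {
        return label * p;
    }

    public static void main(String[] args) {
        double p = 5.00073612523801; // first score from the Predict output
        int label = +1;              // label on the first spambase.data line

        double[] pair = scorePair(p);
        System.out.println(pair[0] + " " + pair[1]);
        System.out.println("margin with label +1: " + margin(label, p));
        System.out.println("margin with label -1: " + margin(-1, p));
    }
}
```

Under that definition the margin equals the first column for an example labeled +1 and the second column for an example labeled -1, so neither column on its own is a margin: computing one also requires the true label.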

The generated comment for predict(String[] as) says it returns "an
array of scores corresponding to the classes: +1 and -1". Are
"classes" the same as labels?

Thanks,
Glenn

On Mon, Dec 13, 2010 at 12:51 PM, Aaron Arvey <aa...@cb...> wrote:
> Hey Glenn,
>
> I haven't used JBoost in a while, but I have a couple of guesses that may
> answer your questions.
>
>> I ran Predict ("java -cp .:../dist/jboost.jar Predict <
>> spambase.data") against the original data. I got two columns of
>> output that looked like
>>
>> 5.00073612523801 -5.00073612523801
>> 11.864681207163063 -11.864681207163063
>> 8.780744089260097 -8.780744089260097
>> ...
>> Why are there two columns with the same magnitudes? I'm guessing that
>> these are is/is not spam scores, but they seem redundant.
>>
>
> Guess: One is margin and the other is classification score. You can
> determine this by computing labels*column1 or labels*column2 and checking
> whether the results match the other column.
>
>> It would seem that changing a value in the first line of spambase.data
>> would change the classification score I see above, but it doesn't. I
>> changed the first value in
>>
>> 0,0.64,0.64,0,0.32,0,0,0,0,0,0,0.64,0,0,0,0.32,0,1.29,1.93,0,0.96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.778,0,0,3.756,61,278,+1;
>>
>> from 0 to other values, but the first classification score
>> (5.00073612523801) didn't change. Why is that?
>
> Guess: Boosting doesn't produce a linear classifier. Depending on the
> number of iterations used, you may have used fewer dimensions than exist
> in the data. In fact, even if you change every value in an example, the
> score may still be the same. This is due to JBoost using thresholding
> weak classifiers. If you look at the actual tree (either at the raw file
> or see documentation about visualization), you should be able to determine
> which dimensions were used and at what thresholds. If you change one of
> these dimensions so that it is on the other side of the threshold, you
> should see a change in output value.
>
> Hope that helps!
>
> Aaron