Re: [Jboost-users] Questions about JBoost implementations

Hi Aaron,

Thanks for the prompt reply.  Your comment on using several weak
thresholding classifiers makes sense.  Changing all the values (0 ->
1) in the first data line did indeed change the classification score.
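
For anyone following along, here is a minimal sketch of why that happens. It
is not JBoost's actual code: the class StumpSketch, the Stump type, and the
feature indices, thresholds, and weights are all made up for illustration. It
only assumes that the score is a sum of votes from threshold tests on
individual features.

```java
// Hypothetical illustration only; none of these names or numbers come from JBoost.
public class StumpSketch {
    // One weak classifier: votes +alpha if x[feature] > threshold, else -alpha.
    static final class Stump {
        final int feature;
        final double threshold;
        final double alpha;

        Stump(int feature, double threshold, double alpha) {
            this.feature = feature;
            this.threshold = threshold;
            this.alpha = alpha;
        }

        double vote(double[] x) {
            return x[feature] > threshold ? alpha : -alpha;
        }
    }

    // Made-up ensemble: only features 1 and 3 are ever tested.
    static final Stump[] STUMPS = {
        new Stump(1, 0.5, 2.0),
        new Stump(3, 1.0, 1.5)
    };

    // The score is the sum of the weak classifiers' votes.
    static double score(double[] x) {
        double sum = 0.0;
        for (Stump s : STUMPS) {
            sum += s.vote(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] original = {0.0, 0.64, 0.64, 0.0};
        double[] unusedChanged = {9.0, 0.64, 0.64, 0.0};  // feature 0 is never tested
        double[] thresholdCrossed = {0.0, 0.10, 0.64, 0.0}; // feature 1 drops below 0.5

        System.out.println(score(original));         // 0.5
        System.out.println(score(unusedChanged));    // 0.5, unchanged
        System.out.println(score(thresholdCrossed)); // -3.5, changed
    }
}
```

Changing a feature that no stump tests, or changing a tested feature without
crossing its threshold, leaves the score as it was.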

I don't think you're right about the two columns, though.  Since they
always have the same magnitude, I looked into the code and saw that it
is in fact printing out {p, -p}, where p appears to be the prediction
score.  It turns out that margin information can be generated when the
tree is created.
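
A minimal sketch of that reading, with made-up names (ScorePairSketch,
scores, margin are not JBoost identifiers): the second column is just the
negated first, and a margin in the usual boosting sense would be the label
times the score, so it needs the true label and cannot be read off the two
columns alone.

```java
// Hypothetical illustration only; these names are not taken from JBoost.
public class ScorePairSketch {
    // The two printed columns as I read the code: {p, -p}.
    static double[] scores(double p) {
        return new double[] { p, -p };
    }

    // Margin under the usual definition: label (+1 or -1) times the score.
    static double margin(int label, double p) {
        return label * p;
    }

    public static void main(String[] args) {
        double p = 5.00073612523801; // first score from the output I posted
        double[] cols = scores(p);
        System.out.println(cols[0] + "\t" + cols[1]);
        // For a +1 example the margin equals column 1;
        // for a -1 example it equals column 2.
        System.out.println(margin(+1, p) == cols[0]); // true
        System.out.println(margin(-1, p) == cols[1]); // true
    }
}
```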

The generated comment for predict(String[] as) says it returns "an
array of scores corresponding to the classes: +1 and -1".  Are
"classes" the same as labels?

Thanks,
Glenn

On Mon, Dec 13, 2010 at 12:51 PM, Aaron Arvey <aa...@cb...> wrote:
> Hey Glenn,
>
> I haven't used JBoost in a while, but I have a couple of guesses that may
> answer your questions.
>
>> I ran Predict ("java -cp .:../dist/jboost.jar Predict <
>> spambase.data") against the original data.  I got two columns of
>> output that looked like
>>
>> 5.00073612523801        -5.00073612523801
>> 11.864681207163063      -11.864681207163063
>> 8.780744089260097       -8.780744089260097
>> ...
>> Why are there two columns with the same magnitudes?  I'm guessing that
>> these are is/is not spam scores, but they seem redundant.
>>
>
> Guess: One is the margin and the other is the classification score.  You
> can check this by computing labels*column1 or labels*column2 and seeing
> whether the result matches the other column.
>
>> It would seem that changing a value in the first line of spambase.data
>> would change the classification score I see above, but it doesn't.  I
>> changed the first value in
>>
>> 0,0.64,0.64,0,0.32,0,0,0,0,0,0,0.64,0,0,0,0.32,0,1.29,1.93,0,0.96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.778,0,0,3.756,61,278,+1;
>>
>> from 0 to other values, but the first classification score
>> (5.00073612523801) didn't change.  Why is that?
>
> Guess: Boosting doesn't produce a linear classifier.  Depending on the
> number of iterations used, you may have used fewer dimensions than exist
> in the data.  In fact, even if you change every value in an example, the
> score may still be the same.  This is due to JBoost using thresholding
> weak classifiers.  If you look at the actual tree (either in the raw file
> or via the visualization described in the documentation), you should be
> able to determine which dimensions were used and at what thresholds.  If
> you change one of these dimensions so that it is on the other side of the
> threshold, you should see a change in output value.
>
> Hope that helps!
>
> Aaron