Re: [Jboost-users] Question regarding matlab output of classifier

Hey Aaron,
	OK, so when I run "../jboost -numRounds 10 -a 10 -S stem" I get an  
empty output in the .boosting.info files. I am using release 1.4, but  
I can try the repository version.
	Thanks!
	Viren

On Aug 13, 2008, at 2:58 PM, Aaron Arvey wrote:

Hey Viren,

I just tried out

cd jboost/demo
../jboost -numRounds 10 -a 9 -S stem
cp stem.test.boosting.info stem.test.boosting.info.bak
../jboost -numRounds 10 -a -2 -S stem
sdiff stem.test.boosting.info.bak stem.test.boosting.info

And I see that this outputs the second to last iteration.  When I try

cd jboost/demo
../jboost -numRounds 10 -a 10 -S stem
cp stem.test.boosting.info stem.test.boosting.info.bak
../jboost -numRounds 10 -a -2 -S stem
sdiff stem.test.boosting.info.bak stem.test.boosting.info

I see that the final iteration is output.

Let me know what you see when you run the above.  If you see something  
different, perhaps there used to be a bug and it was corrected. The code  
to output files by the "-a" switch was recently updated, so perhaps  
this bug was corrected (I updated it and have no memory of fixing this  
bug, but perhaps I did...).  Are you perhaps using an old version of  
JBoost? Try out the CVS repository and see if that fixes your  
problem.

Aaron
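
The check above (is the output file empty, and do the two runs differ?) can also be scripted. The following is only a minimal sketch in Python, not part of JBoost: the helper name compare_info_files is hypothetical, and it looks only at file sizes and raw bytes, assuming nothing about the format of the .boosting.info files.

```python
import filecmp
import os


def compare_info_files(path_a, path_b):
    """Return (a_is_empty, b_is_empty, identical) for two output files.

    Hypothetical helper: it inspects sizes and raw bytes only and does
    not parse the contents of the files.
    """
    a_is_empty = os.path.getsize(path_a) == 0
    b_is_empty = os.path.getsize(path_b) == 0
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting the os.stat() signatures of the two files.
    identical = filecmp.cmp(path_a, path_b, shallow=False)
    return a_is_empty, b_is_empty, identical
```

For the demo run above, this would be called as compare_info_files("stem.test.boosting.info.bak", "stem.test.boosting.info"). A zero-byte file from the first run would show up as a_is_empty being True.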

On Wed, 13 Aug 2008, Viren Jain wrote:

> Thanks again, Aaron.
>
> I double checked things and it seems I still see discrepancies in the  
> classifier outputs. The exact jboost command I am using is:
>
> ... jboost.controller.Controller -S test_old_background_mporder  
> -numRounds 300 -b LogLossBoost -ATreeType ADD_ROOT -a 299 -m  
> classify_background.m
>
> I assume there is some sort of 0-based counting, since if I use -a 300  
> the .info.testing and .info.training files are 0 bytes. So if this is  
> correct, then test_old_background_mporder.test.boosting.info should  
> have identical outputs to those generated from the same examples by  
> using classify_background.m?
>
> Again, thanks so much!
> Viren
>
>
>
> On Aug 13, 2008, at 1:11 PM, Aaron Arvey wrote:
>
> On Wed, 13 Aug 2008, Viren Jain wrote:
>
>> I'm actually using text strings for the labels, i.e., in the spec  
>> file I have the line "labels (merge, split)" and then for each  
>> example in training/test, I output the appropriate string. Do you  
>> recommend I use (-1,1) instead?
>
> That's fine.  I just assumed that since you said the labels were  
> inverted, that meant you were using -1/+1.  Using text is perfectly  
> okay.
>
>> Also, what is the iteration on which Jboost outputs the matlab file  
>> when I use the -m option? The last one?
>
> Yes, it is the last iteration.  There should probably be an option  
> (like -a) to output this more often.
>
> Aaron
>
>
>> On Aug 13, 2008, at 12:43 PM, Aaron Arvey wrote:
>> Hi Viren,
>>
>> The inverted label is a result of JBoost using its own internal  
>> labeling system.  If you swap the order of how you specify the  
>> labels (i.e. instead of "labels (1,-1)" you do "labels (-1,1)")  
>> you'll get the correct label.
>>
>> I haven't heard about the difference in score before.  Are you  
>> perhaps looking at the scores for the wrong iteration?  Are you  
>> using "-a -1" or "-a -2" switches to obtain the appropriate  
>> score/margin output files? Are you perhaps getting training and  
>> testing sets mixed up?
>>
>> I just tested ADD_ROOT on the spambase dataset (in the demo  
>> directory) and it looks like everything is fine.  If you can send  
>> your train/test files or reproduce the bug on the spambase dataset,  
>> please send me the exact parameters you're using and I'll see if  
>> it's a bug, poor documentation, or a misunderstanding of some sort.
>>
>> Thanks for the heads up on the potential bug in the matlab scores.
>>
>> Aaron
>>
>> On Wed, 13 Aug 2008, Viren Jain wrote:
>>> I trained a LogLossBoost classifier with -ATreeType ADD_ROOT using  
>>> Jboost. I also asked it to output a matlab script I could use to  
>>> classify examples with in the future. However, I was wondering why  
>>> the matlab script outputs slightly different values than I would  
>>> get by classifying the training/test set directly using Jboost  
>>> (for example, the sign of the classifier output is always opposite  
>>> to what Jboost produces, and at most I have seen a 0.1469  
>>> discrepancy in the actual value after accounting for the sign  
>>> issue). Has anyone encountered this issue, or am I perhaps doing  
>>> something incorrectly?
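
The comparison described in the original question (flip the sign of the matlab output, then find the largest remaining difference) could be sketched as follows. This is only an illustration in Python: compare_scores is a hypothetical name, and it assumes the two sets of scores have already been read into equal-length lists of floats in the same example order. How they are extracted from the JBoost and matlab outputs is left open here.

```python
def compare_scores(jboost_scores, matlab_scores, flip_sign=True):
    """Return (index, abs_difference) of the largest disagreement.

    Hypothetical helper. With flip_sign=True the matlab scores are
    negated first, to account for the inverted-label convention
    discussed in this thread.
    """
    if len(jboost_scores) != len(matlab_scores):
        raise ValueError("score lists must have the same length")
    if not jboost_scores:
        return None, 0.0
    sign = -1.0 if flip_sign else 1.0
    diffs = [abs(j - sign * m) for j, m in zip(jboost_scores, matlab_scores)]
    # Index of the example on which the two outputs disagree the most.
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return worst, diffs[worst]
```

If the two outputs really come from the same iteration, the largest difference should be close to zero. A value on the order of the 0.1469 reported above would suggest the scores were taken from different iterations or different example sets, which is what the "-a" discussion in this thread tries to rule out.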