Re: [Jboost-users] Documentation Deficiencies

Dear Jason,

See responses inline below.

On Tue, 30 Sep 2008, Jason Kania wrote:
> In looking at the examples, I am finding that the lack of documentation 
> on the examples themselves is really hurting my attempts to understand 
> what the classifier is outputting. It would make sense, for example, if 
> the noisy line example was completed from end to end with details on the 
> building and execution of the example as well as some description of the 
> input and interpretation of the results for some examples. The 
> information that is provided is insufficient to understand.
> 
> I am able to build it, but when I ran it, the lack of some instructions 
> on the command line and very terse comments in the code meant that I was 
> unsure what I should be typing on the command line. I had to figure it 
> out which is quite silly for someone trying to use examples in order to 
> learn the application.

There is a README file in the "demo" directory that describes the noisy 
line example.

There are also a couple of very simple examples of how to run the program 
on the website at http://jboost.sourceforge.net/examples.html, and a more 
in-depth description of the options at 
http://jboost.sourceforge.net/doc.html.

To use the program, you should never have to look at the code.  Is there 
anything you found in the code that isn't in the documentation?  If so, 
please let me know and I'll post the relevant information to the website.

> For the output, the example described in Wikipedia yields a single 
> number confidence output and this is intuitive whereas the output of two 
> numbers makes little sense to me. 

I'm not sure what you're referring to when you mention the "two numbers."  
Where are you seeing these two numbers? If you're referring to the 
.boosting.info file, it is described at 
http://jboost.sourceforge.net/doc.html#boost_format.

> Is the confidence the summation of the outputs in the vector, is it 
> multidimensional or something else? This information should be on the 
> site so others can attempt to make use of the data.

The classification is based on the sum of hypotheses.  If the sum is 
positive, a positive label is claimed.  If the sum is negative, a negative 
label is claimed.  The absolute value of the sum can be considered a 
measure of confidence.
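
As a rough illustration of that rule (this is not JBoost code; the 
function name, the example scores, and the handling of a sum of exactly 
zero are all assumptions made for the sketch), in Python it might look 
like:

```python
def classify(hypothesis_outputs):
    """Return (label, confidence) from the real-valued outputs of the
    individual hypotheses for a single example.

    Hypothetical helper for illustration only; it is not part of JBoost.
    """
    total = sum(hypothesis_outputs)
    # Positive sum -> positive label, negative sum -> negative label.
    # A sum of exactly zero is not covered above; treating it as
    # negative here is an arbitrary assumption.
    label = 1 if total > 0 else -1
    # The absolute value of the sum serves as a measure of confidence.
    confidence = abs(total)
    return label, confidence


# Made-up hypothesis outputs for one example.
label, confidence = classify([0.5, -0.25, 1.0])
print(label, confidence)
```

A sum far from zero means the hypotheses largely agree, and a sum near 
zero means the prediction is close to a coin flip.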

Let me know if the above addresses your concerns.

Aaron