From: Aaron A. <aa...@cs...> - 2007-10-16 02:05:40
Hi David,

See responses below. Also know that for release 1.4 (most likely coming out sometime this week) we've completely changed the format of the output files and the post-processing. We are also in the process of porting all Python/Perl code to Java, and we are developing a GUI to make the whole process more intuitive, which will probably be released in the next month. That being said...

On Mon, 15 Oct 2007, David Rolland wrote:
> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior ?

You're right. The $dirname variable is only used for the $infofilename, not the $filename. I've committed the change to CVS. See http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts were written assuming a few other things too. Really, they were written as bandages, not final solutions. We're currently working on porting all of this to Java so that we have more interoperability.

> I am a C/C++ programmer, I don't know Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of that gross syntax can do. Python has survived a couple of fads (Ruby, etc.) and still seems to me the best widespread scripting language. It may be worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary...

> And actually, what's the difference between these two files?

One has the "score" of each example, the other has the "margin". The score is defined as the value predicted for a given example. The margin is defined as

    if (label of example is correct) return |score|
    else return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the moment is to grab cygwin, where everything has been tested and *seemed* to work peachy.

The second best bet would probably be to wait till Wednesday/Thursday for the next release (when all of your changes would probably be rendered somewhat moot anyway).

The third best bet is to edit the code. The only place where UNIX commands are used in the 1.3.1 margin.py is lines 244--255. That is where a label file is created. All labels in JBoost are converted into the binary values +1, -1. If you have more than two labels, just wait till later this week. If you have two labels, then just figure out which is mapped to "+1" and which to "-1" (should be fairly straightforward). Create the labels file, which is just a series of 1, -1 values that gets read in. The formula for the margin is (line 168)

    margin = score * label

This is identical to what I state above when +1, -1 labels are used: the product is positive exactly when the sign of the score agrees with the label. So if you write a script to create a label file, you can specify the file on the command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions/comments. Also, out of curiosity, for what classification task are you using JBoost?

Aaron
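
For reference, here is a minimal Python sketch of the label-file and margin steps described above, using no Unix commands. It is only an illustration under assumptions: it supposes one value per line in the labels file, and the function names (`to_binary_labels`, `write_labels`, `compute_margins`) are hypothetical and not taken from margin.py. The actual layout of the JBoost score and label files should be checked before relying on it.

```python
def to_binary_labels(raw_labels, positive_label):
    """Map two-class labels to +1 / -1 (hypothetical helper).

    Which class counts as +1 is an assumption the caller must supply.
    """
    return [1 if lab == positive_label else -1 for lab in raw_labels]


def write_labels(path, labels):
    """Write one label per line (assumed file layout)."""
    with open(path, "w") as f:
        for y in labels:
            f.write("%d\n" % y)


def compute_margins(scores, labels):
    """margin = score * label, with labels restricted to +1 / -1."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return [s * y for s, y in zip(scores, labels)]


if __name__ == "__main__":
    labels = to_binary_labels(["spam", "ham", "ham"], positive_label="spam")
    print(compute_margins([2.5, -1.0, 0.5], labels))
```

With +1/-1 labels the product equals |score| when the sign of the score matches the label and -|score| otherwise, so it agrees with the if/else definition given earlier in the message.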