From: David R. <rol...@ya...> - 2007-10-16 00:01:54
Hi,

I just began using JBoost and I have some questions.

First, I tested the visualization tools to make sure I got the same results as shown on http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to modify the atree2dot2ps.pl script because it does not use the "--dir" option everywhere in the script. Is this the expected behavior?

Second, I was not able to reproduce the margin output. My problem is that I run JBoost on Windows XP, and the margin script uses 'cat' and other Unix commands. I am a C/C++ programmer; I don't know Perl or Python. Even though these languages are similar to C, I still wonder how the files spambase.train.margin and spambase.train.scores are processed to produce the margin graphic. And actually, what's the difference between these two files? Only some of the values seem to have flipped sign. I don't expect you to rewrite margin.py without relying on Unix commands, but if you explain how to get from the data files to the output graphic, I'll code a Windows equivalent myself.

Thanks,

David R.
From: Aaron A. <aa...@cs...> - 2007-10-16 02:05:40
Hi David,

See my responses below. Also note that in release 1.4 (most likely coming out sometime this week) we've completely changed the format of the output files and the post-processing. We are also porting all of the Python/Perl code to Java, and we are developing a GUI to make the whole process more intuitive, which will probably be released within the next month. That being said...

On Mon, 15 Oct 2007, David Rolland wrote:
> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior?

You're right. The $dirname variable is only used for $infofilename, not for $filename. I've committed the fix to CVS. See http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts make a few other assumptions too. Really, they were written as bandages, not final solutions. We're currently porting all of this to Java so that it runs on more platforms.

> I am a C/C++ programmer, I don't know Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of that gross syntax can do. Python has survived a couple of fads (Ruby, etc.) and still seems to me to be the best widespread scripting language. It may be worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary to draw the graph.

> And actually, what's the difference between these two files?

One file has the "score" of each example, the other has its "margin".
The score is the value predicted for a given example. The margin is defined as

    if (the example's predicted label is correct) return |score|
    else return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the moment is to grab Cygwin, where everything has been tested and *seemed* to work peachy. The second-best bet would probably be to wait until Wednesday/Thursday for the next release (which would probably render your changes somewhat moot anyway). The third-best bet is to edit the code.

In margin.py from release 1.3.1, the only place where UNIX commands are used is lines 244-255. That is where a label file is created. All labels in JBoost are converted into the binary values +1 and -1. If you have more than two labels, just wait until later this week. If you have two labels, just figure out which one is mapped to "+1" and which to "-1" (this should be fairly straightforward), then create the labels file, which is just a series of 1 and -1 values. The formula for the margin (line 168) is

    margin = score * label

which, when the labels are +1 and -1, is identical to the definition I gave above. So if you write a script to create a label file, you can specify that file on the command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions or comments. Also, out of curiosity, for what classification task are you using JBoost?

Aaron
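The recipe Aaron describes (map the two classes to +1/-1, write a label file, then compute margin = score * label) can be sketched in plain Python with no Unix commands, plus a cumulative margin distribution that gnuplot can draw. This is a minimal sketch under stated assumptions, not the actual margin.py code: it assumes the scores file holds one numeric score per line in the same example order as the label file, and that the graphic is the usual cumulative margin distribution. All function names here are hypothetical and not part of JBoost.

```python
"""Hypothetical Windows-friendly stand-in for the Unix steps in margin.py.

Assumptions (not taken from JBoost): the scores file holds one numeric score
per line, one line per example; the label file holds one +1/-1 value per line
in the same example order; the plotted graphic is a cumulative margin
distribution written as two columns for gnuplot.
"""
from typing import List, Sequence, Tuple


def write_label_file(raw_labels: Sequence[str], positive: str, path: str) -> None:
    """Map a two-class label column to +1/-1 and write one value per line."""
    with open(path, "w") as f:
        for lab in raw_labels:
            # The class chosen as `positive` becomes +1, the other class -1.
            f.write("1\n" if lab == positive else "-1\n")


def read_numbers(path: str) -> List[float]:
    """Read one number per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def margins(scores: Sequence[float], labels: Sequence[float]) -> List[float]:
    """margin = score * label, i.e. |score| if the sign is right, else -|score|."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return [s * y for s, y in zip(scores, labels)]


def cumulative_distribution(ms: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair the k-th smallest margin with k/n (an empirical CDF)."""
    ordered = sorted(ms)
    n = len(ordered)
    return [(m, (k + 1) / n) for k, m in enumerate(ordered)]


def write_cdf(scores_path: str, labels_path: str, out_path: str) -> None:
    """Write 'margin fraction' pairs, one per line, ready for gnuplot."""
    ms = margins(read_numbers(scores_path), read_numbers(labels_path))
    with open(out_path, "w") as f:
        for m, frac in cumulative_distribution(ms):
            f.write(f"{m} {frac}\n")


if __name__ == "__main__":
    # Tiny in-memory demo; no JBoost files needed.
    demo = margins([2.0, -0.5, 1.5], [1, 1, -1])
    print(demo)
    print(cumulative_distribution(demo))
```

For example, write_cdf("spambase.train.scores", "spambase.train.labels", "spambase.train.cdf") (the label and output file names are hypothetical), followed in gnuplot by plot "spambase.train.cdf" using 1:2 with steps. If the real 1.3.1 scores file carries extra columns, only read_numbers needs to change.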
From: Aaron A. <aa...@cs...> - 2007-10-24 00:09:04
David,

The new version of JBoost has been released. The output files are much easier to work with, so you'll be able to implement the changes you discussed in your first message much faster. The output format is now fairly stable, but it may still change in the future. However, any such changes will be minor and should not irrevocably break code you write against it.

I'm going to post some notes on the file format on the website. Until then, feel free to ask me any questions you have.

Aaron