From: David R. <rol...@ya...> - 2007-10-25 03:27:30
Hi Aaron,

I'm starting a new thread because I never received your reply through my mail, so I can't reply to the thread.

I just downloaded version 1.4, I'll take a look at it really soon. I'm still in the process of evaluating JBoost to see if it fits my needs. So any change to the output file format doesn't affect me right now. The GUI will be a very interesting tool, even though I already made my own batch files to call JBoost.

Your explanation of score/margin is clear, but there is still something that I don't understand. The graphic for the margin has "Margin" and "Cumulative Distribution" as axis labels. In this graphic, the margin values are between -1 and +1. This means that the values have been remapped from the formula you gave me (margin = score * label). Am I right? This remap was actually the purpose of my initial question, but I wasn't clear on that point.

I began using AdaBoost for image analysis (mainly OCR) where I was working, but seeing its interesting performance, I want to try it in other domains. I'll give you feedback if it works well. From my previous experience with OCR, AdaBoost isn't magical: the input data must be chosen carefully, and right now I'm working on the input before really using JBoost.

Thanks for the previous answers,

David R.


------------------------

Hi David,

See responses. Also know that for release 1.4 (coming out sometime this
week most likely) we've completely changed the format of the output files
and post-processing. We are also in the process of porting all
Python/Perl code to Java. We are also developing a GUI to make the whole
process more intuitive, which will probably be released in the next month.
That being said...

On Mon, 15 Oct 2007, David Rolland wrote:

> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is this the expected behavior?

You're right. The $dirname variable is only used for the $infofilename,
not the $filename. I've committed the change to CVS. See
http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS
access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts were written assuming a few other things too...
Really, they were written as bandages, not final solutions. We're
currently working on porting all of this to Java so that we have more
interoperability.

> I am a C/C++ programmer, I know neither Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of
that gross syntax can do. Python has survived a couple of fads (Ruby, etc.)
and still seems to me the best widespread scripting language. It may be
worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary...

> And actually, what's the difference between these two files?

One has the "score" of the example, the other has the "margin". The score
is defined as the value predicted for a given example. The margin is:

    if (label of example is correct)
        return |score|
    else
        return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the moment is to grab Cygwin, where
everything has been tested and *seemed* to work peachy. The second
best bet would probably be to wait till Wednesday/Thursday for the next
release (when all of your changes would probably be rendered somewhat moot
anyway). The third best bet is to edit the code.

The only place where UNIX commands are used in 1.3.1 margin.py is lines
244--255. That is where a label file is created. All labels in JBoost
are converted into binary values +1, -1. If you have more than two
labels, just wait till later this week. If you have two labels, then just
figure out which is mapped to "+1" and which to "-1" (should be fairly
straightforward). Create the labels file, which is just a series of 1,
-1 values to be read in. The formula for margin is (line 168)

    margin = score * label

This is identical to what I state above for when +1,-1 are used. So if
you write a script to create a label file, you can specify the file on the
command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions/comments.

Also, out of curiosity, for what classification task are you using JBoost?

Aaron
From: Aaron A. <aa...@cs...> - 2007-10-25 03:50:33
On Wed, 24 Oct 2007, David Rolland wrote:

> I'm starting a new thread because I never received your reply through my
> mail, so I can't reply to the thread.

Huh... I'll see if that's a problem on my end...

> I just downloaded version 1.4, I'll take a look at it really soon. I'm
> still in the process of evaluating JBoost to see if it fits my needs. So
> any change to the output file format doesn't affect me right now. The
> GUI will be a very interesting tool, even though I already made my own
> batch files to call JBoost.

The GUI didn't make the 1.4 cut :-(. It'll be in 1.4.1 or 1.4.2. I can
send you a version with the GUI if you'd like.

> Your explanation of score/margin is clear, but there is still something
> that I don't understand. The graphic for the margin has "Margin" and
> "Cumulative Distribution" as axis labels. In this graphic, the margin
> values are between -1 and +1. This means that the values have been
> remapped from the formula you gave me (margin = score * label). Am I
> right? This remap was actually the purpose of my initial question, but
> I wasn't clear on that point.

Yes, the margin is remapped into the interval [-1,+1]. An unnormalized
margin doesn't really mean much (and all theorems have been proven for the
normalized margin), so we normalize it.

> I began using AdaBoost for image analysis (mainly OCR) where I was
> working, but seeing its interesting performance, I want to try it in
> other domains. I'll give you feedback if it works well. From my previous
> experience with OCR, AdaBoost isn't magical: the input data must be
> chosen carefully, and right now I'm working on the input before really
> using JBoost.

All statistics and machine learning algorithms are only as good as the
input. One nice thing about AdaBoost (in contrast to other learning
algorithms) is that feature selection is less important. Thus, you can
throw as many features as you can imagine at the algorithm and let it
figure out which ones are relevant.
At some point in the future we're planning on putting cascaded predictors similar to Viola & Jones. This may be relevant to your work.

Aaron
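
The remapping into [-1,+1] discussed in this reply, together with the "Cumulative Distribution" axis of the margin plot, can be sketched as follows. In the margin theory for boosting, the raw margin is divided by the sum of the absolute weights of the weak hypotheses; whether JBoost computes its normalizer exactly this way is not stated in the thread, so the norm argument below is an assumption, and both function names are hypothetical.

```python
# Sketch only: "norm" stands in for whatever normalizer is used (for
# example the sum of absolute weak-hypothesis weights); this is an
# assumption, not JBoost's confirmed behavior.

def normalize_margins(raw_margins, norm):
    """Divide each raw margin by a positive normalizing constant."""
    if norm <= 0:
        raise ValueError("norm must be positive")
    return [m / norm for m in raw_margins]

def cumulative_distribution(values):
    """Return (value, fraction) pairs in ascending order of value.

    The fraction (i + 1) / n is the share of examples seen so far in
    sorted order; for tied values the last pair gives the usual CDF.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

raw = [2.5, 0.75, -1.0, 3.0]             # raw margins (score * label)
normalized = normalize_margins(raw, 4.0)  # 4.0 is a made-up normalizer
cdf = cumulative_distribution(normalized)
for value, fraction in cdf:
    print(value, fraction)
```

Written out as two columns in a text file, pairs like these could then be handed to gnuplot (for example with its `steps` plot style) to draw a margin curve of the kind described in the thread.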