From: David R. <rol...@ya...> - 2007-10-25 03:27:30
|
Hi Aaron,

I'm starting a new thread because I never received your reply through my mail, so I can't reply to the thread.

I just downloaded version 1.4, I'll take a look at it really soon. I'm still in the process of evaluating JBoost to see if it fits my needs. So any change to the output file format doesn't affect me right now. The GUI will be a very interesting tool, even though I already made my own batch files to call JBoost.

Your explanation of score/margin is clear, but there is still something that I don't understand. The graphic for the margin has "Margin" and "Cumulative Distribution" as axis labels. In this graphic, the margin values are between -1 and +1. This means that the values have been remapped from the formula you gave me (margin = score * label). Am I right? This remap was actually the purpose of my initial question, but I wasn't clear on that point.

I began using Adaboost for image analysis (mainly OCR) where I was working, but seeing its interesting performance, I want to try it in other domains. I'll give you feedback if it works well. From my previous experience with OCR, Adaboost isn't magical, the input data must be chosen carefully and right now I'm working on the input before really using JBoost.

Thanks for the previous answers,

David R.


------------------------

Hi David,

See responses. Also know that for release 1.4 (coming out sometime this
week most likely) we've completely changed the format of the output files
and post-processing. We are also in the process of porting all
Python/Perl code to Java. We are also developing a GUI to make the whole
process more intuitive, which will probably be released in the next month.

That being said...

On Mon, 15 Oct 2007, David Rolland wrote:

> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior?

You're right. The $dirname variable is only used for the $infofilename,
not the $filename. I've committed the change to CVS. See
http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS
access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts were written assuming a few other things too...
Really, they were written as bandages, not final solutions. We're
currently working on porting all of this to Java so that we have more
interoperability.

> I am a C/C++ programmer, I don't know Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of
that gross syntax can do. Python has survived a couple of fads (Ruby, etc.)
and still seems to me the best widespread scripting language. It may be
worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary...

> And actually, what's the difference between these two files?

One has the "score" of the example, the other has the "margin". The score
is defined as the value predicted for a given example.
The margin is:

    if (label of example is correct)
        return |score|
    else
        return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the moment is to grab cygwin, where
everything has been tested and *seemed* to work peachy. The second
best bet would probably be to wait till Wednesday/Thursday for the next
release (when all of your changes would probably be rendered somewhat moot
anyways). Third best bet is to edit the code.

The only place where UNIX commands are used in 1.3.1 margin.py is lines
244--255. That is where a label file is created. All labels in JBoost
are converted into binary values +1, -1. If you have more than two
labels, just wait till later this week. If you have two labels, then just
figure out which is mapped to "+1" and which to "-1" (should be fairly
straightforward). Create the labels file, which is just a series of 1,
-1 read in. The formula for margin is (line 168)

margin = score * label

This is identical to what I state above for when +1,-1 are used. So if
you write a script to create a label file, you can specify the file on the
command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions/comments.

Also, out of curiosity, for what classification task are you using JBoost?

Aaron
|
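
A minimal sketch of the margin computation described in the reply above, assuming the scores and the +1/-1 labels have already been loaded into Python lists in the same example order. The function names `margins` and `cumulative_distribution` are hypothetical and are not taken from margin.py. Nothing is assumed about the on-disk layout of the JBoost score or label files, and no rescaling of the margin to the [-1, +1] range is applied.

```python
def margins(scores, labels):
    """Return score * label for each example.

    Labels must be +1 or -1, so the result equals |score| when the sign
    of the score agrees with the label and -|score| otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    for y in labels:
        if y not in (1, -1):
            raise ValueError(f"label must be +1 or -1, got {y!r}")
    return [s * y for s, y in zip(scores, labels)]


def cumulative_distribution(values):
    """Return sorted (value, fraction of examples <= value) pairs.

    Tied values collapse into a single pair carrying the largest fraction.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = {}
    for rank, v in enumerate(ordered, start=1):
        points[v] = rank / n  # later ranks overwrite earlier ones for ties
    return list(points.items())


if __name__ == "__main__":
    # Illustrative values only, not taken from any JBoost output.
    example_scores = [2.0, -0.5, 1.5, -3.0]
    example_labels = [1, 1, -1, -1]
    for value, fraction in cumulative_distribution(
        margins(example_scores, example_labels)
    ):
        print(value, fraction)
```

The printed pairs are the kind of two-column data that a plotting tool such as gnuplot can draw with "Margin" on one axis and "Cumulative Distribution" on the other.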