From: Filippo <son...@ya...> - 2008-02-24 15:43:45

Hi, I'm studying JBoost 1.4 this weekend on Windows XP. I succeeded in running the demos and my own training set, and the tree is built successfully. Now I'd like to use the tree from Java, since the predictor is already built in and I don't want to write generic C code to handle every input type. But I cannot work out how to obtain predict.java from stem.java (for example).

If I cut out the section highlighted in stem.java, javac gives me this error:

    predict.java:1: 'class' or 'interface' expected
    static private jboost.examples.Example ex;
    ^
    predict.java:8: 'class' or 'interface' expected
    static public void main(String[] argv) {
    ^
    2 errors

So I added the class name I found at the top of the original stem.java, but now I get many more errors:

    -------------------------------
    predict.java:27: cannot find symbol: variable attr        (attr = new Object[4];)
    predict.java:29: cannot find symbol: variable keys        (jboost.examples.TextDescription.setTokenSet(keys);)
    predict.java:45: cannot find symbol: variable real_attr   (for (j = 0; j < real_attr.length; j++) {)
    predict.java:46: cannot find symbol: variable real_attr   (a = real_attr[j];)
    predict.java:48: cannot find symbol: variable attr        (attr[a] = (at.isDefined() ...)
    predict.java:52: cannot find symbol: variable disc_attr   (for (j = 0; j < disc_attr.length; j++) {)
    predict.java:53: cannot find symbol: variable disc_attr   (a = disc_attr[j];)
    predict.java:55: cannot find symbol: variable attr        (attr[a] = (at.isDefined() ...)
    predict.java:59: cannot find symbol: variable text_attr   (for (j = 0; j < text_attr.length; j++) {)
    predict.java:60: cannot find symbol: variable text_attr   (a = text_attr[j];)
    predict.java:63: cannot find symbol: variable attr        (attr[a] = "";)
    predict.java:66: cannot find symbol: variable tokens      (Arrays.fill(tokens[a], false);)
    predict.java:68: cannot find symbol: variable tokens      (tokens[a][set[k]] = true;)
    predict.java:71: cannot find symbol: variable attr        (attr[a] = null;)
    predict.java:74: cannot find symbol: method predict_int() (r = predict_int();)
    (all located in class predict)
    15 errors
    -------------------------------

Please, can you explain step by step what I have to do with stem.java to obtain predict.java under Windows? Even though your project is interesting, as you may imagine, it's not useful for me without prediction. :-)

Thank you very much for your help!

Filippo
From: Aaron A. <aa...@cs...> - 2008-02-19 03:22:29

The best way to fix the problem is to create an object that will take all of those values, then populate the object by giving it an array of length 630, which you should be able to build via

    float[] arr = new float[] {1, 2, 3, 4, ..., 630};

I would be surprised if this didn't work; however, if it doesn't, then just pass 50 parameters at a time to the object, with another parameter telling which block of 50 you are passing. You can do this via a parsing script (e.g. Python), or preferably you could edit the AlternatingTree.java file and send in a patch.

Aaron

On Mon, 18 Feb 2008, Rodrigo Pizarro wrote:
> Hi,
>
> I tried the patch with 10 iterations (just for a test) and it looks like it
> works! Now the tree has many levels, and some leaves are positive and some
> others negative :)
>
> Now I have another problem, this time from Java compilation. I modified two
> source files of the JBoost sources (I will send the modifications as soon as
> I can). Briefly, the Java source generated by JBoost has a method with a
> problem receiving parameters (I get a "too many parameters" error at compile
> time). I'm using 630 labels (sorry, my project requires 630 labels), and I
> guess that some method is receiving 630 parameters (the "add_pred" method).
> I resolved this problem by passing an array as the parameter: each "if-then"
> branch of the tree was transformed into an array and then passed to
> add_pred. It looks to work fine, but at compilation I get "code too large".
> That means the Java compiler cannot deal with such a huge "if-then-else"
> structure containing many arrays of 630 elements. In other words, "a single
> method in a Java class may be at most 64KB of bytecode" (found in a Sun
> forum).
>
> Rodrigo
>
> On Feb 16, 2008 3:53 PM, Rodrigo Pizarro <piz...@gm...> wrote:
>> Ok, Aaron. Thanks so much!
>>
>> I'm going to download from CVS and try it on my computer. Currently I'm in
>> a small town in my country, and I have no wireless connection available.
>>
>> I will mail you when I get some results.
>>
>> --
>> Rodrigo Pizarro Guzmán
>> Ingeniería Informática
>> Universidad de Santiago de Chile
From: Aaron A. <aa...@cs...> - 2008-02-16 05:23:55

After personal communication, it was determined that this was a bug in the ngram code. A hack has been committed to CVS, with the intention of a permanent fix in the near future. Currently, all ngram sizes are being ignored and the text is being split into a single-word bag of features.

Aaron

On Mon, 28 Jan 2008, Rodrigo Pizarro wrote:
> I think that this is a bug.
>
> I tried with other training and test sets (about 5 examples of { text,
> label }) and I am still getting a one-level tree with negative leaves. I
> guess the problem is in dealing with just one text attribute.
>
> In the file FILE_NAME.log, the text attribute says "set with zero
> elements". Is the algorithm not considering text attributes?
>
> The predict_int() method from the generated Java file (-j flag) is
>
>     static private double[] predict_int() {
>       reset_pred();
>       add_pred( /* R */
>           -0.8047189562170501,
>           -0.34657359027997264,
>           -0.34657359027997264,
>           -0.8047189562170501);
>       return finalize_pred();
>     }
>
> It is a tree with just one branch, all with negative values! If you try
> other demo examples (with numeric, set, and text attributes in each
> example), the predict_int() method contains many if-then-else statements
> representing the whole tree.
>
> Am I right?
>
> Rodrigo Pizarro G.
> Ingeniería Informática
> Universidad de Santiago de Chile
From: Rodrigo P. <piz...@gm...> - 2008-01-28 21:27:18

I think that this is a bug.

I tried with other training and test sets (about 5 examples of { text, label }) and I am still getting a one-level tree with negative leaves. I guess the problem is in dealing with just one text attribute.

In the file FILE_NAME.log, the text attribute says "set with zero elements". Is the algorithm not considering text attributes?

The predict_int() method from the generated Java file (-j flag) is

    static private double[] predict_int() {
      reset_pred();
      add_pred( /* R */
          -0.8047189562170501,
          -0.34657359027997264,
          -0.34657359027997264,
          -0.8047189562170501);
      return finalize_pred();
    }

It is a tree with just one branch, all with negative values! If you try other demo examples (with numeric, set, and text attributes in each example), the predict_int() method contains many if-then-else statements representing the whole tree.

Am I right?

Rodrigo Pizarro G.
Ingeniería Informática
Universidad de Santiago de Chile

On 28-01-2008, at 14:58, Aaron Arvey wrote:
> Try doing something along the lines of
>
>     ./jboost -b AdaBoost -numRounds 1000 -a -2 -S FILE_NAMES -ATreeType ADD_ALL
>
> The output tree is going to be incomprehensible no matter what (620 labels
> is just too much to view visually). You also probably have an outdated
> copy of atree2dot2ps.pl. Grab the latest version from CVS and see if that
> works any better for you. Zoom into the postscript file (png/gif zoom will
> be blurry; postscript is vector graphics -- I think). Directions for CVS
> are at http://sourceforge.net/cvs/?group_id=195659.
>
> Also, look at the FILE_NAME.info file and see what the error is. This will
> give you a good idea as to the actual progress of the booster. If error is
> going down, then you know that something good is happening. If error is
> staying very high, then there may be a bug or some other problem.
>
> Aaron
From: Aaron A. <aa...@cs...> - 2008-01-28 17:58:28

Try doing something along the lines of

    ./jboost -b AdaBoost -numRounds 1000 -a -2 -S FILE_NAMES -ATreeType ADD_ALL

The output tree is going to be incomprehensible no matter what (620 labels is just too much to view visually). You also probably have an outdated copy of atree2dot2ps.pl. Grab the latest version from CVS and see if that works any better for you. Zoom into the postscript file (png/gif zoom will be blurry; postscript is vector graphics -- I think). Directions for CVS are at http://sourceforge.net/cvs/?group_id=195659.

Also, look at the FILE_NAME.info file and see what the error is. This will give you a good idea as to the actual progress of the booster. If error is going down, then you know that something good is happening. If error is staying very high, then there may be a bug or some other problem.

Aaron

On Mon, 28 Jan 2008, Rodrigo Pizarro wrote:
> Hi,
>
> I'm training JBoost with examples of the type (text, label). There are
> 620 different labels, 1720 training examples, and 1600 test examples.
> The problem is that the output tree has only one level with 620
> branches, and all the branches have negative values!
>
> Does JBoost deal correctly with text attributes? Is there some bug? I
> tried many different run parameters, but I still get similarly weird
> outputs. Any suggestion?
>
> Many thanks beforehand!
>
> Rodrigo Pizarro G.
> Ingeniería Informática
> Universidad de Santiago de Chile
From: Rodrigo P. <piz...@gm...> - 2008-01-28 15:09:34

Hi,

I'm training JBoost with examples of the type (text, label). There are 620 different labels, 1720 training examples, and 1600 test examples. The problem is that the output tree has only one level with 620 branches, and all the branches have negative values!

Does JBoost deal correctly with text attributes? Is there some bug? I tried many different run parameters, but I still get similarly weird outputs. Any suggestion?

Many thanks beforehand!

Rodrigo Pizarro G.
Ingeniería Informática
Universidad de Santiago de Chile
From: Aaron A. <aa...@cs...> - 2008-01-25 01:03:51

I ran the example you gave and I have no problems. The text I used is below:

    $> cat rodrigo.train rodrigo.test rodrigo.spec
    neoplasia maligna de labio superior externa , c000;
    neoplasia maligna de labio superior borde vermellon sai , c000;
    neoplasia maligna de labio superior externa trastorno , c000;
    neoplasia maligna del labio superior area del bermellon trastorno , c000;
    tumor maligno de ddel borde bermellon del labio superior trastorno , c000;
    neoplasia maligna de labio superior externa , c000;
    neoplasia maligna de labio superior borde vermellon sai , c000;
    neoplasia maligna de labio superior externa trastorno , c000;
    neoplasia maligna del labio superior area del bermellon trastorno , c000;
    tumor maligno de ddel borde bermellon del labio superior trastorno , c000;
    exampleTerminator=;
    attributeTerminator=,
    maxBadExa=0
    diagnostico text
    labels (c000,c001,c002,c003,c004,c005,c006,c009)

    $> ./jboost -b AdaBoost -numRounds 5 -S rodrigo

I have a couple of guesses about the problem:

* You have "weird" line ends. Try using the most stripped-down text editor you have, or send the first 10 examples in a file (not copied and pasted into email form).
* You have too many labels and JBoost is getting confused. JBoost can handle up to 32000 labels (maybe more...), and I've never actually tested the behavior beyond that.

To answer your question: what is maxBadExa? It is just the maximum number of bad examples tolerated. Since I can't reproduce the problem, I'm not sure exactly what is going on. If you post your datasets somewhere or send a small version of them (10KB total or less), I may be able to help more.

Aaron

On Thu, 24 Jan 2008, Rodrigo Pizarro wrote:
> Hi,
>
> I get a "BadAttException" while running JBoost:
>
> ---------------------
> iMac-de-Rodrigo-Pizarro:demo Rodri_gop$ java jboost.controller.Controller -S diag
> Fileloader adding . to path.
> WARNING: configuration file jboost.config not found. Continuing...
> Found diag.spec
> Found diag.train
> Found diag.spec
> Found diag.test
> Booster type: jboost.booster.AdaBoost
> BadAttException: Line 0
> Multiple labels found when expecting single label: c000
> Continuing to parse example.
> jboost.tokenizer.BadAttException: Multiple labels found when expecting single label: c000
>       at jboost.examples.LabelDescription.str2Att(LabelDescription.java:59)
>       at jboost.tokenizer.ExampleStream.parseExampleText(ExampleStream.java:127)
>       at jboost.tokenizer.ExampleStream.getExample(ExampleStream.java:83)
>       at jboost.controller.Controller.readTrainData(Controller.java:571)
>       at jboost.controller.Controller.init(Controller.java:164)
>       at jboost.controller.Controller.<init>(Controller.java:105)
>       at jboost.controller.Controller.main(Controller.java:80)
> BadExaException Example beginning at line 0
> Number of bad attributes in example exceeds 0; skipping rest of example.
> BadAttException: Line 0
> Multiple labels found when expecting single label: c000
> Continuing to parse example.
> jboost.tokenizer.BadAttException: Multiple labels found when expecting single label: c000
>       at jboost.examples.LabelDescription.str2Att(LabelDescription.java:59)
>       at jboost.tokenizer.ExampleStream.parseExampleText(ExampleStream.java:127)
>       at jboost.tokenizer.ExampleStream.getExample(ExampleStream.java:83)
>       at jboost.controller.Controller.readTrainData(Controller.java:571)
>       at jboost.controller.Controller.init(Controller.java:164)
>       at jboost.controller.Controller.<init>(Controller.java:105)
>       at jboost.controller.Controller.main(Controller.java:80)
> BadExaException Example beginning at line 0
> Number of bad attributes in example exceeds 0; skipping rest of example.
> BadAttException: Line 1
> Multiple labels found when expecting single label: c000
> Continuing to parse example.
> ...
> --------------------------
>
> The train file looks like:
>
> neoplasia maligna de labio superior externa , c000;
> neoplasia maligna de labio superior borde vermellon sai , c000;
> neoplasia maligna de labio superior externa trastorno , c000;
> neoplasia maligna del labio superior area del bermellon trastorno , c000;
> tumor maligno de ddel borde bermellon del labio superior trastorno , c000;
> ...
>
> And the spec file looks like:
>
> exampleTerminator=;
> attributeTerminator=,
> maxBadExa=2000
>
> diagnostico text
> labels (c000,c001,c002,c003,c004,c005,c006,c009, ... ,d487,d489)
>
> (abridged)
>
> I have changed the maxBadExa parameter to several values between 0 and
> 2000 and I still get similar errors. The JBoost execution is fine with
> the demo examples.
>
> What can be happening? What does the "maxBadExa" parameter mean?
>
> Many thanks beforehand!
>
> Rodrigo Pizarro G.
> Ingeniería Informática
> Universidad de Santiago de Chile
From: Rodrigo P. <piz...@gm...> - 2008-01-25 00:11:48

Hi,

I get a "BadAttException" while running JBoost:

---------------------
iMac-de-Rodrigo-Pizarro:demo Rodri_gop$ java jboost.controller.Controller -S diag
Fileloader adding . to path.
WARNING: configuration file jboost.config not found. Continuing...
Found diag.spec
Found diag.train
Found diag.spec
Found diag.test
Booster type: jboost.booster.AdaBoost
BadAttException: Line 0
Multiple labels found when expecting single label: c000
Continuing to parse example.
jboost.tokenizer.BadAttException: Multiple labels found when expecting single label: c000
      at jboost.examples.LabelDescription.str2Att(LabelDescription.java:59)
      at jboost.tokenizer.ExampleStream.parseExampleText(ExampleStream.java:127)
      at jboost.tokenizer.ExampleStream.getExample(ExampleStream.java:83)
      at jboost.controller.Controller.readTrainData(Controller.java:571)
      at jboost.controller.Controller.init(Controller.java:164)
      at jboost.controller.Controller.<init>(Controller.java:105)
      at jboost.controller.Controller.main(Controller.java:80)
BadExaException Example beginning at line 0
Number of bad attributes in example exceeds 0; skipping rest of example.
BadAttException: Line 0
Multiple labels found when expecting single label: c000
Continuing to parse example.
jboost.tokenizer.BadAttException: Multiple labels found when expecting single label: c000
      at jboost.examples.LabelDescription.str2Att(LabelDescription.java:59)
      at jboost.tokenizer.ExampleStream.parseExampleText(ExampleStream.java:127)
      at jboost.tokenizer.ExampleStream.getExample(ExampleStream.java:83)
      at jboost.controller.Controller.readTrainData(Controller.java:571)
      at jboost.controller.Controller.init(Controller.java:164)
      at jboost.controller.Controller.<init>(Controller.java:105)
      at jboost.controller.Controller.main(Controller.java:80)
BadExaException Example beginning at line 0
Number of bad attributes in example exceeds 0; skipping rest of example.
BadAttException: Line 1
Multiple labels found when expecting single label: c000
Continuing to parse example.
...
--------------------------

The train file looks like:

    neoplasia maligna de labio superior externa , c000;
    neoplasia maligna de labio superior borde vermellon sai , c000;
    neoplasia maligna de labio superior externa trastorno , c000;
    neoplasia maligna del labio superior area del bermellon trastorno , c000;
    tumor maligno de ddel borde bermellon del labio superior trastorno , c000;
    ...

And the spec file looks like:

    exampleTerminator=;
    attributeTerminator=,
    maxBadExa=2000

    diagnostico text
    labels (c000,c001,c002,c003,c004,c005,c006,c009, ... ,d487,d489)

(abridged)

I have changed the maxBadExa parameter to several values between 0 and 2000 and I still get similar errors. The JBoost execution is fine with the demo examples.

What can be happening? What does the "maxBadExa" parameter mean?

Many thanks beforehand!

Rodrigo Pizarro G.
Ingeniería Informática
Universidad de Santiago de Chile
From: William B. <wb...@cs...> - 2008-01-16 23:40:05

Hello.

Each value does correspond to a label. The value is the predictor's confidence in selecting that label. In the simplest case, selecting the maximum positive value gives you your "best" prediction, noting that you may not be that confident in your "best" prediction if it is close to zero (or negative).

In the examples provided, the labels would be "smart" and "rich", respectively.

Let me know if that sufficiently answers your question.

best,
-william

------------
William Beaver
wb...@cs...

On Jan 16, 2008, at 9:34 AM, Rodrigo Pizarro wrote:
> Hi,
>
> I'm currently involved in a project about natural language processing.
> My system takes as input a plain text (a medical diagnosis), and I need
> to output a ranking of the n most plausible labels (each label is a
> standard code for the diagnosis). I found JBoost, and because it
> produces a Java class it is perfect to include as part of my whole
> system, but I have a basic question about the output. I produced a
> "predict.class" for the "stem" example in the "demo" folder. The
> "java predict < stem.train" command gives me this output:
>
>   -36.173392809294306 36.17339280935683 -36.17339280949395 -36.173392809448266
>   20.278047383272927 -20.85468565374684 -35.32543210576156 -35.81457452982327
>
> I have read that each value corresponds to a label. My question is: how
> can I interpret this output? What do the numbers mean? Confidence?
>
> The labels are (rich, smart, happy, none). What about the sign in
> multilabel problems? Can I build some kind of ranking of the most
> plausible labels for each input example?
>
> Many thanks beforehand!
>
> PS: sorry for the basicness of my question
>
> Rodrigo Pizarro G.
> Ingeniería Informática
> Universidad de Santiago de Chile
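William's answer suggests a straightforward way to build the n-best ranking Rodrigo asked for: sort the labels by their scores, highest first. A minimal sketch, assuming the scores arrive as a double[] aligned with the label array; the class and method names here are invented for illustration, not part of JBoost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch: turn a per-label score vector into a ranked label list.
// Assumes scores[i] is the predictor's confidence for labels[i].
public class LabelRanking {
    public static List<String> rank(String[] labels, double[] scores) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) idx.add(i);
        // Highest score first: the head of the list is the "best" prediction.
        idx.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<String> ranked = new ArrayList<>();
        for (int i : idx) ranked.add(labels[i]);
        return ranked;
    }
}
```

On the first output row of the stem demo quoted above, the largest value sits in the second position, so the top-ranked label would be "smart", matching William's reading; on the second row the only positive value is in the first position, giving "rich".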
From: Rodrigo P. <piz...@gm...> - 2008-01-16 19:47:50

Hi,

I'm currently involved in a project about natural language processing. My system takes as input a plain text (a medical diagnosis), and I need to output a ranking of the n most plausible labels (each label is a standard code for the diagnosis). I found JBoost, and because it produces a Java class it is perfect to include as part of my whole system, but I have a basic question about the output. I produced a "predict.class" for the "stem" example in the "demo" folder. The "java predict < stem.train" command gives me this output:

    -36.173392809294306 36.17339280935683 -36.17339280949395 -36.173392809448266
    20.278047383272927 -20.85468565374684 -35.32543210576156 -35.81457452982327

I have read that each value corresponds to a label. My question is: how can I interpret this output? What do the numbers mean? Confidence?

The labels are (rich, smart, happy, none). What about the sign in multilabel problems? Can I build some kind of ranking of the most plausible labels for each input example?

Many thanks beforehand!

PS: sorry for the basicness of my question

Rodrigo Pizarro G.
Ingeniería Informática
Universidad de Santiago de Chile
From: Rodrigo P. <piz...@gm...> - 2008-01-16 17:34:33

Hi,

I'm currently involved in a project about natural language processing. My system takes as input a plain text (a medical diagnosis), and I need to output a ranking of the n most plausible labels (each label is a standard code for the diagnosis). I found JBoost, and because it produces a Java class it is perfect to include as part of my whole system, but I have a basic question about the output. I produced a "predict.class" for the "stem" example in the "demo" folder. The "java predict < stem.train" command gives me this output:

    -36.173392809294306 36.17339280935683 -36.17339280949395 -36.173392809448266
    20.278047383272927 -20.85468565374684 -35.32543210576156 -35.81457452982327

I have read that each value corresponds to a label. My question is: how can I interpret this output? What do the numbers mean? Confidence?

The labels are (rich, smart, happy, none). What about the sign in multilabel problems? Can I build some kind of ranking of the most plausible labels for each input example?

Many thanks beforehand!

PS: sorry for the basicness of my question

Rodrigo Pizarro G.
Ingeniería Informática
Universidad de Santiago de Chile
From: Aaron A. <aa...@cs...> - 2007-12-28 23:44:56

I run jboost on a Linux x64 machine as well. When I set the CLASSPATH environment variable to nothing, I get the following error:

---------------------------------
You do not have your CLASSPATH variable set correctly.
You have the following dirs in your java path:

You need to have the following in your path:
JBOOST_HOME/dist/jboost.jar
JBOOST_HOME/lib/concurrent.jar
If jboost.jar doesn't exist, download the distribution again.
To set CLASSPATH, see the documentation in JBOOST_HOME/doc or online.
JBoost jar file is:
Exception in thread "main" java.lang.NoClassDefFoundError: jboost/controller/Controller
---------------------------------

When I set the CLASSPATH as

    CLASSPATH="~/jboost/dist/jboost.jar:~/jboost/lib/concurrent.jar"

where I use tildes '~' instead of '/home/aarvey', I get the following more cryptic error message:

----------------------------
JBoost jar file is: ~/jboost/dist/jboost.jar
Exception in thread "main" java.lang.NoClassDefFoundError: jboost/controller/Controller
----------------------------

I'm guessing that using tildes may be your problem. Try writing out the full classpath yourself and see if that solves your problems. Let me know of success/failure.

Aaron

On Thu, 27 Dec 2007, Shiv N. Vitaladevuni wrote:
> Hi,
>
> I downloaded jboost-1.4 on a Linux x64 machine. When I execute the command
>
>     ./jboost -S demo/stem
>
> I get the following error message:
>
>     JBoost jar file is: ~/research/lib/jboost-1.4/dist/jboost.jar
>     Exception in thread "main" java.lang.NoClassDefFoundError: jboost.controller.Controller
>        at gnu.java.lang.MainThread.run(libgcj.so.7rh)
>     Caused by: java.lang.ClassNotFoundException: jboost.controller.Controller not found in
>     gnu.gcj.runtime.SystemClassLoader{urls=[file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
>        at java.net.URLClassLoader.findClass(libgcj.so.7rh)
>        at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
>        at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
>        at gnu.java.lang.MainThread.run(libgcj.so.7rh)
>
> Can you tell what I am doing wrong? I have set the CLASSPATH environment
> variable as mentioned in the README, and checked on the Sun website that
> the installed Java is up to date.
>
> Thanking you,
> Regards,
> Shiv
From: Shiv N. V. <vit...@ja...> - 2007-12-27 22:55:28
|
Hi, I downloaded jboost-1.4 on a Linux x64 machine. When I execute command ./jboost -S demo/stem I get the following error message: JBoost jar file is: ~/research/lib/jboost-1.4/dist/jboost.jar Exception in thread "main" java.lang.NoClassDefFoundError: jboost.controller.Controller at gnu.java.lang.MainThread.run(libgcj.so.7rh) Caused by: java.lang.ClassNotFoundException: jboost.controller.Controller not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}} at java.net.URLClassLoader.findClass(libgcj.so.7rh) at java.lang.ClassLoader.loadClass(libgcj.so.7rh) at java.lang.ClassLoader.loadClass(libgcj.so.7rh) at gnu.java.lang.MainThread.run(libgcj.so.7rh) Can you tell what I am doing wrong? I have set the CLASSPATH environment variable as mentioned in README, and checked with the Sun website that the installed Java is up to date. Thanking you, Regards, Shiv -- Shiv N. Vitaladevuni, Bioinformatics Specialist, Janelia Farm Research Campus, Howard Hughes Medical Institute. |
From: Aaron A. <aa...@cs...> - 2007-10-26 20:45:33
|
Hongbin, Two excellent questions. For asymmetric cost, there are two methods that are currently available and reasonably well documented:

1) You can over/under-sample your examples such that the weight placed on the small number of positive examples increases w.r.t. the total weight of all examples. While this is a hack, it may be sufficient and is the simplest way to do asymmetric cost. For this method, I recommend using LogLossBoost as it will leave the positive examples with more weight than AdaBoost (which will increase the weight of the negative examples at an exponential rate, whereas LogLossBoost will do this at a linear rate -- give or take). However, try both algorithms and compare results (error rates, margin curves, etc). There is a resample.py script that may be a good way to resample (it keeps things in order), though you'll likely have to edit the source since I wrote it quickly and likely didn't do a good job.

2) You can give each negative example an initial weight in the spec file using the "weight" feature (see the online documentation for creating the spec file). This weight can be anything in the range of [0, 1]. I know that this worked at one point in time, but I know it's had trouble lately. It is also roughly equivalent in effectiveness to option 1, but it requires less memory.

3) Yes, I know I said there were two methods, but here's a 3rd one anyways. I am currently working on asymmetric cost to be used with BrownBoost & NormalBoost. I'll likely do release 1.4.1 in the next couple weeks and it will have asymmetric cost for BrownBoost (documented, etc) and an initial attempt at bug-free asymmetric cost with NormalBoost. BrownBoost is fairly simple to parameterize and NormalBoost is only slightly more complicated. I will post documentation on the website and send you an email in the next week or so when asymmetric cost BrownBoost is finished. I'm guessing NormalBoost will take another week or two more. 
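Option 1 above can be sketched roughly as follows. This is an illustrative stand-in for resample.py, not its actual code; the +1/-1 label convention and the duplication factor k are assumptions:

```python
# Illustrative over-sampling (a stand-in for resample.py, not its real code):
# duplicate each positive example k times so positives carry more total
# weight relative to negatives. Labels are assumed to be +1 / -1.
def oversample(examples, labels, k):
    """Return a dataset in which every positive example appears k times."""
    out_examples, out_labels = [], []
    for x, y in zip(examples, labels):
        reps = k if y == +1 else 1
        out_examples.extend([x] * reps)
        out_labels.extend([y] * reps)
    return out_examples, out_labels

xs, ys = oversample([[0.1], [0.2], [0.3]], [+1, -1, -1], k=3)
# the single positive example now appears 3 times alongside the 2 negatives
```

Duplicating examples rather than reweighting them keeps the input format unchanged, which is why it works with any booster, at the cost of extra memory.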
For cascade classifiers, a friend of mine was going to code this up, but he just hasn't gotten around to it. The main changes that would need to be made are in ./jboost/src/jboost/atree/ and ./jboost/src/jboost/booster. In particular, I think the PredictorNode (which I recently changed to be evaluated in iteration order, not DFS) will need to check whether the next prediction pushes us over the limit, so that we can stop evaluating weak hypotheses. Once an example is considered to be obsolete, it can be marked as such (though this will increase the memory size of an example, something we're trying to keep to a minimum). The PredictorNode and Booster can then check to see if the example is obsolete or not. If you're interested in implementing this, let me know and I can go into greater detail. Otherwise, I'll mention it again to the guy who is going to need it for some other project.

Let me know if you have any other questions and I'll shoot you another email when I get asymmetric BrownBoost and NormalBoost done in the next couple of weeks.

Aaron

On Fri, 26 Oct 2007, hongbin wang wrote:
> Hi Aaron,
>
> Thanks for your excellent software. I have a question
> about how to set asymmetric cost. Basically I have an
> imbalanced dataset including a small number of positive
> samples and a large number of negative samples. Another question
> is: how can I extend your code into a cascade
> classifier?
>
> Many thanks
>
> Hongbin |
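The early-exit idea described for the PredictorNode can be sketched like this. The names and the rejection-threshold convention are illustrative, not JBoost's actual API:

```python
# Sketch of the early-exit cascade idea (illustrative, not JBoost's
# PredictorNode): evaluate weak hypotheses in iteration order and stop
# as soon as the running score falls below a rejection threshold.
def cascade_score(weak_hyps, example, reject_threshold):
    score = 0.0
    for h in weak_hyps:                # iteration order, as described above
        score += h(example)
        if score < reject_threshold:   # example is now "obsolete"
            return score, True         # rejected early; skip remaining hyps
    return score, False

# A cheap first stage rejects the example before the later stages ever run.
score, rejected = cascade_score([lambda x: -1.0, lambda x: 5.0], None, -0.5)
```

The saving is exactly what a Viola-Jones-style cascade exploits: most negatives are rejected by the first few (cheap) weak hypotheses, so the expensive ones run only on the rare survivors.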
From: hongbin w. <kin...@ya...> - 2007-10-26 13:37:51
|
Hi Aaron, Thanks for your excellent software. I have a question about how to set asymmetric cost. Basically I have an imbalanced dataset including a small number of positive samples and a large number of negative samples. Another question is: how can I extend your code into a cascade classifier? Many thanks Hongbin |
From: Aaron A. <aa...@cs...> - 2007-10-25 03:50:33
|
On Wed, 24 Oct 2007, David Rolland wrote:
> I'm starting a new thread because I never received your reply through my
> mail, so I can't reply to the thread.

Huh... I'll see if that's a problem on my end...

> I just downloaded version 1.4, I'll take a look at it really soon. I'm
> still in the process of evaluating JBoost to see if it fits my needs. So
> any change to the output files format doesn't affect me right now. The
> GUI will be a very interesting tool, even though I already made my own
> batch files to call JBoost.

The GUI didn't make the 1.4 cut :-(. It'll be in 1.4.1 or 1.4.2. I can send you a version with GUI if you'd like.

> Your explanation of score/margin is clear, but there is still something
> that I don't understand. The graphic for the margin has "Margin" and
> "Cumulative Distribution" as axis labels. In this graphic, the margin
> values are between -1 and +1. This means that the values have been
> remapped from the formula you gave me (margin = score * label). Am I
> right? This remap was actually the purpose of my initial question, but
> I wasn't clear on that point.

Yes, the margin is remapped into the interval [-1,+1]. An unnormalized margin doesn't really mean much (and all theorems have been proven for the normalized margin), so we normalize it.

> I began using Adaboost for image analysis (mainly OCR) where I was
> working, but seeing its interesting performance, I want to try it in
> other domains. I'll give you feedback if it works well. From my previous
> experience with OCR, Adaboost isn't magical, the input data must be
> chosen carefully and right now I'm working on the input before really
> using JBoost.

All statistics and machine learning algorithms are only as good as the input. One nice thing about AdaBoost (in contrast to other learning algorithms) is that feature selection is less important. Thus, you can throw as many features as you can imagine at the algorithm and let it figure out which ones are relevant. 
At some point in the future we're planning on adding cascaded predictors similar to Viola & Jones'. This may be relevant to your work.

Aaron

> Hi David,
>
> See responses. Also know that for release 1.4 (coming out sometime this
> week most likely) we've completely changed the format of the output files
> and post-processing. We are also in the process of porting all
> Python/Perl code to Java. We are also developing a GUI to make the whole
> process more intuitive, which will probably be released in the next month.
> That being said...
>
> On Mon, 15 Oct 2007, David Rolland wrote:
>
>> First, I tested the visualization tools to make sure I had the same
>> results as shown on
>> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
>> modify the atree2dot2ps.pl script because it does not use the "--dir"
>> option everywhere in the script. Is it the expected behavior?
>
> You're right. The $dirname variable is only used for the $infofilename,
> not the $filename. I've committed the change to CVS. See
> http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS
> access.
>
>> Second, I was not able to reproduce the margin output. My problem is
>> that I run JBoost on Windows XP and the margin script uses 'cat' and
>> other Unix commands.
>
> Yeah... the scripts were written assuming a few other things too...
> Really, they were written as bandages, not final solutions. We're
> currently working on porting all of this to Java so that we have more
> interoperability.
>
>> I am a C/C++ programmer, I don't know Perl nor Python.
>
> Perl is pretty out of style these days, though it is amazing what some of that
> gross syntax can do. Python has survived a couple of fads (ruby, etc) and
> still seems to me the best widespread scripting language. It may be worth
> a look, even if not for this project. 
>
>> Even though these languages are similar to C I still wonder how the
>> files spambase.train.margin and spambase.train.scores can be processed
>> to output the margin graphic.
>
> We use gnuplot as an intermediary...
>
>> And actually, what's the difference between these two files?
>
> One has the "score" of the example, the other has the "margin". The score
> is defined as the value predicted for a given example. The margin is the
>
> if (label of example is correct)
>     return |score|
> else
>     return -|score|
>
> where |x| is the absolute value of x.
>
>> There only seems to be some positive/negative changes of values. I don't
>> expect you to rewrite margin.py without taking advantage of Unix
>> commands, but if you explain how I can get from the data files to the
>> output graphic, I'll code myself a Windows equivalent.
>
> I'd say the best bet for the time being is to grab cygwin, where
> everything has been tested and *seemed* to work peachy. The second
> best bet would probably be to wait till Wednesday/Thursday for the next
> release (when all of your changes would probably be rendered somewhat moot
> anyways). Third best bet is to edit the code.
>
> The only place where UNIX commands are used in 1.3.1 margin.py are lines
> 244--255. That is where a label file is created. All labels in JBoost
> are converted into binary values +1, -1. If you have more than two
> labels, just wait till later this week. If you have two labels, then just
> figure out which is mapped to "+1" and which to "-1" (should be fairly
> straightforward). Create the labels file, which is just a series of 1,
> -1 read in. The formula for margin is (line 168)
>
> margin = score * label
>
> This is identical to what I state above for when +1,-1 are used. So if
> you write a script to create a label file, you can specify the file on the
> command line (via --labels=...), and all your problems should be solved.
>
> Let me know if you have any other questions/comments. 
>
> Also, out of curiosity, for what classification task are you using JBoost?
>
> Aaron
>
> _______________________________________________
> jboost-users mailing list
> jbo...@li...
> https://lists.sourceforge.net/lists/listinfo/jboost-users
> |
From: David R. <rol...@ya...> - 2007-10-25 03:27:30
|
Hi Aaron,

I'm starting a new thread because I never received your reply through my mail, so I can't reply to the thread.

I just downloaded version 1.4, I'll take a look at it really soon. I'm still in the process of evaluating JBoost to see if it fits my needs. So any change to the output files format doesn't affect me right now. The GUI will be a very interesting tool, even though I already made my own batch files to call JBoost.

Your explanation of score/margin is clear, but there is still something that I don't understand. The graphic for the margin has "Margin" and "Cumulative Distribution" as axis labels. In this graphic, the margin values are between -1 and +1. This means that the values have been remapped from the formula you gave me (margin = score * label). Am I right? This remap was actually the purpose of my initial question, but I wasn't clear on that point.

I began using Adaboost for image analysis (mainly OCR) where I was working, but seeing its interesting performance, I want to try it in other domains. I'll give you feedback if it works well. From my previous experience with OCR, Adaboost isn't magical, the input data must be chosen carefully and right now I'm working on the input before really using JBoost.

Thanks for the previous answers,

David R.

------------------------

Hi David,

See responses. Also know that for release 1.4 (coming out sometime this week most likely) we've completely changed the format of the output files and post-processing. We are also in the process of porting all Python/Perl code to Java. We are also developing a GUI to make the whole process more intuitive, which will probably be released in the next month. That being said...

On Mon, 15 Oct 2007, David Rolland wrote:

> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior?

You're right. The $dirname variable is only used for the $infofilename, not the $filename. I've committed the change to CVS. See http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts were written assuming a few other things too... Really, they were written as bandages, not final solutions. We're currently working on porting all of this to Java so that we have more interoperability.

> I am a C/C++ programmer, I don't know Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of that gross syntax can do. Python has survived a couple of fads (ruby, etc) and still seems to me the best widespread scripting language. It may be worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary...

> And actually, what's the difference between these two files?

One has the "score" of the example, the other has the "margin". The score is defined as the value predicted for a given example. The margin is the

if (label of example is correct)
    return |score|
else
    return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the time being is to grab cygwin, where everything has been tested and *seemed* to work peachy. The second best bet would probably be to wait till Wednesday/Thursday for the next release (when all of your changes would probably be rendered somewhat moot anyways). Third best bet is to edit the code.

The only place where UNIX commands are used in 1.3.1 margin.py are lines 244--255. That is where a label file is created. All labels in JBoost are converted into binary values +1, -1. If you have more than two labels, just wait till later this week. If you have two labels, then just figure out which is mapped to "+1" and which to "-1" (should be fairly straightforward). Create the labels file, which is just a series of 1, -1 read in. The formula for margin is (line 168)

margin = score * label

This is identical to what I state above for when +1,-1 are used. So if you write a script to create a label file, you can specify the file on the command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions/comments.

Also, out of curiosity, for what classification task are you using JBoost?

Aaron |
From: Aaron A. <aa...@cs...> - 2007-10-24 00:09:04
|
David, The new version of JBoost has been released. The output files are much easier to work with, so you'll be able to implement the changes you discuss below much faster. This output format is now fairly stable, but may undergo changes in the future. However, these changes will be minor and should not irrevocably break any code you write. I'm going to post some comments on the file format on the website. Till then, feel free to ask me any questions you have.

Aaron

On Mon, 15 Oct 2007, David Rolland wrote:
> Hi,
>
> I just began to use JBoost and I have some questions.
>
> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior?
>
> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands. I am a C/C++ programmer, I don't know Perl nor
> Python. Even though these languages are similar to C I still wonder how
> the files spambase.train.margin and spambase.train.scores can be
> processed to output the margin graphic. And actually, what's the
> difference between these two files? There only seems to be some
> positive/negative changes of values. I don't expect you to rewrite
> margin.py without taking advantage of Unix commands, but if you explain
> how I can get from the data files to the output graphic, I'll code
> myself a Windows equivalent.
>
> Thanks,
>
> David R.
> |
From: Aaron A. <aa...@cs...> - 2007-10-16 02:05:40
|
Hi David, See responses. Also know that for release 1.4 (coming out sometime this week most likely) we've completely changed the format of the output files and post-processing. We are also in the process of porting all Python/Perl code to Java. We are also developing a GUI to make the whole process more intuitive, which will probably be released in the next month. That being said...

On Mon, 15 Oct 2007, David Rolland wrote:
> First, I tested the visualization tools to make sure I had the same
> results as shown on
> http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to
> modify the atree2dot2ps.pl script because it does not use the "--dir"
> option everywhere in the script. Is it the expected behavior?

You're right. The $dirname variable is only used for the $infofilename, not the $filename. I've committed the change to CVS. See http://sourceforge.net/cvs/?group_id=195659 for details on how to get CVS access.

> Second, I was not able to reproduce the margin output. My problem is
> that I run JBoost on Windows XP and the margin script uses 'cat' and
> other Unix commands.

Yeah... the scripts were written assuming a few other things too... Really, they were written as bandages, not final solutions. We're currently working on porting all of this to Java so that we have more interoperability.

> I am a C/C++ programmer, I don't know Perl nor Python.

Perl is pretty out of style these days, though it is amazing what some of that gross syntax can do. Python has survived a couple of fads (ruby, etc) and still seems to me the best widespread scripting language. It may be worth a look, even if not for this project.

> Even though these languages are similar to C I still wonder how the
> files spambase.train.margin and spambase.train.scores can be processed
> to output the margin graphic.

We use gnuplot as an intermediary...

> And actually, what's the difference between these two files?

One has the "score" of the example, the other has the "margin". 
The score is defined as the value predicted for a given example. The margin is the

if (label of example is correct)
    return |score|
else
    return -|score|

where |x| is the absolute value of x.

> There only seems to be some positive/negative changes of values. I don't
> expect you to rewrite margin.py without taking advantage of Unix
> commands, but if you explain how I can get from the data files to the
> output graphic, I'll code myself a Windows equivalent.

I'd say the best bet for the time being is to grab cygwin, where everything has been tested and *seemed* to work peachy. The second best bet would probably be to wait till Wednesday/Thursday for the next release (when all of your changes would probably be rendered somewhat moot anyways). Third best bet is to edit the code.

The only place where UNIX commands are used in 1.3.1 margin.py are lines 244--255. That is where a label file is created. All labels in JBoost are converted into binary values +1, -1. If you have more than two labels, just wait till later this week. If you have two labels, then just figure out which is mapped to "+1" and which to "-1" (should be fairly straightforward). Create the labels file, which is just a series of 1, -1 read in. The formula for margin is (line 168)

margin = score * label

This is identical to what I state above for when +1,-1 are used. So if you write a script to create a label file, you can specify the file on the command line (via --labels=...), and all your problems should be solved.

Let me know if you have any other questions/comments.

Also, out of curiosity, for what classification task are you using JBoost?

Aaron |
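The score/margin relation described in this thread can be written as a small sketch. The normalization to [-1, +1] by the largest absolute score is an assumption about how the plot is produced, not something taken from margin.py:

```python
# The relation described above: with labels encoded as +1/-1,
# margin = score * label, i.e. |score| when the prediction agrees with
# the label and -|score| when it does not.
def margin(score, label):
    return score * label

# Assumed normalization: dividing by the largest absolute score maps
# margins into [-1, +1], matching the cumulative-distribution plot's axis.
def normalize(margins):
    biggest = max(abs(m) for m in margins)
    return [m / biggest for m in margins]

print(margin(2.5, +1), margin(2.5, -1))  # 2.5 -2.5
```

With this convention a positive margin always means a correct prediction, so the cumulative-distribution plot shows at a glance what fraction of examples are misclassified (the mass left of zero) and how confidently the rest are classified.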
From: David R. <rol...@ya...> - 2007-10-16 00:01:54
|
Hi,

I just began to use JBoost and I have some questions.

First, I tested the visualization tools to make sure I had the same results as shown on http://www.cs.ucsd.edu/~aarvey/jboost/doc.html#visualization. I had to modify the atree2dot2ps.pl script because it does not use the "--dir" option everywhere in the script. Is it the expected behavior?

Second, I was not able to reproduce the margin output. My problem is that I run JBoost on Windows XP and the margin script uses 'cat' and other Unix commands. I am a C/C++ programmer, I don't know Perl nor Python. Even though these languages are similar to C I still wonder how the files spambase.train.margin and spambase.train.scores can be processed to output the margin graphic. And actually, what's the difference between these two files? There only seems to be some positive/negative changes of values. I don't expect you to rewrite margin.py without taking advantage of Unix commands, but if you explain how I can get from the data files to the output graphic, I'll code myself a Windows equivalent.

Thanks,

David R. |
From: Aaron A. <aa...@cs...> - 2007-05-15 04:22:25
|
On Mon, 14 May 2007, Aaron Arvey wrote: > this is another test msg > > aaron > |
From: Aaron A. <aa...@cs...> - 2007-05-15 04:20:51
|
this is another test msg aaron |
From: Aaron A. <aa...@cs...> - 2007-05-15 04:02:11
|
this is a test message. Aaron |