From: Sheng C. <she...@gm...> - 2011-04-01 21:18:28
|
Hi,

I used JBoost throughout my summer 2010 internship. I really loved it, and I contributed some code to it (a cost-sensitive learning algorithm). Now that I have started a new job, I want to continue that success on a new problem.

The new problem is classification on a database with an extremely high-dimensional (around 16000 features) and highly sparse feature space: each instance generally has no more than 10 features present. The database is also huge, millions of....

The data set is generated by computing tf-idf features over a text corpus. If I treat each feature as a separate attribute in the JBoost input, the data file becomes prohibitively large to generate and load (countless commas in every instance). JBoost cannot even finish loading the data into memory. I tried -Xmx4G and it still failed.

I am wondering whether there is a way to do something smart like the svm-light or libsvm data format, i.e., you ONLY specify the features that are present, and missing features can be left out of the data file. That way the data set would shrink significantly, and hopefully JBoost could process it more efficiently.

Sheng |
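For reference, the sparse layout used by svm-light and libsvm puts the label first, followed by index:value pairs for the features that are present. The sketch below parses one such line into a map. The names SparseLine and parse are made up for illustration, and nothing here is a JBoost API: the thread does not say whether JBoost can read this format.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: SparseLine and parse are hypothetical names, not part of JBoost.
final class SparseLine {
    final int label;
    final Map<Integer, Double> features; // feature index -> value, present features only

    SparseLine(int label, Map<Integer, Double> features) {
        this.label = label;
        this.features = features;
    }

    // Parses a line such as "+1 12:0.64 907:1.29" (label, then index:value pairs).
    static SparseLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        int label = Integer.parseInt(tokens[0]); // accepts a leading '+' or '-'
        Map<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value, got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new SparseLine(label, features);
    }

    public static void main(String[] args) {
        SparseLine row = parse("+1 12:0.64 907:1.29 15873:0.32");
        System.out.println(row.label + " " + row.features);
    }
}
```

With about 10 present features out of roughly 16000, a line in this layout carries about 10 pairs instead of thousands of comma-separated zeros.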
From: Glenn M. <gle...@gm...> - 2010-12-13 22:16:38
|
Hi Aaron,

Thanks for the prompt reply. Your comment about using several weak thresholding classifiers makes sense. Changing all the values (0 -> 1) in the first data line did indeed change the classification score.

I don't think you're right about the two columns, though. Since they always have the same magnitude, I looked into the code and saw that it is in fact printing {p, -p}, where p, it seems, is the prediction. It turns out that margin information can be generated when the tree is created.

The generated comment for predict(String[] as) says it returns "an array of scores corresponding to the classes: +1 and -1". Are "classes" the same as labels?

Thanks,
Glenn |
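To make the {p, -p} observation concrete, here is a small sketch of how such a score pair relates to a predicted label and to a margin. It assumes, following Glenn's reading of the generated code, that the first entry is the score for class +1 and the second is its negation. The class and method names are hypothetical and are not taken from the generated Predict.java.

```java
// Illustrative only: ScorePair is a hypothetical helper, not generated JBoost code.
final class ScorePair {
    // Two-column output: score for class +1, then score for class -1.
    static double[] scores(double p) {
        return new double[] { p, -p };
    }

    // The predicted class is the one with the larger score.
    static int predictedLabel(double[] scores) {
        return scores[0] >= scores[1] ? 1 : -1;
    }

    // Margin: true label (+1 or -1) times the +1 score; positive when the prediction is correct.
    static double margin(int trueLabel, double[] scores) {
        return trueLabel * scores[0];
    }

    public static void main(String[] args) {
        double[] s = scores(5.00073612523801);
        System.out.println(s[0] + " " + s[1]);
        System.out.println("label=" + predictedLabel(s) + " margin(+1)=" + margin(1, s));
    }
}
```

Under this assumption the second column adds no information, which matches the impression that the two columns are redundant.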
From: Aaron A. <aa...@cb...> - 2010-12-13 19:51:56
|
Hey Glenn,

I haven't used JBoost in a while, but I have a couple of guesses that may answer your questions.

> I ran Predict ("java -cp .:../dist/jboost.jar Predict <
> spambase.data") against the original data. I got two columns of
> output that looked like
>
> 5.00073612523801 -5.00073612523801
> 11.864681207163063 -11.864681207163063
> 8.780744089260097 -8.780744089260097
> ...
> Why are there two columns with the same magnitudes? I'm guessing that
> these are is/is not spam scores, but they seem redundant.

Guess: One is the margin and the other is the classification score. You can check this by computing labels*column1 or labels*column2 and seeing whether the result matches the other column.

> It would seem that changing a value in the first line of spambase.data
> would change the classification score I see above, but it doesn't. I
> changed the first value in
>
> 0,0.64,0.64,0,0.32,0,0,0,0,0,0,0.64,0,0,0,0.32,0,1.29,1.93,0,0.96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.778,0,0,3.756,61,278,+1;
>
> from 0 to other values, but the first classification score
> (5.00073612523801) didn't change. Why is that?

Guess: Boosting doesn't produce a linear classifier. Depending on the number of iterations used, you may have used fewer dimensions than exist in the data. In fact, even if you change every value in an example, the score may still be the same. This is due to JBoost using thresholding weak classifiers. If you look at the actual tree (either the raw file, or see the documentation about visualization), you should be able to determine which dimensions were used and at what thresholds. If you change one of these dimensions so that it is on the other side of its threshold, you should see a change in the output value.

Hope that helps!

Aaron |
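The point about thresholding weak classifiers can be illustrated with a toy model. The sketch below sums two hand-written threshold rules. The feature indices, thresholds, and weights are invented for illustration and are not taken from any JBoost tree.

```java
// Illustrative only: a toy sum of threshold rules, not JBoost's alternating tree.
final class StumpDemo {
    // One weak rule: contributes 'above' if x[feature] > threshold, else 'below'.
    static double stump(double[] x, int feature, double threshold, double below, double above) {
        return x[feature] > threshold ? above : below;
    }

    // The score only looks at features 1 and 3; features 0 and 2 are never tested.
    static double score(double[] x) {
        return stump(x, 1, 0.5, -1.0, 2.0)
             + stump(x, 3, 10.0, -0.5, 1.5);
    }

    public static void main(String[] args) {
        double[] original      = { 0.0, 0.64, 0.64, 61.0 };
        double[] changedUnused = { 7.0, 0.64, 0.64, 61.0 }; // feature 0 is unused
        double[] sameSide      = { 0.0, 0.90, 0.64, 61.0 }; // feature 1 still above 0.5
        double[] crossed       = { 0.0, 0.10, 0.64, 61.0 }; // feature 1 now below 0.5

        System.out.println(score(original));
        System.out.println(score(changedUnused));
        System.out.println(score(sameSide));
        System.out.println(score(crossed));
    }
}
```

Only the last input changes the score, because it is the only one that moves a tested feature across its threshold.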
From: Glenn M. <gle...@gm...> - 2010-12-13 19:17:50
|
Hi,

I'm looking into JBoost to do text classification. I've generated Java output code (Predict.java) with "demo/$ java -Xmx100M jboost.controller.Controller -p 2 -S spambase -j spambase.java", and run it, and had some questions. Incidentally, to compile with "javac -cp ../dist/jboost.jar Predict.java" from demo/ I had to change the paths of some classes in the main() method.

I ran Predict ("java -cp .:../dist/jboost.jar Predict < spambase.data") against the original data. I got two columns of output that looked like

5.00073612523801 -5.00073612523801
11.864681207163063 -11.864681207163063
8.780744089260097 -8.780744089260097
...

Why are there two columns with the same magnitudes? I'm guessing that these are is/is not spam scores, but they seem redundant.

It would seem that changing a value in the first line of spambase.data would change the classification score I see above, but it doesn't. I changed the first value in

0,0.64,0.64,0,0.32,0,0,0,0,0,0,0.64,0,0,0,0.32,0,1.29,1.93,0,0.96,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.778,0,0,3.756,61,278,+1;

from 0 to other values, but the first classification score (5.00073612523801) didn't change. Why is that?

Thanks,
Glenn |
From: Julien H. <jh...@no...> - 2010-04-26 22:17:25
|
Hi Aaron,

Thanks for your answer.

In place of this line

throw new RuntimeException("Type of split not allowed");

I wrote the following code to find out which summary type causes the issue:

if (summary.type == Summary.CONTAINS_ABSTAIN) {
    throw new RuntimeException("Contains abstain not allowed");
} else if (summary.type == Summary.CONTAINS_NOABSTAIN) {
    throw new RuntimeException("Contains noabstain not allowed");
} else {
    throw new RuntimeException("Type of split not allowed");
}

This is the stack trace:

Message:java.lang.RuntimeException: Contains noabstain not allowed

So you are right: the problem is that the Python code generation does not handle these splitters. Do you know whether this kind of splitter comes from the finite attribute type? The problem appears when my attributes include some finite and text types.

I will look at the code of the makeCode method to see how to handle this splitter for Python.

Thanks.

Regards,
Julien Henot. |
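The diagnostic above can be generalised so that the exception names whichever splitter type was hit. The sketch below is a standalone stand-in: the four constants copy the values quoted from src/jboost/learners/Summary.java in this thread, while the class SplitterTypes and its methods are hypothetical and do not exist in JBoost. Because the constants are declared as char, the value is cast to int before being appended to a string; otherwise a control character would be printed instead of the number.

```java
// Illustrative only: SplitterTypes is a hypothetical stand-in for jboost.learners.Summary.
final class SplitterTypes {
    // Values as quoted from Summary.java in this thread.
    static final char EQUALITY = 1;
    static final char LESS_THAN = 2;
    static final char CONTAINS_ABSTAIN = 3;
    static final char CONTAINS_NOABSTAIN = 4;

    // Human-readable name for a splitter type code.
    static String describe(char type) {
        switch (type) {
            case EQUALITY:           return "EQUALITY";
            case LESS_THAN:          return "LESS_THAN";
            case CONTAINS_ABSTAIN:   return "CONTAINS_ABSTAIN";
            case CONTAINS_NOABSTAIN: return "CONTAINS_NOABSTAIN";
            default:                 return "UNKNOWN(" + (int) type + ")";
        }
    }

    // Per the thread, the Python generator only handles these two types.
    static boolean supportedByPythonGenerator(char type) {
        return type == EQUALITY || type == LESS_THAN;
    }

    public static void main(String[] args) {
        char type = CONTAINS_NOABSTAIN;
        if (!supportedByPythonGenerator(type)) {
            System.out.println("Type of split not allowed: " + describe(type));
        }
    }
}
```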
From: Aaron A. <aa...@cb...> - 2010-04-26 21:27:31
|
Great to hear that you're able to edit and compile the code!

Can you send me the *output.tree file from the run? I'm interested to see what kind of splitters the algorithm is using for your data.

The splitter types are defined in src/jboost/learners/Summary.java:

public static final char EQUALITY = 1;
public static final char LESS_THAN = 2;
public static final char CONTAINS_ABSTAIN = 3;
public static final char CONTAINS_NOABSTAIN = 4;

makePythonCode currently only works with EQUALITY and LESS_THAN. I'm not actually familiar with the other two splitters.

Another easy way to check which splitters are being used is to print the following just before the switch statement:

System.out.println("Splitter summary: " + (int) summary.type);

We should be able to get an exact diagnosis this way.

Assuming this is the problem, it looks like there is some example code for how to handle CONTAINS_ABSTAIN and CONTAINS_NOABSTAIN at lines 859-885 in String makeCode(SplitterNode sn, String tab).

Aaron |
From: Julien H. <jh...@no...> - 2010-04-26 08:52:01
|
Hi Aaron,

Thanks for your answer!

I have changed the code of makePythonCode as you recommended, but I still get an error. Here is the error message:

Exception occured while attempting to write Python code
Message:java.lang.RuntimeException: Type of split not allowed
java.lang.RuntimeException: Type of split not allowed
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:716)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:691)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:705)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:691)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:713)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:691)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:705)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:691)
at jboost.atree.AlternatingTree.toPython(AlternatingTree.java:654)
at jboost.controller.Controller.generateCode(Controller.java:694)
at jboost.controller.Controller.outputLearningResults(Controller.java:273)
at jboost.controller.Controller.main(Controller.java:91)

For this line:

1064 ((Integer)summary.val) + ":\n";

I tried different casts, but I still get an error.

Thanks in advance.

Regards,
Julien. |
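One way to avoid depending on a single cast when emitting the comparison value is to branch on the runtime type of the object. The helper below is a hypothetical sketch: PythonLiteral is an invented name, and it does not reflect how JBoost actually stores summary.val, which the thread leaves open.

```java
// Illustrative only: PythonLiteral is a hypothetical helper, not part of JBoost.
final class PythonLiteral {
    // Renders a value as text that can be pasted into generated Python source.
    static String of(Object val) {
        if (val instanceof Integer || val instanceof Long) {
            return val.toString();
        }
        if (val instanceof Double || val instanceof Float) {
            // Note: NaN and infinities would need special handling for Python.
            return val.toString();
        }
        if (val instanceof String) {
            String s = (String) val;
            // Escape backslashes first, then single quotes, and wrap in single quotes.
            return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
        }
        throw new IllegalArgumentException(
            "Unsupported value type: " + (val == null ? "null" : val.getClass().getName()));
    }

    public static void main(String[] args) {
        System.out.println("if value == " + of(3) + ":");
        System.out.println("if value <= " + of(0.778) + ":");
        System.out.println("if value == " + of("same") + ":");
    }
}
```

An unsupported type then fails with a message naming the actual class, instead of a ClassCastException from a fixed (Integer) cast.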
From: Aaron A. <aa...@cb...> - 2010-04-23 18:06:07
|
Hi Julien,

First, thanks for the detailed error message.

While I'm no longer a main developer (and so I won't edit the public repository), it looks like the problem is a simple typo.

In AlternatingTree.java, there should be code that looks like this (line numbers may not match up):

1051 private String makePythonCode(SplitterNode sn, String tab) {
1052   String code = "";
1053
1054   Summary summary=sn.splitter.getSummary();
1055
1056   code += tab + (allDefined
1057       ? " "
1058       : "if def(" + (summary.index+1) + "):")
1059       + " # " + sn.id + "\n";
1060   String stab = (allDefined ? tab : tab);
1061   switch(summary.type) {
1062   case Summary.EQUALITY:
1063     code += stab + "if self.get_data_value(x,'" + ad[(summary.index)].getAttributeName() + "') == " +
1064       ((Integer)summary.val) + ":\n";
1065     code += makeMatlabCode(sn.predictorNodes[0], stab + "\t");
1066     code += stab + "else:\n";
1067     code += makeMatlabCode(sn.predictorNodes[1], stab + "\t");
1068     break;
1069   case Summary.LESS_THAN:
1070     code += stab + "if self.get_data_value(x,'" + ad[(summary.index)].getAttributeName() + "') <= " +
1071       ((Double) summary.val) + ":\n";
1072     code += makePythonCode(sn.predictorNodes[0], stab + "\t");
1073     code += stab + "else:\n";
1074     code += makePythonCode(sn.predictorNodes[1], stab + "\t");
1075     break;
1076   default:
1077     throw new RuntimeException("Type of split not allowed");
1078   }
1079
1080   return code;
1081 }

Notice the three lines:

1065     code += makeMatlabCode(sn.predictorNodes[0], stab + "\t");
1066     code += stab + "else:\n";
1067     code += makeMatlabCode(sn.predictorNodes[1], stab + "\t");

These calls should recursively call the Python method, whereas they currently call the Matlab method. They should read:

1065     code += makePythonCode(sn.predictorNodes[0], stab + "\t");
1066     code += stab + "else:\n";
1067     code += makePythonCode(sn.predictorNodes[1], stab + "\t");

You may also have to change the line

1064       ((Integer)summary.val) + ":\n";

so that it can handle strings (though the strings may be converted to integers, and this may be easier to edit once the Python code has been output, using a dictionary, etc.). I don't remember the exact internal string representation, but you can email again if you have further problems.

Running 'ant jar' or 'ant dist' in the top directory will compile the code (make sure the file 'JBOOST_DIR/dist/jboost.jar' has been updated).

Aaron |
From: Julien H. <jh...@no...> - 2010-04-23 13:09:41
|
Hi all,

I am a new user of JBoost and I'm very enthusiastic and happy to use it. It's really simple to start and I succeed easily in having some good results.

I would ask you a little question.

When I run my Jboost command with generation of python code, I have this exception :

Exception occured while attempting to write Python code
Message:java.lang.RuntimeException: Type of split not allowed
java.lang.RuntimeException: Type of split not allowed
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:780)
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:751)
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:765)
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:751)
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:775)
at jboost.atree.AlternatingTree.makeMatlabCode(AlternatingTree.java:751)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:705)
at jboost.atree.AlternatingTree.makePythonCode(AlternatingTree.java:691)
at jboost.atree.AlternatingTree.toPython(AlternatingTree.java:654)
at jboost.controller.Controller.generateCode(Controller.java:694)
at jboost.controller.Controller.outputLearningResults(Controller.java:273)
at jboost.controller.Controller.main(Controller.java:91)

Have you an idea ? I look the code but I did not succeed to fix the problem.

I have different types of value in my spec file : number, text, and finite (none,same,not)

Thanks in advance,

Regards,
Julien Henot
|
From: Evan E. <eet...@cs...> - 2009-06-19 20:28:45
|
i hear you =) -------- Evan Ettinger CSE Department, UCSD On Fri, Jun 19, 2009 at 1:07 PM, Sunsern Cheamanunkul <sch...@cs...> wrote: > test > > _______________________________________________ > jboost-users mailing list > jbo...@li... > https://lists.sourceforge.net/lists/listinfo/jboost-users > |
From: Sunsern C. <sch...@cs...> - 2009-06-19 20:26:08
|
test |
From: Vadim O. <vad...@gm...> - 2009-06-04 04:20:58
|
Hi, I have a set of weak predictors that I have been trying to boost, to improve on simple voting. On the first pass I used simple voting and got a decent improvement, from 51~52% accuracy for individual predictors to 55~57% accuracy for the unanimous voters. I was expecting to do better with jboost but actually got results that are worse. Not by much, but still, I would like to understand what I can do better. I have tried AdaBoost, LogLossBoost and BrownBoost and experimented with the number of iterations and various other options. Initially, and fairly quickly, both in-sample and out-of-sample error rates go to about 46% and more or less stay there. Eventually, I can observe overfitting, whereby the in-sample error is reduced while the out-of-sample error increases with additional iterations. I was wondering if anybody might have any thoughts on what might be going on. I can send sample files if somebody might be willing to play with the data. Thanks, v |
From: Aaron A. <aa...@cb...> - 2009-04-03 13:55:55
|
Try capitalizing the option (-S rather than -s), i.e. jboost.bat -S demo/stem Hope that helps. Aaron On Thu, 2 Apr 2009, xuhuafen408 wrote: > > Hello,I want to ask a question about Jboost. > I run JBoost on oen of the demo files: > jboost.bat -s demo/stem > the output is "JBoost Exception:ERROR:Input file names(sepc,train,test)do not exit. > but actually these files exist. > why? I don't know . > I look forward to receiving you letter! > Best wishes > > |
From: xuhuafen408 <xuh...@16...> - 2009-04-02 13:52:47
|
Hello, I want to ask a question about JBoost. I ran JBoost on one of the demo files: jboost.bat -s demo/stem The output is "JBoost Exception:ERROR:Input file names(sepc,train,test)do not exit." but these files actually exist. Why? I don't know. I look forward to receiving your letter! Best wishes |
From: Aaron A. <aa...@cb...> - 2009-01-29 16:43:35
|
Look at your train/test error in *.info. It may be that the exponential loss was so small that JBoost caught an underflow exception and terminated. If you redirected output to a file, it will have an error message at the end of it stating that there was an underflow. If you haven't redirected output to a file, try ./jboost -S myfile -b AdaBoost -numRounds 100000 -ATreeType ADD_ALL > outfile 2>&1 & Which will put the output and error messages into a file. That will help with diagnosis. Aaron On Thu, 29 Jan 2009, Busa-Fekete Róbert wrote: > Dear List Members, > > I tried to run the JBoost package on the UCI pendigits dabase, but it > stopped after 530 iterations. I used the JBoost with these options: > > -b AdaBoost -numRounds 100000 -ATreeType ADD_ALL > > Why did it stop? Is there any criteria what I didn't set up? > > > BR, > Robert Busa-Fekete > |
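A note on the underflow mentioned above: the exponential loss of an example is exp(-margin), and in double precision that value becomes exactly zero once the margin is large enough. The Java sketch below only illustrates this numerical effect. The class and method names are hypothetical, and it is not JBoost's actual check.

```java
// Hypothetical illustration of exponential-loss underflow; not taken from the JBoost source.
final class ExpLossUnderflow {

    /** Exponential loss of one example, where margin = label * score. */
    static double expLoss(double margin) {
        return Math.exp(-margin);
    }

    /** True when the loss can no longer be represented as a positive double. */
    static boolean underflowed(double margin) {
        return expLoss(margin) == 0.0;
    }

    public static void main(String[] args) {
        // As training drives margins up, the loss shrinks until it vanishes.
        for (double margin : new double[] {1.0, 100.0, 700.0, 1000.0}) {
            System.out.println("margin=" + margin
                    + " loss=" + expLoss(margin)
                    + " underflow=" + underflowed(margin));
        }
    }
}
```

A long run on an easily separable data set can therefore stop well before the requested number of rounds, which would be consistent with the behaviour reported here.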
From: Busa-Fekete R. <bus...@in...> - 2009-01-29 15:42:41
|
Dear List Members, I tried to run the JBoost package on the UCI pendigits database, but it stopped after 530 iterations. I ran JBoost with these options: -b AdaBoost -numRounds 100000 -ATreeType ADD_ALL Why did it stop? Is there a stopping criterion that I didn't set? BR, Robert Busa-Fekete |
From: Aaron A. <aa...@cb...> - 2009-01-03 09:26:38
|
On Sat, 3 Jan 2009, Aaron Arvey wrote: > Hi Joseph, > > > 1) Where is the default output location for Window and Linux machines? > > The output location is the current directory. When running the nfold.py > script, the default directory is cvdata-DATE/TREE_TYPE. A clarification I just remembered: if you do not use a stem name (i.e. you specify the train, test, and spec files separately), then the files are named "noname*" or ".*" This isn't a great way to deal with output files, but it works well enough for now. Aaron |
From: Aaron A. <aa...@cb...> - 2009-01-03 05:34:57
|
Hi Joseph, > 1) Where is the default output location for Window and Linux machines? The output location is the current directory. When running the nfold.py script, the default directory is cvdata-DATE/TREE_TYPE. > 2) Is there a way to convert model output to actual Java/C++ codes? There's currently no conversion script, but there is a way to output the code via -j or -m or -c. There's also a -P for python in the svn repository. I would say that the python code is easiest to read, use, and understand. Aaron |
From: Joseph W. <jos...@ya...> - 2009-01-02 20:23:18
|
Hi, Two questions: 1) Where is the default output location on Windows and Linux machines? 2) Is there a way to convert model output to actual Java/C++ code? thanks, J |
From: Aaron A. <aa...@cb...> - 2008-11-26 16:57:42
|
Hi Sascha, Currently JBoost is only for classification and cannot handle regression. There are many other boosting packages (e.g. the boost library in R) that can handle regression tasks. Cheers, Aaron On Wed, 26 Nov 2008, Sascha Ackermann wrote: > Hello, > > I am looking for a simple example which allows regression with jboost. Is > regression possible with jboost, or is jboost only a classification tool ? > > Thanks for your answer, > > Sascha > |
From: Sascha A. <ack...@ia...> - 2008-11-26 09:49:19
|
Hello, I am looking for a simple example which allows regression with jboost. Is regression possible with jboost, or is jboost only a classification tool ? Thanks for your answer, Sascha |
From: polatkan@Princeton.EDU - 2008-11-05 05:55:10
|
Hi, I wrote before about the weighting issue. Is there any solution to the bug mentioned on the site about weighting the data? I would like to weight my data. I was oversampling before, but I need another method. I guess that giving the weights as the initial distribution is the proper way to do that. (In the boosting papers, D is the distribution; maybe the current method does the same thing (taking the weights and giving them to the booster as the initial distribution), but the implementation is buggy?) Best, Gungor |
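For readers following the weighting question: in the standard AdaBoost formulation the booster starts from a uniform distribution over the training examples, and the idea described above amounts to replacing that uniform start with the normalized example weights. The Java sketch below shows only that normalization step. The class and method names are hypothetical, and it makes no claim about how JBoost handles weights internally.

```java
// Hypothetical sketch: turn per-example weights into an initial distribution.
// Not taken from the JBoost source.
final class InitialDistribution {

    /** Returns weights scaled so that they are non-negative and sum to 1. */
    static double[] normalize(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            if (Double.isNaN(w) || w < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        double[] distribution = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            distribution[i] = weights[i] / total;
        }
        return distribution;
    }

    public static void main(String[] args) {
        // The third example counts twice as much as each of the first two.
        double[] d = normalize(new double[] {1.0, 1.0, 2.0});
        System.out.println(java.util.Arrays.toString(d));
    }
}
```

Oversampling an example k times has roughly the same effect as giving it weight k here, without enlarging the data file.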
From: Aaron A. <aa...@cb...> - 2008-09-30 16:42:56
|
Dear Jason, See responses inline below. On Tue, 30 Sep 2008, Jason Kania wrote: > In looking at the examples, I am finding that the lack of documentation > on the examples themselves is really hurting my attempts to understand > what the classifier is outputting. It would make sense, for example, if > the noisy line example was completed from end to end with details on the > building and execution of the example as well as some description of the > input and interpretation of the results for some examples. The > information that is provided is insufficient to understand. > > I am able to build it, but when I ran it, the lack of some instructions > on the command line and very terse comments in the code meant that I was > unsure what I should be typing on the command line. I had to figure it > out which is quite silly for someone trying to use examples in order to > learn the application. There is a README file in the "demo" directory that describes the noisy line example. There are also a couple of very simple examples of how to run the program on the website at http://jboost.sourceforge.net/examples.html and more in depth description of the options at http://jboost.sourceforge.net/doc.html. To use the program, you should never have to look at the code. Is there anything you found in the code that isn't in the documentation? If so, please let me know and I'll post the relevant information to the website. > For the output, the example described in Wikipedia yields a single > number confidence output and this is intuitive whereas the output of two > numbers makes little sense to me. I'm not sure what you're referring to when you mention the "two numbers." Where are you seeing these two numbers? If you're referring to the .boosting.info file, it is described at http://jboost.sourceforge.net/doc.html#boost_format. > Is the confidence the summation of the outputs in the vector, is it > multidimensional or something else? 
This information should be on the > site so others can attempt to make use of the data. The classification is based on the sum of hypotheses. If the sum is positive, a positive label is claimed. If the sum is negative, a negative label is claimed. The absolute value of the sum can be considered a measure of confidence. Let me know if the above addresses your concerns. Aaron |
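The rule described above (sum the hypothesis outputs, take the sign as the label and the absolute value as the confidence) can be sketched in a few lines of Java. The class and method names are hypothetical and the input values are made up for illustration; this is not the code JBoost generates. Treating a sum of exactly zero as negative is an arbitrary choice made here.

```java
// Hypothetical sketch of sign-and-magnitude classification; not JBoost output.
final class ScoreSum {

    /** Sum of the real-valued outputs of all hypotheses for one example. */
    static double score(double[] hypothesisOutputs) {
        double sum = 0.0;
        for (double output : hypothesisOutputs) {
            sum += output;
        }
        return sum;
    }

    /** +1 for a positive sum, -1 otherwise (a zero sum is treated as negative here). */
    static int label(double score) {
        return score > 0.0 ? 1 : -1;
    }

    /** Magnitude of the sum, read as a measure of confidence. */
    static double confidence(double score) {
        return Math.abs(score);
    }

    public static void main(String[] args) {
        double s = score(new double[] {0.5, 0.25, -2.0});
        System.out.println("score=" + s + " label=" + label(s) + " confidence=" + confidence(s));
    }
}
```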
From: Jason K. <jas...@ro...> - 2008-09-30 13:54:59
|
Hello, In looking at the examples, I am finding that the lack of documentation on the examples themselves is really hurting my attempts to understand what the classifier is outputting. It would make sense, for example, if the noisy line example was completed from end to end with details on the building and execution of the example as well as some description of the input and interpretation of the results for some examples. The information that is provided is insufficient to understand. I am able to build it, but when I ran it, the lack of some instructions on the command line and very terse comments in the code meant that I was unsure what I should be typing on the command line. I had to figure it out which is quite silly for someone trying to use examples in order to learn the application. For the output, the example described in Wikipedia yields a single number confidence output and this is intuitive whereas the output of two numbers makes little sense to me. Is the confidence the summation of the outputs in the vector, is it multidimensional or something else? This information should be on the site so others can attempt to make use of the data. Jason |
From: Vimal V. <vim...@ya...> - 2008-09-30 09:18:37
|
Respected Sir, I am doing my research work on boosting methods, and especially on AdaBoost. I need the BrownBoost algorithm that you have used in the JBoost software. I also need the statistics related to BrownBoost. Please provide them so that I can do more work on this. I am waiting for your positive reply. Thanking you, Vimal Vaghela |