From: Haiyong X. <hai...@gm...> - 2007-08-15 14:40:53
Hi there,

I found that the current implementation of JBoost mainly focuses on the Alternating Decision Tree (ADT), which means the weak learners are similar to decision stumps. Is there any way to construct a boosting system based on another type of weak learner, say neural networks? Thanks.

-- Haiyong
From: Aaron A. <aa...@cs...> - 2007-08-17 22:16:22
Apologies for the late response. See inline comments.

On Wed, 15 Aug 2007, Haiyong Xu wrote:

> I found that the current implementation of JBoost mainly focuses on the
> Alternating Decision Tree (ADT), which means the weak learners are
> similar to decision stumps.

ADTrees are capable of using any weak learner. The nice thing about using decision stumps with ADTrees is that, since each path is a conjunction, ADTrees are able to form quite a rich representation with a limited base class of hypotheses. See the original paper for more details:

http://www.cse.ucsd.edu/~yfreund/papers/atrees.pdf

> Is there any way to construct a boosting system based on another type of
> weak learner, say neural networks? Thanks.

In JBoost, a weak learner (typically something similar to a decision stump) can be replaced by just about anything, including neural networks. If you wanted to implement a neural network weak learner, you could mimic the existing weak learners at ./src/jboost/learner/. The easiest weak learner to understand is InequalitySplitter.java (which is created by InequalitySplitterBuilder.java). A simple neural net implementation should be fairly straightforward in this framework.

Note that there are a variety of reasons why boosting neural nets may not be the best idea. The margin analysis given by Schapire et al. (1998, Annals of Statistics) gives generalization bounds in terms of the VC dimension of the base classifier; while these bounds are certainly quite loose, they do provide some intuition for why boosting may not overfit.

Also note that JBoost can boost decision trees with the command

  ./jboost ... -ATreeType ADD_ROOT_OR_SINGLES

You also have the other options:

  -ATreeType type   The type of ATree to create. There are several options:
                      ADD_ALL              Create a full ADTree (default).
                      ADD_ROOT             Add splits only at the root, producing a flat
                                           tree. This is equivalent to boosting decision
                                           stumps.
                      ADD_SINGLES          Create a decision tree.
                      ADD_ROOT_OR_SINGLES  Create a linear combination of decision trees.
                                           This is equivalent to simultaneously growing
                                           boosted decision trees.

An existing implementation of boosting with neural nets (and many other weak classifiers) exists in WEKA. Last time I checked, neural nets were under the name Multilayer Perceptron.

Aaron
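To make the "mimic the existing weak learners" suggestion a bit more concrete, here is a rough, self-contained sketch of what a stump-like weak learner and a single-neuron stand-in might look like. The interface and class names below are made up for illustration and are not JBoost's actual Splitter/SplitterBuilder API; a real implementation would mimic the classes under ./src/jboost/learner/ instead.

  // Hypothetical sketch: these names are NOT JBoost's real Splitter API.
  interface WeakLearner {
    // Confidence-rated prediction for one example (the sign gives the class).
    double predict(double[] features);
  }

  // A decision stump in the spirit of InequalitySplitter: threshold one feature.
  class StumpLearner implements WeakLearner {
    private final int featureIndex;
    private final double threshold;

    StumpLearner(int featureIndex, double threshold) {
      this.featureIndex = featureIndex;
      this.threshold = threshold;
    }

    public double predict(double[] features) {
      return features[featureIndex] <= threshold ? -1.0 : +1.0;
    }
  }

  // A single sigmoid neuron, standing in for a small neural-net weak learner.
  class NeuronLearner implements WeakLearner {
    private final double[] weights;
    private final double bias;

    NeuronLearner(double[] weights, double bias) {
      this.weights = weights;
      this.bias = bias;
    }

    public double predict(double[] features) {
      double activation = bias;
      for (int i = 0; i < weights.length; i++) {
        activation += weights[i] * features[i];
      }
      // Squash to (-1, +1) so it plugs into the same boosting machinery as a stump.
      return 2.0 / (1.0 + Math.exp(-activation)) - 1.0;
    }
  }

  class WeakLearnerDemo {
    public static void main(String[] args) {
      double[] x = {0.3, -1.2, 2.5};
      WeakLearner stump = new StumpLearner(2, 1.0);
      WeakLearner neuron = new NeuronLearner(new double[] {0.5, -0.1, 0.2}, 0.0);
      System.out.println("stump:  " + stump.predict(x));   // +1.0
      System.out.println("neuron: " + neuron.predict(x));  // about 0.37
    }
  }

The training half (choosing the threshold or the neuron weights from the weighted examples) would live in a companion builder, which is the role InequalitySplitterBuilder.java plays for InequalitySplitter.java.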
From: Aaron A. <aa...@cs...> - 2007-08-20 22:17:45
On Sat, 18 Aug 2007, Haiyong Xu wrote:

> Thanks for your reply.
>
> Currently, I am involved in a project using pattern recognition
> techniques, and we are building an ensemble system based on the AdaBoost
> method. JBoost is the only library I found which could be practically
> employed in our project. Therefore, I'd like to dig into the detailed
> implementation to see how to adapt it to our requirements.

Great!

> Through reading part of the source code, mainly AdaBoost.java,
> InequalitySplitterBuilder.java, and InstrumentedAlternatingTree.java, I
> found that when choosing a candidate splitter, AdaBoost.java always
> chooses the one that minimizes a loss criterion, which is defined as
> "2*sqrt(w_0 * w_1) - w_0 - w_1" in the function BinaryBag.getLoss().
> Actually, this minimization is not exactly the same as the one defined
> in Freund's original paper "The alternating decision tree learning
> algorithm" (1999) (see steps 1 and 2 in Figure 3).

That is absolutely correct. This is because the weak classifiers c1 and c2 are evaluated in a separate module and the booster is only informed of the (weighted) errors w.r.t. the classes. Using the notation of Freund & Mason 1999, the booster is only told of \sum W- and \sum W+. Then there are two prediction weights determined in two separate phases (one for "YES/TRUE" and one for "NO/FALSE").

This is reminiscent of where this loss function originally came from (Schapire & Singer, "Improved boosting algorithms using confidence-rated predictions", 1999), which might actually be a better reference for understanding much of the reasoning behind the ADTree paper (Freund & Mason 1999). While the loss function is not identical, the motivation is to use a loss which will best separate the classes, not necessarily give the lowest weighted error w.r.t. some potential function (which was the reasoning in the original AdaBoost paper).

You can also implement other loss functions. Two that are on the todo list (http://www.cs.ucsd.edu/~aarvey/jboost/todo.html) are weighted error and information gain. There are many other loss functions that can be used, though the Z-loss of Schapire & Singer '99 has some very nice properties.

> Also, I am still trying to figure out the connection between ADT and the
> AdaBoost algorithm. As we know, the final prediction in AdaBoost is
> f = a1*h1 + a2*h2 + ... + aT*hT, where a1 is a coefficient and h1 is a
> weak learner. In ADT, the final prediction has a similar form, as the
> sum of prediction values. Here, I suppose the prediction value in ADT is
> equivalent to a1*h1 in AdaBoost.

Not exactly. I think the best way to understand the evaluation process of ADTrees is by example. There is an example provided on Wikipedia at http://en.wikipedia.org/wiki/Alternating_decision_tree. There is another example in http://www.cse.ucsd.edu/~yfreund/papers/final-geneclass.pdf.

The major difference in ADTrees is that h_{t} may depend on the evaluation of h_{1}...h_{t-1}. Namely, each path is a conjunction of weak hypotheses. For instance, not all examples are evaluated by the t^{th} hypothesis; only those examples that make it to the decision node of the t^{th} hypothesis are actually evaluated. This allows very weak classes of hypotheses (e.g. decision stumps) to create fairly complex hypotheses (e.g. decision rectangles) without the large number of iterations needed to create the same classifier with standard boosting. While this may lead to overfitting, it does not seem to do so empirically.
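To make the getLoss criterion above a bit more concrete, here is a small sketch (assumed notation, not JBoost's actual BinaryBag code). One way to read the criterion: a block of examples with positive weight w_0 and negative weight w_1 contributes w_0 + w_1 to the Schapire & Singer normalizer Z if no prediction is made on it, and 2*sqrt(w_0*w_1) if the optimal confidence-rated prediction is made, so the quantity being minimized is the change in Z. The prediction value itself is the smoothed 0.5*ln((W+ + eps)/(W- + eps)) of Freund & Mason 1999.

  // Sketch only, not JBoost's BinaryBag: candidate splits are scored by the
  // change in the Schapire & Singer normalizer Z for a block of examples with
  // total positive weight wPlus and total negative weight wMinus.
  class ZLossSketch {
    // 2*sqrt(w0*w1) - w0 - w1: always <= 0, and most negative for the split
    // that best separates the two classes.
    static double getLoss(double wPlus, double wMinus) {
      return 2.0 * Math.sqrt(wPlus * wMinus) - wPlus - wMinus;
    }

    // Smoothed prediction value attached to a node, as in Freund & Mason 1999.
    static double predictionValue(double wPlus, double wMinus, double epsilon) {
      return 0.5 * Math.log((wPlus + epsilon) / (wMinus + epsilon));
    }

    public static void main(String[] args) {
      // Made-up weights for the examples reaching one side of a candidate split.
      double wPlus = 0.30, wMinus = 0.05;
      System.out.println("loss  = " + getLoss(wPlus, wMinus));               // about -0.11
      System.out.println("alpha = " + predictionValue(wPlus, wMinus, 0.01)); // about 0.82
    }
  }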
> The margin-based generalization bound analysis proposed by Schapire is a
> very nice thing. And in recent years, many researchers have put effort
> into algorithms that maximize the margin. See papers:
>
> "Soft Margins for AdaBoost", Ratsch, 17th Annual Conference on
> Computational Learning Theory, 2004.
>
> "Boosting Based on a Smooth Margin", Rudin, 17th Annual Conference on
> Computational Learning Theory, 2004.
>
> "Analysis of Boosting Algorithms Using the Smooth Margin Function",
> Rudin, submitted to the Annals of Statistics, 2007.
>
> And I would like to make use of those margin-maximizing algorithms to
> build an AdaBoost ensemble system. At this time, I am thinking about
> implementing it on the basis of the JBoost architecture.

I have actually not read these three particular papers (they're certainly on my stack). However, I know that a pitfall of some "margin maximizing" algorithms is that they maximize the *minimal* margin, not the margin distribution (which is perhaps somewhat ill-defined as a maximization target). Also keep in mind that while margin maximization is very important, it may not be the only explanation for why AdaBoost generalizes so well (see Reyzin & Schapire 06).

Also, maximizing the minimum margin may allow noisy data to severely impair the learning process. I believe that the "Soft Margins..." (Ratsch) paper discusses noise-resistant boosting. While JBoost doesn't have the exact algorithm of Ratsch, BrownBoost (use the CVS version; there may be small bugs in version 1.3.1) can be used to be noise resistant. This works by setting the exact amount of noise you want to allow, which can be tweaked quickly using the -potential flag. You can then visualize the creation of the margin distribution using the surfing.py script (again, use the CVS version). This may not be exactly what you need, but it may give some intuition as to what is happening in the "boosting the margin" interpretation of boosting.

If you wish to implement a "margin maximizing" booster, I'd recommend starting with one of the simpler methods, e.g. LPBoost. Also, since some of these methods require keeping all previous hypotheses available, I'd recommend keeping the data structures (matrices, etc.) in the booster class. See the (inefficient, though correct) BrownBoost implementation for an example of how to keep past hypotheses stored in the booster class.

> Thank you for your maintenance of the open source JBoost project, and I
> hope we can cooperate on it.

I'm glad to hear that you're interested in contributing! I'm not sure if you've contributed to open source projects in the past; either way, I'd recommend reading http://www.cs.ucsd.edu/~aarvey/jboost/contribute.html (I just added some more content to it) and some of the links.

Cheers,
Aaron
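As a rough sketch of the margin-distribution bookkeeping mentioned above (illustrative only, not JBoost's actual code): with the past hypotheses h_t and coefficients a_t kept in the booster, the normalized margin of example i is y_i * sum_t a_t*h_t(x_i) / sum_t |a_t|, and the "margin distribution" is just this quantity over the whole training set.

  import java.util.Arrays;
  import java.util.List;
  import java.util.function.ToDoubleFunction;

  // Illustrative only; not JBoost's booster classes or the surfing.py script.
  class MarginSketch {
    // margins[i] = y_i * sum_t alpha_t * h_t(x_i) / sum_t |alpha_t|,
    // where each h_t maps an example to a prediction in [-1, +1].
    static double[] margins(double[][] examples, int[] labels,
                            List<ToDoubleFunction<double[]>> hypotheses,
                            double[] alphas) {
      double norm = 0.0;
      for (double a : alphas) norm += Math.abs(a);
      double[] result = new double[examples.length];
      for (int i = 0; i < examples.length; i++) {
        double f = 0.0;
        for (int t = 0; t < hypotheses.size(); t++) {
          f += alphas[t] * hypotheses.get(t).applyAsDouble(examples[i]);
        }
        result[i] = labels[i] * f / norm;
      }
      return result;
    }

    public static void main(String[] args) {
      // Two toy stumps on a single feature, with made-up coefficients.
      List<ToDoubleFunction<double[]>> hs = Arrays.asList(
          x -> x[0] > 0.5 ? 1.0 : -1.0,
          x -> x[0] > 1.5 ? 1.0 : -1.0);
      double[] alphas = {0.7, 0.4};
      double[][] xs = {{0.2}, {1.0}, {2.0}};
      int[] ys = {-1, +1, +1};
      System.out.println(Arrays.toString(margins(xs, ys, hs, alphas)));
      // Prints roughly [1.0, 0.27, 1.0]: all correct, but the middle example
      // is classified with much less confidence.
    }
  }

Margins near +1 are confidently correct predictions and negative margins are mistakes; a noise-tolerant method like BrownBoost deliberately gives up on forcing every margin positive, which is roughly the trade-off the -potential setting controls.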