From: Gungor P. <pol...@Pr...> - 2008-08-12 22:57:50
Hi Aaron,

Thank you very much for your detailed explanations. The major issue for me is how the current algorithm uses the weights. Is it the implementation that is buggy, or the underlying idea itself? If the implementation is buggy, I am curious about the idea behind weighting the data in the current implementation. I am looking forward to good news on this.

I also looked at the code and I am confused about the variable names. Does the code also refer to the distribution that is updated at each iteration of boosting as "weights"?

Thanks again,
Best,
Gungor

Aaron Arvey wrote:
> Hi Gungor,
>
> Glad to hear you're working with JBoost!
>
> See comments inline below.
>
> On Tue, 12 Aug 2008, Gungor Polatkan wrote:
>
>> 1) First question is about the weight input. The meaning (higher weight
>> implies greater importance to classify correctly) is fundamentally
>> important for us since it is the heart of our research project. How does
>> the algorithm do that? Is there any paper related to this idea? Or is it
>> just a practical empirical method just by changing the initial
>> distribution? Do you guys know anything about that? Any information
>> about this thing will help me very much. Also, what is the bug currently
>> in the weighting? Looking forward to the news...
>> |weight| an initial weighting of the data (higher weight implies
>> greater importance to classify correctly) THERE IS A BUG IN WEIGHTING IN
>> MOST VERSIONS. MORE NEWS SOON. default = 1.0 Optional
>
> The bug in weighting has still not been fixed. All I know is that the
> final output from data with weights is not as would be expected (I
> verified this myself several months ago). The weighting itself is read
> in correctly (from what I could tell by output), but the way it is
> applied is somehow buggy. It is only applied in a couple locations, so
> it is somewhat unnerving that it causes such abnormal behavior.
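
For context on the question about weights versus the boosting distribution: in textbook AdaBoost, per-example weights can enter simply as the initial distribution, which is then updated each round. The sketch below illustrates that general idea only. It is not JBoost's code, and the function names (initial_distribution, adaboost_round) are hypothetical.

```python
import math


def initial_distribution(weights):
    """Normalize user-supplied example weights into a starting distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def adaboost_round(dist, labels, predictions):
    """Run one textbook AdaBoost update.

    dist sums to 1; labels and predictions hold +1 or -1.
    Returns (alpha, updated distribution).
    """
    # Weighted error of the weak hypothesis under the current distribution.
    err = sum(d for d, y, h in zip(dist, labels, predictions) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weak hypothesis must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Shrink correctly classified examples, grow misclassified ones.
    unnormalized = [
        d * math.exp(-alpha * y * h)
        for d, y, h in zip(dist, labels, predictions)
    ]
    z = sum(unnormalized)
    return alpha, [u / z for u in unnormalized]


# The first example is given three times the weight of the others.
d1 = initial_distribution([3.0, 1.0, 1.0, 1.0])
alpha, d2 = adaboost_round(d1, [1, 1, -1, -1], [1, 1, -1, 1])
```

In this formulation a heavily weighted example contributes more to the weighted error, so a weak hypothesis that misclassifies it is penalized more. Whether JBoost applies its weight option this way is exactly what the bug report above leaves open.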
>
> There are many other ways to reweight your data other than using the
> provided weight option. Depending on how large and extreme your class
> distributions are, you can just oversample the smaller class prior to
> input to JBoost. Keep in mind that the first weak hypothesis is always
> "Class is +1", so any lopsidedness in the data will be reweighted by
> the score given to this classifier and the subsequent reweighting on
> examples.
>
> NOTE: The "Class is +1" classifier will rebalance data that isn't
> *too* skewed in class distribution. The fact that it doesn't balance
> the classes when they are too skewed is considered to be a small bug (I
> verified this as well, around the same time I verified the weight
> bug). However, if you oversample the data so that the classes aren't
> *too* skewed (I've done 10:1 without problem), then sliding the score
> for "Class is +1" should provide you with control over
> sensitivity/specificity.
>
>> 2) Second question is about the weak learner JBoost uses. Since my data
>> features are Real Values (not binary or discrete but -inf to +inf Real
>> Numbers), I think I should use decision stumps with real thresholds.
>> Does the algorithm consider such a thing (for binary feature a simpler
>> stump should be used and for real valued another one)?
>
> If you run JBoost with default boosting parameters, it will use
> decision stumps for weak learners. Boolean values can be seen as a
> subset of real values (-1 is false, +1 is true) and the decision
> stumps would then be "<0" for false and ">0" for true.
>
> Also, I believe I remember there's a bug with "+inf" "-inf" values (as
> may happen from output). I'd recommend replacing all +inf (-inf)
> values with a real value larger (smaller) than all other values. The
> weak learning algorithms will treat the largest (smallest) values as
> +inf (-inf).
>
> Try the default parameters for boosting and let me know if you need
> any more guidance on this topic.
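
The two preprocessing suggestions above (oversampling the smaller class, and replacing infinite feature values) could be done before the data reaches JBoost, roughly as sketched here. This is a minimal illustration under assumptions: the function names, the margin parameter, and the in-memory list-of-rows layout are hypothetical, and the default 10:1 target only mirrors the ratio mentioned above.

```python
import math
import random


def replace_infinities(rows, margin=1.0):
    """Replace +inf/-inf with finite values just outside each column's range."""
    cleaned = [list(r) for r in rows]
    for j in range(len(rows[0])):
        finite = [r[j] for r in rows if math.isfinite(r[j])]
        if not finite:
            continue  # no finite value to anchor on; leave the column as-is
        high = max(finite) + margin  # stands in for +inf
        low = min(finite) - margin   # stands in for -inf
        for r in cleaned:
            if r[j] == math.inf:
                r[j] = high
            elif r[j] == -math.inf:
                r[j] = low
    return cleaned


def oversample_minority(rows, labels, max_ratio=10.0, seed=0):
    """Duplicate random minority examples until majority:minority <= max_ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    target = math.ceil(len(majority) / max_ratio)
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


cleaned = replace_infinities([[0.0, math.inf], [1.0, 2.0], [-math.inf, 5.0]])
new_rows, new_labels = oversample_minority(
    [[float(i)] for i in range(41)], [1] + [-1] * 40
)
```

Oversampling by duplication leaves the majority class untouched, so no data is discarded. NaN values are not handled here and pass through unchanged.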
>
>> 3) For the modification, is all the source code in the SRC folder?
>
> Yes. There are some scripts in jboost-VERSION/scripts that are helpful
> in visualizing the output, but all the code you'll likely want to edit
> is in jboost-VERSION/src.
>
> Let me know if this answers your questions or if you have any other
> inquiries.
>
> Aaron