From: Hendrik B. <hen...@cs...> - 2013-08-20 18:18:10
Dear Martin,

I don't know if anyone has answered you yet. I can't answer the pruning question, but regarding your question about learning with missing labels:

- Learning from data with lots of missing labels is related to "semi-supervised learning".
- Most methods for semi-supervised learning essentially use the information in the distribution of the unlabeled instances to improve performance.
- Such a setting can be mimicked in Clus by using VarianceReduction with the variance defined as a mix of the variance in the predictor space and the variance in the target space.

I believe there is a setting in Clus that lets you list the attributes that should be used for the variance computation. By including non-target attributes there, Clus will behave more like a semi-supervised learner, which may work better on your data. No guarantee, though: this depends very much on the data.

Hope this helps.

Best,
Hendrik

On 15 Aug 2013, at 12:17, Martin Guetlein wrote:

> Hi,
>
> First of all: hello to everybody on the list, and thanks to the
> authors of Clus for providing such a nice tool.
>
> We are using Clus as one possible MLC (multi-label classification)
> method, applied to our toxicological dataset. The data has around 700
> instances, around 25 labels and lots (!) of missing labels.
>
> Unfortunately, Clus and the other MLC methods do not perform as well
> as we would like (Clus predictions reach only around 60 percent
> accuracy and AUC).
>
> Are there any settings especially suited for datasets with many
> missing labels? So far, I have tried using GainRatio instead of
> VarianceReduction as the heuristic.
>
> I have disabled pruning, and it had no effect (identical results
> with Pruning=C4.5). Is there an explanation for this?
>
> Thanks and kind regards,
> Martin
>
>
> P.S.:
> I have adjusted the Clus source code to commons-math3-3.2 and
> weka-3-7-6; if someone is interested, I can share my changes.
>
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> gue...@in...
>
> _______________________________________________
> Clus-general mailing list
> Clu...@li...
> https://lists.sourceforge.net/lists/listinfo/clus-general

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
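To make the semi-supervised suggestion in the reply a bit more concrete, here is a minimal Python sketch of a split heuristic whose variance mixes target and non-target (descriptive) attributes. It is not Clus's implementation. The function names, the mixing weight `w_target`, the normalization by each attribute's variance on the full training set, and the treatment of missing values (represented as `None` and simply skipped) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# One instance = a list of attribute values; None marks a missing value
# (e.g. a missing label). Hypothetical representation, not Clus's data format.
Row = Sequence[Optional[float]]


def attribute_variance(rows: List[Row], i: int) -> float:
    """Population variance of attribute i over rows, skipping missing values."""
    known = [r[i] for r in rows if r[i] is not None]
    if not known:
        return 0.0  # no observed values of this attribute in this subset
    mean = sum(known) / len(known)
    return sum((v - mean) ** 2 for v in known) / len(known)


def normalization(rows: List[Row], idx: List[int]) -> Dict[int, float]:
    """Variance of each attribute on the full training set (1.0 if constant),
    used to put attributes with different scales on an equal footing."""
    scale = {}
    for i in idx:
        v = attribute_variance(rows, i)
        scale[i] = v if v > 0 else 1.0
    return scale


def mixed_variance(rows, target_idx, descriptive_idx, scale, w_target):
    """w_target * (mean normalized target variance)
    + (1 - w_target) * (mean normalized descriptive variance)."""
    def mean_var(idx):
        if not idx:
            return 0.0
        return sum(attribute_variance(rows, i) / scale[i] for i in idx) / len(idx)

    return w_target * mean_var(target_idx) + (1.0 - w_target) * mean_var(descriptive_idx)


def variance_reduction(parent, left, right, target_idx, descriptive_idx,
                       scale, w_target=0.5):
    """Variance of the parent minus the size-weighted variance of the two
    children. All instances count toward the weights, labeled or not."""
    n = len(parent)

    def var(rows):
        return mixed_variance(rows, target_idx, descriptive_idx, scale, w_target)

    return var(parent) - len(left) / n * var(left) - len(right) / n * var(right)


if __name__ == "__main__":
    # Column 0: a descriptive attribute; column 1: a 0/1 label, missing for half the rows.
    rows = [[0.0, 0.0], [0.0, None], [1.0, 1.0], [1.0, None]]
    scale = normalization(rows, [0, 1])
    clean = (rows[:2], rows[2:])                          # split on the descriptive attribute
    lopsided = ([rows[0], rows[2]], [rows[1], rows[3]])  # all unlabeled rows go right
    for w in (1.0, 0.5):
        scores = [variance_reduction(rows, left, right, [1], [0], scale, w)
                  for left, right in (clean, lopsided)]
        print(w, scores)
```

Running it prints `1.0 [1.0, 0.5]` and `0.5 [1.0, 0.25]`. With `w_target=1.0` (targets only), the lopsided split gets some credit because sending all unlabeled instances to one side makes that side's target variance look like zero; mixing in the descriptive attribute lowers its score to 0.25, while the clean split keeps 1.0.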