From: Luca P. <mrl...@gm...> - 2017-02-09 16:52:28

Dear Experts,

I recently read about the possibility of using "Parameterized Machine Learning" for High-Energy Physics [1] (a technique already used by some analyses performed by the CMS experiment). Is it possible with TMVA to train a BDT using parametrized signal distributions? This would enormously simplify the discrimination of signals whose kinematics vary depending on, for example, their mass.

As far as I understood, instead of performing several trainings in different mass regions, we could feed the MVA directly a parametrization of the signal distribution as a function of a parameter (i.e. the mass) and perform a single training.

Thanks a lot,
Luca

[1] https://arxiv.org/abs/1601.07913
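The parameterized-training recipe of [1] can be sketched as a dataset-construction step, independent of any particular MVA library: the physics parameter (e.g. the mass) becomes an extra input variable, and background events, which have no intrinsic mass hypothesis, are assigned one drawn at random from the signal spectrum so that the parameter alone carries no class information. A minimal pure-Python sketch; all names here are illustrative, not TMVA API:

```python
import random

def parameterize_dataset(signal_events, background_events, signal_masses, seed=0):
    """Build a single parameterized training set (sketch of the idea in
    arXiv:1601.07913).

    signal_events:     list of (features, true_mass) pairs
    background_events: list of feature lists (no intrinsic mass)
    signal_masses:     the mass hypotheses present in the signal samples
    """
    rng = random.Random(seed)
    dataset = []
    # Signal: the true generated mass becomes an extra input variable.
    for features, mass in signal_events:
        dataset.append((features + [mass], 1))
    # Background: assign a mass hypothesis drawn at random from the signal
    # spectrum, so the mass column alone cannot separate the classes.
    for features in background_events:
        dataset.append((features + [rng.choice(signal_masses)], 0))
    rng.shuffle(dataset)
    return dataset
```

At evaluation time one would then score each event once per mass hypothesis of interest, fixing the mass column to that value.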
From: Bernie M <ber...@gm...> - 2017-02-09 16:18:11

Helge,

Thanks for your reply, and sorry for my slowness in reacting back. See my comments below.

> MC background samples of different sources should be weighted such
> that their 'integrated luminosity' is equal for the various samples.
> The actual NUMBER OF EVENTS doesn't matter (well, it will of course
> somehow be a function of the integrated lumi and cross section). This
> OBVIOUSLY cannot be taken care of by TMVA automatically, as it doesn't
> know about your MC production.

You misunderstood my question. My question is: assuming one does supply the info about the samples (which, of course, TMVA could not otherwise have known), does TMVA do the merging consistently? The number of events does matter in the sense that the samples should be merged by lumi, which depends both on the cross section and the number of events. But more than that, I'm worried about the potential bias this introduces depending on how it's done, and I don't know this. Ideally I would like the different background samples to be merged randomly (like shuffling cards), and then this is considered "the background". If that is not what is happening, then I will write an external macro to merge the background and then feed it to TMVA with weight = 1. But I would only go down this road if TMVA cannot be trusted to do this as it should.

To be clear, if I do:

factory->SetBackgroundWeightExpression( "mc_weight_b1" );
factory->SetBackgroundWeightExpression( "mc_weight_b2" );
factory->SetBackgroundWeightExpression( "mc_weight_b3" );
factory->SetSignalWeightExpression( "mc_weight_s" );

where

mc_weight_b1 = Xb1
mc_weight_b2 = Xb2
mc_weight_b3 = Xb3
mc_weight_s = Xs

will TMVA do the weighting and random merge correctly? Note that the number of events is important to the extent that the lumi depends on it. Otherwise I will do this outside TMVA and stop worrying. Such a macro is not complicated, but still a few lines of what could be unnecessary code.

> The relative weighting of signal vs background is more a matter of
> 'trial and error', but a good start is to weight them such that they
> both end up with the same number of events (i.e. their respective sum
> of event weights). This can be done automatically if you choose
> "NormMode=EqualNumEvents" or "NormMode=NumEvents" in the Factory, but
> I never remember how that treats possible preselection cuts, so best
> if you choose NormMode=None and do it by hand (using the weights you
> mentioned above).

OK. So, if I do NormMode=None and mc_weight_s = Xs (well, Xs*efficiency) I'm good. Got it.

Cheers,
Bernie

> [...]
> _______________________________________________
> TMVA-users mailing list
> TMV...@li...
> https://lists.sourceforge.net/lists/listinfo/tmva-users
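The equal-integrated-luminosity weighting discussed in this thread can be sketched outside TMVA: each sample i with cross section X_i and N_i generated events gets a per-event weight w_i = L_target * X_i / N_i, so every sample contributes with the same effective luminosity regardless of how many events were generated, and the merged sample is then shuffled ("like shuffling cards"). A minimal pure-Python sketch; the function and dict keys are illustrative, not TMVA API:

```python
import random

def merge_backgrounds(samples, target_lumi=1.0, seed=0):
    """Merge MC background samples with per-event luminosity weights.

    samples: list of dicts {"events": [...], "xsec": X_i}.
    Each event of sample i gets weight w_i = target_lumi * X_i / N_i,
    so the sum of weights per sample equals target_lumi * X_i, i.e.
    every sample corresponds to the same integrated luminosity.
    """
    rng = random.Random(seed)
    merged = []
    for s in samples:
        w = target_lumi * s["xsec"] / len(s["events"])
        merged.extend((ev, w) for ev in s["events"])
    rng.shuffle(merged)  # random "card shuffle" across the sources
    return merged
```

With this done externally, the merged file can be fed to TMVA with the weight column as the single weight expression.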
From: Helge V. <Hel...@ce...> - 2017-01-24 13:51:45

Hi Bernie,

As always, there's no 'rule' for what is correct; good is whatever gives the best classifier (which of course needs to be evaluated on the 'proper' background/signal mix as expected in the data). But as a guideline: MC background samples of different sources should be weighted such that their 'integrated luminosity' is equal for the various samples. The actual NUMBER OF EVENTS doesn't matter (well, it will of course somehow be a function of the integrated lumi and cross section). This OBVIOUSLY cannot be taken care of by TMVA automatically, as it doesn't know about your MC production.

The relative weighting of signal vs background is more a matter of 'trial and error', but a good start is to weight them such that they both end up with the same number of events (i.e. their respective sum of event weights). This can be done automatically if you choose "NormMode=EqualNumEvents" or "NormMode=NumEvents" in the Factory, but I never remember how that treats possible preselection cuts, so best if you choose NormMode=None and do it by hand (using the weights you mentioned above).

Cheers,

Helge

On 20 January 2017 at 22:26, Bernie M <ber...@gm...> wrote:
> [...]
From: Bernie M <ber...@gm...> - 2017-01-20 21:26:35

Hi,

I have a question on how the MC weights are implemented in TMVA. I have a few background samples and one signal sample. They all have different cross sections and numbers of events. The MC weight is not by default on any of these samples, but I can add it in by hand. My question is what is to be received by TMVA. As far as I understand, I could do the following:

Ignore the fact that the samples have different numbers of events and weight by cross section. That is, assuming I have background samples with cross sections Xb1, Xb2, Xb3 with numbers of events Nb1, Nb2 and Nb3, and a signal sample with cross section Xs and Ns events, I simply pass:

factory->SetBackgroundWeightExpression( "mc_weight_b" );
factory->SetSignalWeightExpression( "mc_weight_s" );

where

mc_weight_b = Xb1, Xb2, Xb3 (I merged the background samples into a single file)
mc_weight_s = Xs

I assume the fact that the samples have different numbers of events is taken into account by TMVA directly. Is this correct? Also, is it better to really merge the samples into one, or to provide TMVA with more than one background sample (it should make no difference, but you never know)?

Cheers,

Bernie
From: matt <mat...@gm...> - 2017-01-20 18:00:07

Hi Matthias,

(and building on what Joosep wrote) I think this is the figure you wish to view: http://gerardnico.com/wiki/_media/data_mining/model_complexity_error_training_test.jpg

I recommend this text by Abu-Mostafa (which he also used for a free course on edX): www.amazon.com/Learning-Data-Yaser-S-Abu-Mostafa/dp/1600490069

I think your conceptual blunder is that you are viewing BDTs as a forest of *separate* trees. You are correct that if you had something like a random forest and were just averaging the results of all the trees, then yes, the predictions would reach some stable fixed values for the validation set as the number of random trees increased. But that is not what boosting is doing. Boosting uses all previous trees to construct the next tree in the series. After a certain point, the learning algorithm will begin to "learn" the random noise that is particular to the training set, while the validation/test set has its own unique noise, and so the predictions for the validation set will begin to grow less accurate. Note: I have had to use as few as 10 trees in cases of extreme data insufficiency. BTW, I usually found gradient boosting to provide slightly better results than AdaBoost.

Some notes on overtraining tests:

1) I went to the trouble to put all event weighting into my particle search data (and updated the default TMVA plots to use the full event weights) so that such plots achieved some of the cross validation that Joosep mentioned. (Note: this took some work, but I think it was worth it.)

2) I used Brown's method to combine four different tests for overtraining into a single p-value (I needed to automate the overtraining check to run a grid search for optimal hyper-parameters): KS, Anderson-Darling (AD), chi-squared, and one I invented (the two-sample versions of these). This may be overkill for you, but there are other tests. The AD test is supposed to be more sensitive to the edges of the CDFs, which may be useful since BDTs often produce highly skewed distributions. But AD takes a long time to run, and I found KS and AD p-values to be highly correlated. Chi-squared p-values were not as strongly correlated with KS and AD, so chi-squared might be useful.

3) Overtraining is a serious problem in applications where there is only the *one* dataset. In particle physics we have two, the simulated and the measured. We are somewhat immune from overtraining because we perform validation *after* machine learning using some of the simulated data, by comparing the remaining simulated data to the measured data in control regions, i.e., checking for mismodeling of the measured data by the simulated data. Nevertheless, I often found that overtraining led to mismodeling (and thus sent me back to the beginning). So I would claim that avoiding overtraining is still very important, in the interest of saving you time later on. You will hear some say that a little overtraining is useful; I suspect it depends on the application (i.e., how risk-averse you are to having to go back and redo everything if there is data-Monte Carlo disagreement later on).

Good luck!
-matt

On Fri, Jan 20, 2017 at 9:14 AM, Matthias Komm <Mat...@ce...> wrote:
> [...]

--
-----------------------------------------------------------
R. Matt Leone
Department of Physics
University of Arizona
www.physics.arizona.edu/~leone/
-----------------------------------------------------------
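For reference, the two-sample KS statistic that comes up repeatedly in this thread is just the maximum vertical distance between the empirical CDFs of the training and testing response distributions. An unweighted pure-Python sketch (TMVA's own overtraining check also handles event weights, which this deliberately omits):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest distance
    between the two empirical CDFs (statistic only, no p-value)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Fraction of each sample at or below x = empirical CDF at x.
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give 0, fully separated samples give 1; converting the statistic to a p-value (as the tests above require) needs the usual asymptotic KS distribution on top of this.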
From: Joosep P. <joo...@ce...> - 2017-01-20 16:44:09

Hi Matthias,

Actually, having more trees can overtrain more than having fewer, since every tree is also a boosting iteration. Just look at the evolution of the test and training loss functions as a function of the number of trees/iterations. It's not trivial in TMVA, but you can do it in xgboost or scikit-learn. You will see that the training loss always decreases, whereas the testing loss will start to increase at some point.

Another topic is the validity of the KS test as an overtraining check. I think the default "blue vs red" check of TMVA is definitely not enough; one should rather do a cross-validation check and look at the overall performance spread.

Cheers,
Joosep

> On 20 Jan 2017, at 17:32, Matthias Komm <Mat...@ce...> wrote:
> [...]
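The cross-validation check suggested here amounts to splitting the events into k disjoint folds, training k times on k-1 folds, evaluating on the held-out fold, and looking at the spread of the k scores. A minimal index-splitting sketch in plain Python (the training/evaluation itself is left to whatever toolkit is in use; function names are illustrative):

```python
import random

def kfold_indices(n_events, k=5, seed=0):
    """Shuffle event indices and split them into k disjoint folds."""
    idx = list(range(n_events))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n_events, k=5, seed=0):
    """Yield k (train, test) index pairs: each fold is held out once
    while the remaining k-1 folds form the training set."""
    folds = kfold_indices(n_events, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

A large spread of the k per-fold ROC integrals is the kind of instability signal the default single train/test comparison can miss.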
From: Matthias K. <Mat...@ce...> - 2017-01-20 16:30:39

Hello,

I have a general question on TMVA BDTs using AdaBoost. I noticed that for some training setups the KS-test value for the BDT output between the training and testing samples improves when decreasing the number of trees. How can this be? I would expect that a lower number of trees leads to overtraining instead, since one approaches the limit of a single decision tree. Furthermore, for a higher number of trees the KS test gets even worse. This is counterintuitive to me, since I would expect that training more trees stabilizes the BDT against overtraining. Can someone explain this feature a bit? Especially, would you recommend reducing the number of trees in such a case to improve the KS test (although I cannot see how this can mitigate overtraining)?

Cheers,
Matthias
From: Maksymilian W. <mak...@ce...> - 2017-01-20 12:39:50

Dear experts,

I was trying to train several neural networks at once, but it seems that after training the first one (MLP_10) on around 39k events, all the next ones (MLP_11...MLP_20) are trained on only 3.9k events. I attach a screenshot of a training log.

Regarding the code, I add training and testing events one by one after the selection process:

if( isigA == 0 ) {
  if( aTrainOption ) {
    aTmvaFactory->AddSignalTrainingEvent( m_TmvaTrainDVars, GetWeight() );
  } else {
    aTmvaFactory->AddSignalTestEvent( m_TmvaTrainDVars, GetWeight() );
  }
} else {
  if( aTrainOption ) {
    aTmvaFactory->AddBackgroundTrainingEvent( m_TmvaTrainDVars, aLBG->NominalBackgrEventWeight() );
  } else {
    aTmvaFactory->AddBackgroundTestEvent( m_TmvaTrainDVars, aLBG->NominalBackgrEventWeight() );
  }
}

(weights are equal to one in this case) and prepare the trees using:

aTmvaFactory->PrepareTrainingAndTestTree( cuts, "nTrain_Signal=0:nTrain_Background=0:nTest_Signal=0:nTest_Background=0:SplitMode=Random:NormMode=NumEvents:V" );

The MLPs are booked in a for loop:

for (int i = 0; i < 10; ++i) {
  number += i;
  aTmvaFactory->BookMethod( TMVA::Types::kMLP, "MLP_1" + number,
    "!H:!V:ConvergenceImprove=1e-4:ConvergenceTests=20:TestRate=5:Sampling=0.1:SamplingEpoch=100:SamplingImportance=2:Tau=3:HiddenLayers=60,40:VarTransform=G,D,G,Norm:NCycles=300:NeuronType=sigmoid:TrainingMethod=BFGS:UseRegulator=False:EstimatorType=MSE:RandomSeed=0" );
  number = "";
}

Can you help me with increasing the number of events for the other MLPs?
From: Helge V. <Hel...@ce...> - 2017-01-11 18:24:11

Hi Marcin,

Oh, sorry. Although I'm very happy that CrossValidation finally got introduced, I'm sorry to say that this happened after my active involvement, and I don't have the time to figure out the details of it anymore. Hopefully someone else can answer this :)

Cheers,
Helge

On 11 January 2017 at 17:37, Marcin Wolter <mar...@if...> wrote:
> [...]
From: Marcin W. <mar...@if...> - 2017-01-11 16:53:47

Hi Helge,

Thanks for your answer. Maybe you have an example script using hyperparameter optimisation, so I could start from it?

I was just thinking: maybe it has something to do with k-folding. I was filling the input by adding individual events:

factory->AddSignalTrainingEvent
factory->AddSignalTestEvent
factory->AddBackgroundTrainingEvent
factory->AddBackgroundTestEvent

so they were divided between training/testing from the beginning. Maybe the folding fails in this case?

Also, I have a question concerning CrossValidation. In the example TMVACrossValidation.C, 5-fold cross-validation is used: five independent trainings are performed and the areas under the ROC curve are printed. This is fine for checking the stability of our training. But is it possible to store the results of each training, or to get the final method averaged over the k-fold training?

Thanks,
Marcin

On 01/11/2017 12:12 PM, Helge Voss wrote:
> Hi Marcin,
>
> I suspect that this is because the ROC integrals are calculated
> differently in the case of the 'optimizer' and the final analysis
> (sorry, that's 'historic'; there's no good reason for it and it
> should be changed). That can generate differences for non-smooth MVA
> histograms used for the calculation. The final ROC curve is nowadays
> calculated from the individual events, while for simplicity the
> optimizer calculates it from the binned distributions. So obviously
> there are differences.
>
> Qualitatively, things should however not really matter that much, and
> I suspect that in terms of the real performance - within the
> statistical fluctuations - you should probably get similar
> performance for the setting you find 'best' compared to the one the
> 'optimizer' finds. You said you get 0.856 vs 0.840, hmm...
>
> Cheers,
>
> Helge
>
> On 11 January 2017 at 10:39, Marcin Wolter <mar...@ce...> wrote:
>> Hi All,
>>
>> I am setting up the analysis using TMVA and would like to use the
>> hyperparameter optimization. To do that I have added in my code
>> OptimizeAllMethods("ROCIntegral","FitGA") and then tried
>> tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit")
>>
>> [...]
>> dataloader = TMVA.DataLoader('dataset')
>> tmva_factory = TMVA.Factory("TMVAClassification", file_out,
>>     "Transformations=I;D;P;G,D:AnalysisType=Classification")
>> [...]
>> if "BDT" in mlist:
>>     tmva_factory.BookMethod(dataloader, TMVA.Types.kBDT, "BDT",
>>         "!H:!V:NTrees=850:MinNodeSize=1%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20:PruneMethod=NoPruning")
>> [...]
>> tmva_factory.OptimizeAllMethods("ROCIntegral","FitGA")
>> # tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit")
>>
>> In both cases the code did a long analysis, training the BDT many
>> times, but ended up with a result far from optimal, in fact much
>> worse than after training with the default parameters from the
>> BookMethod calls (both for Minuit and FitGA).
>>
>> For the default parameters the ROC integral is 0.856; after
>> optimization I got:
>>
>> FitGA:
>>   ROC integral = 0.840
>>   NTrees=10, MinNodeSize=7, MaxDepth=4, AdaBoostBeta=0.2
>>
>> Minuit:
>>   Optimize method: BDT for Classification. The following BDT
>>   parameters will be tuned on the respective *grid*:
>>     AdaBoostBeta: 0.2, 0.4, 0.6, 0.8, 1 (0.2 to 1 in 5 steps)
>>     MaxDepth:     2, 3, 4 (2 to 4 in 3 steps)
>>     MinNodeSize:  1, 1.12444, 1.26436, ..., 26.68, 30
>>                   (1 to 30 in 30 log-spaced steps)
>>     NTrees:       10, 257.5, 505, 752.5, 1000 (10 to 1000 in 5 steps)
>>   Using the options: ROCIntegral and Minuit.
>>   For BDT the optimized parameters are:
>>     AdaBoostBeta = 0.200513
>>     MaxDepth     = 2.49231
>>     MinNodeSize  = 4.06703
>>     NTrees       = 10.001
>>
>> And the ROC integral obtained finally is 0.840, so it is worse than
>> for the default parameters.
>>
>> Probably I didn't set up the tuning properly. What should I do?
>>
>> Thanks,
>> Marcin Wolter
From: Helge V. <Hel...@ce...> - 2017-01-11 11:12:31
|
Hi Marcin, I suspect that this is because the ROCIntegrals are calculated differently in the case of the 'optimizer' and the final analysis (sorry..that's 'historic'.. there's no good reason for that and it should be changed..) That can generate differences for non-smooth mva histograms used for the calculation. The final ROC curve is nowadays calculated from the individual events themselves, while for simplicity the optimizer calculates it from the binned distributions. So obviously there are differences.. Qualitatively, though, this should not really matter that much, and I suspect that in terms of the real performance - within the statistical fluctuations - you should probably get similar performance for the setting you find 'best' compared to the one that the 'optimizer' finds. You said you get 0.856 to 0.840 hmm.. Cheers, Helge On 11 January 2017 at 10:39, Marcin Wolter <mar...@ce...> wrote: > > Hi All, > > I am setting up the analysis using TMVA and would like to use the > hyperparameter optimization. To do that I have added in my code the > OptimizeAllMethods("ROCIntegral","FitGA") and then tried with > tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit") > > [...] > > dataloader = TMVA.DataLoader('dataset') > > tmva_factory = TMVA.Factory("TMVAClassification", file_out, > > "Transformations=I;D;P;G,D:AnalysisType=Classification",) > [...] > if "BDT" in mlist: > tmva_factory.BookMethod(dataloader, TMVA.Types.kBDT, "BDT", > > "!H:!V:NTrees=850:MinNodeSize=1%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20:PruneMethod=NoPruning" > ) > > [...] > > tmva_factory.OptimizeAllMethods("ROCIntegral","FitGA") > # tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit") > > > In both cases the code was doing a long analysis training BDT many times, > but ended up with a result far from optimal, in fact much worse than after > training with default parameters from the BookMethod calls (both for Minuit > and FitGA). 
> > For the default parameters the ROC integral is 0.856, after optimization I > got: > > FitGA > > ROC integral = 0.840 > > NTrees=10 > > MinNodeSize=7 > > MaxDepth=4 > > AdaBoostBeta=0.2 > > > > Minuit > > : Optimize method: BDT for Classification > : the following BDT parameters will be tuned on > the respective *grid* > : > <WARNING> : AdaBoostBeta > : | 0.2 || 0.4 || 0.6 || 0.8 || 1 | > <WARNING> : MaxDepth > : | 2 || 3 || 4 | > <WARNING> : MinNodeSize > : | 1 || 1.12444 || 1.26436 || 1.42169 || 1.5986 || > 1.79753 || 2.02121 || 2.27272 || 2.55553 || 2.87354 || 3.23111 || 3.63318 || > 4.08529 || 4.59365 || 5.16527 || 5.80802 || 6.53076 || 7.34343 || 8.25722 || > 9.28473 || 10.4401 || 11.7392 || 13.2 || 14.8426 || 16.6896 || 18.7664 || > 21.1016 || 23.7274 || 26.68 || 30 | > <WARNING> : NTrees > : | 10 || 257.5 || 505 || 752.5 || 1000 | > : Automatic optimisation of tuning parameters in > BDT uses: > : AdaBoostBeta in range from: 0.2 to: 1 in : 5 > steps > : MaxDepth in range from: 2 to: 4 in : 3 steps > : MinNodeSize in range from: 1 to: 30 in : 30 steps > : NTrees in range from: 10 to: 1000 in : 5 steps > : using the options: ROCIntegral and Minuit > > > : For BDT the optimized Parameters are: > : AdaBoostBeta = 0.200513 > : MaxDepth = 2.49231 > : MinNodeSize = 4.06703 > : NTrees = 10.001 > : Optimization of tuning paremters finished for > Method:BDT > > > And the ROC integral obtained finally is 0.840, so is worse than for the > default parameters. > > > Probably I didn't set up properly the tuning. What should I do? > > Thanks, > > Marcin Wolter > > > > ------------------------------------------------------------------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today. 
http://sdm.link/xeonphi > _______________________________________________ > TMVA-users mailing list > TMV...@li... > https://lists.sourceforge.net/lists/listinfo/tmva-users > |
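The binned-versus-unbinned difference Helge describes above can be illustrated with a small self-contained Python sketch (toy Gaussian scores only; nothing here comes from TMVA, and all numbers are made up). The ROC integral computed event by event and the one computed from a coarse score histogram agree only approximately:

```python
# Illustrative sketch (not TMVA code): the ROC integral from individual
# event scores vs. from a coarsely binned score histogram can differ.
import random

random.seed(1)
sig = [random.gauss(0.6, 0.2) for _ in range(5000)]   # hypothetical signal scores
bkg = [random.gauss(0.4, 0.2) for _ in range(5000)]   # hypothetical background scores

def auc_event_level(sig, bkg):
    """ROC integral from individual events, via the rank statistic:
    AUC = P(signal score > background score)."""
    ranked = sorted([(s, 1) for s in sig] + [(b, 0) for b in bkg])
    rank_sum = sum(i + 1 for i, (_, label) in enumerate(ranked) if label == 1)
    n_s, n_b = len(sig), len(bkg)
    return (rank_sum - n_s * (n_s + 1) / 2) / (n_s * n_b)

def auc_binned(sig, bkg, nbins=10, lo=-0.5, hi=1.5):
    """ROC integral from binned distributions (coarser, slightly biased)."""
    def hist(xs):
        h = [0] * nbins
        for x in xs:
            i = min(nbins - 1, max(0, int((x - lo) / (hi - lo) * nbins)))
            h[i] += 1
        return h
    hs, hb = hist(sig), hist(bkg)
    eff_s = eff_b = auc = 0.0       # scan the cut from high to low score
    prev_s = prev_b = 0.0
    for i in reversed(range(nbins)):
        eff_s += hs[i] / len(sig)   # signal efficiency above the cut
        eff_b += hb[i] / len(bkg)   # background efficiency above the cut
        auc += 0.5 * (eff_s + prev_s) * (eff_b - prev_b)  # trapezoid rule
        prev_s, prev_b = eff_s, eff_b
    return auc
```

With identical inputs the two numbers come out close but not equal, which is the kind of discrepancy the optimizer vs. final-analysis comparison shows.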
From: Marcin W. <mar...@ce...> - 2017-01-11 09:54:30
|
Hi All, I am setting up the analysis using TMVA and would like to use the hyperparameter optimization. To do that I have added in my code the OptimizeAllMethods("ROCIntegral","FitGA") and then tried with tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit") [...] dataloader = TMVA.DataLoader('dataset') tmva_factory = TMVA.Factory("TMVAClassification", file_out, "Transformations=I;D;P;G,D:AnalysisType=Classification",) [...] if "BDT" in mlist: tmva_factory.BookMethod(dataloader, TMVA.Types.kBDT, "BDT", "!H:!V:NTrees=850:MinNodeSize=1%:MaxDepth=3:BoostType=AdaBoost:AdaBoostBeta=0.5:SeparationType=GiniIndex:nCuts=20:PruneMethod=NoPruning" ) [...] tmva_factory.OptimizeAllMethods("ROCIntegral","FitGA") # tmva_factory.OptimizeAllMethods("ROCIntegral","Minuit") In both cases the code was doing a long analysis training BDT many times, but ended up with a result far from optimal, in fact much worse than after training with default parameters from the BookMethod calls (both for Minuit and FitGA). 
For the default parameters the ROC integral is 0.856, after optimization I got: FitGA ROC integral = 0.840 NTrees=10 MinNodeSize=7 MaxDepth=4 AdaBoostBeta=0.2 Minuit : Optimize method: BDT for Classification : the following BDT parameters will be tuned on the respective *grid* : <WARNING> : AdaBoostBeta : | 0.2 || 0.4 || 0.6 || 0.8 || 1 | <WARNING> : MaxDepth : | 2 || 3 || 4 | <WARNING> : MinNodeSize : | 1 || 1.12444 || 1.26436 || 1.42169 || 1.5986 || 1.79753 || 2.02121 || 2.27272 || 2.55553 || 2.87354 || 3.23111 || 3.63318 || 4.08529 || 4.59365 || 5.16527 || 5.80802 || 6.53076 || 7.34343 || 8.25722 || 9.28473 || 10.4401 || 11.7392 || 13.2 || 14.8426 || 16.6896 || 18.7664 || 21.1016 || 23.7274 || 26.68 || 30 | <WARNING> : NTrees : | 10 || 257.5 || 505 || 752.5 || 1000 | : Automatic optimisation of tuning parameters in BDT uses: : AdaBoostBeta in range from: 0.2 to: 1 in : 5 steps : MaxDepth in range from: 2 to: 4 in : 3 steps : MinNodeSize in range from: 1 to: 30 in : 30 steps : NTrees in range from: 10 to: 1000 in : 5 steps : using the options: ROCIntegral and Minuit : For BDT the optimized Parameters are: : AdaBoostBeta = 0.200513 : MaxDepth = 2.49231 : MinNodeSize = 4.06703 : NTrees = 10.001 : Optimization of tuning paremters finished for Method:BDT And the ROC integral obtained finally is 0.840, so is worse than for the default parameters. Probably I didn't set up properly the tuning. What should I do? Thanks, Marcin Wolter |
From: Helge V. <Hel...@ce...> - 2016-12-18 13:12:42
|
Hi Divya, now I got it - thanks to Thomas' hint/links to 'one-class-classification' - but also just as Thomas already said, something like this is not implemented in TMVA, and I don't see how one could even modify the algorithms present to achieve something similar. They all rely on being able to discriminate something particular against something particular else.. Maybe one could use some of the 'generative algorithms' which basically 'learn' the pdf of 'signal' and 'background' (i.e. Likelihood, PDERS) and which could easily learn also 'just the pdf of the background', and then discriminate against 'all other pdf's - perhaps something flat' rather than the particular 'signal pdf' ... just thinking. But you'd probably be better off using something that is actually meant for that purpose; I have to admit though that I never read about 'one-class-classification' and how that's typically done. Cheers, Helge On 18 December 2016 at 12:14, Divya D Nair <div...@gm...> wrote: > Dear Helge, > > First of all thanks for the reply. > > I was planning to do the following. > > 1. Make the algorithm to learn the background. > > 2. Then take the weights and use it to separate the same background from an > 'input file of unknown composition'. I mean a 'signal+background' file. > > Is it possible with TMVA? > > On Sun, Dec 18, 2016 at 4:28 PM, Helge Voss <Hel...@ce...> wrote: >> >> Hi Divya, >> >> hmm.. maybe I don't understand what you are doing, but in a >> 'classification problem', if you would train on 'background only' the >> classification is very simple: >> the MVA classifier would be very simple.. the algorithm would only >> learn: Every event you show it is a background and it's output simply >> a constant '1' no matter >> what the actual event variables are. 
The corresponding weightfile >> would be equally boring :) >> >> guess you need to explain a bit more what you really mean :) >> >> Helge >> >> >> On 18 December 2016 at 10:53, Divya D Nair <div...@gm...> wrote: >> > Dear Experts, >> > >> > I have been using TMVA for solving a classification problem and this >> > question arised in my mind . Is it possible to train the algorithm with >> > only >> > 'background' and get the weight files ? >> > -- >> > >> > Thanks and Regards >> > Divya >> > >> > ----Don't be serious, be simple and sincere------ >> > >> > >> > >> > >> > ------------------------------------------------------------------------------ >> > Check out the vibrant tech community on one of the world's most >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot >> > _______________________________________________ >> > TMVA-users mailing list >> > TMV...@li... >> > https://lists.sourceforge.net/lists/listinfo/tmva-users >> > > > > > > -- > > Thanks and Regards > Divya > > ----Don't be serious, be simple and sincere------ > > |
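Helge's 'generative' idea above (learn only the background pdf, then discriminate against a flat reference) is not a TMVA feature, but the logic can be sketched in a few lines of plain Python. The histogram density estimate below stands in for what a Likelihood/PDERS-style method would learn; all names and numbers are hypothetical:

```python
# Rough sketch of the idea (not TMVA code): learn a 1-D background pdf
# from a histogram and score events by the likelihood ratio of a flat
# reference pdf vs. the learned background pdf.
import random

random.seed(2)
background = [random.gauss(0.3, 0.1) for _ in range(10000)]  # toy training sample

NBINS, LO, HI = 40, 0.0, 1.0
WIDTH = (HI - LO) / NBINS

def fit_background_pdf(sample):
    """Histogram-based density estimate, crudely like Likelihood/PDERS."""
    counts = [0] * NBINS
    for x in sample:
        i = min(NBINS - 1, max(0, int((x - LO) / WIDTH)))
        counts[i] += 1
    return [c / (len(sample) * WIDTH) for c in counts]  # normalised density

pdf_b = fit_background_pdf(background)

def anomaly_score(x):
    """Ratio of a flat pdf (density 1 on [0,1]) to the learned background
    pdf. Large values mean 'does not look like background'."""
    i = min(NBINS - 1, max(0, int((x - LO) / WIDTH)))
    return 1.0 / max(pdf_b[i], 1e-6)
```

An event near the background peak (x around 0.3) scores low, while one far in the tail (x around 0.9) scores high, which is the 'background vs. everything else' behaviour being discussed; a dedicated one-class tool would of course do this more carefully.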
From: Thomas K. <t....@on...> - 2016-12-18 11:53:10
|
Hello Divya, you probably mean one-class classification: https://en.wikipedia.org/wiki/One-class_classification I guess you can find more information using this keyword. As far as I know this is not (yet) possible in TMVA. If I remember correctly, one of the IML talks in July mentioned using this kind of classification: https://indico.cern.ch/event/532992/ Best regards, Thomas On 18.12.2016 11:58, Helge Voss wrote: > Hi Divya, > > hmm.. maybe I don't understand what you are doing, but in a > 'classification problem', if you would train on 'background only' the > classification is very simple: > the MVA classifier would be very simple.. the algorithm would only > learn: Every event you show it is a background and it's output simply > a constant '1' no matter > what the actual event variables are. The corresponding weightfile > would be equally boring :) > > guess you need to explain a bit more what you really mean :) > > Helge > > > On 18 December 2016 at 10:53, Divya D Nair <div...@gm...> wrote: >> Dear Experts, >> >> I have been using TMVA for solving a classification problem and this >> question arised in my mind . Is it possible to train the algorithm with only >> 'background' and get the weight files ? >> -- >> >> Thanks and Regards >> Divya >> >> ----Don't be serious, be simple and sincere------ >> >> >> >> ------------------------------------------------------------------------------ >> Check out the vibrant tech community on one of the world's most >> engaging tech sites, SlashDot.org! http://sdm.link/slashdot >> _______________________________________________ >> TMVA-users mailing list >> TMV...@li... >> https://lists.sourceforge.net/lists/listinfo/tmva-users >> > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, SlashDot.org! 
http://sdm.link/slashdot > _______________________________________________ > TMVA-users mailing list > TMV...@li... > https://lists.sourceforge.net/lists/listinfo/tmva-users |
From: Divya D N. <div...@gm...> - 2016-12-18 11:34:27
|
Dear Thomas, Yes, 'one class classification' is what I was thinking. Thanks for the reply and links. On Sun, Dec 18, 2016 at 4:39 PM, Thomas Keck <t....@on...> wrote: > Hello Divya, > > you probably mean one-class classification: > https://en.wikipedia.org/wiki/One-class_classification > > I guess you can find more information using this keyword. > > As far as I know this is not (yet) possible in TMVA. > If I remember correctly there one of the IML talks in July mentioned > using this kind of classification: > https://indico.cern.ch/event/532992/ > > Best regards, > Thomas > > On 18.12.2016 11:58, Helge Voss wrote: > > Hi Divya, > > > > hmm.. maybe I don't understand what you are doing, but in a > > 'classification problem', if you would train on 'background only' the > > classification is very simple: > > the MVA classifier would be very simple.. the algorithm would only > > learn: Every event you show it is a background and it's output simply > > a constant '1' no matter > > what the actual event variables are. The corresponding weightfile > > would be equally boring :) > > > > guess you need to explain a bit more what you really mean :) > > > > Helge > > > > > > On 18 December 2016 at 10:53, Divya D Nair <div...@gm...> wrote: > >> Dear Experts, > >> > >> I have been using TMVA for solving a classification problem and this > >> question arised in my mind . Is it possible to train the algorithm with > only > >> 'background' and get the weight files ? > >> -- > >> > >> Thanks and Regards > >> Divya > >> > >> ----Don't be serious, be simple and sincere------ > >> > >> > >> > >> ------------------------------------------------------------ > ------------------ > >> Check out the vibrant tech community on one of the world's most > >> engaging tech sites, SlashDot.org! http://sdm.link/slashdot > >> _______________________________________________ > >> TMVA-users mailing list > >> TMV...@li... 
> >> https://lists.sourceforge.net/lists/listinfo/tmva-users > >> > > ------------------------------------------------------------ > ------------------ > > Check out the vibrant tech community on one of the world's most > > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > > _______________________________________________ > > TMVA-users mailing list > > TMV...@li... > > https://lists.sourceforge.net/lists/listinfo/tmva-users > > > > -- Thanks and Regards Divya ----Don't be serious, be simple and sincere------ |
From: Divya D N. <div...@gm...> - 2016-12-18 11:15:00
|
Dear Helge, First of all thanks for the reply. I was planning to do the following. 1. Make the algorithm learn the background. 2. Then take the weights and use them to separate the same background from an 'input file of unknown composition'. I mean a 'signal+background' file. Is it possible with TMVA? On Sun, Dec 18, 2016 at 4:28 PM, Helge Voss <Hel...@ce...> wrote: > Hi Divya, > > hmm.. maybe I don't understand what you are doing, but in a > 'classification problem', if you would train on 'background only' the > classification is very simple: > the MVA classifier would be very simple.. the algorithm would only > learn: Every event you show it is a background and it's output simply > a constant '1' no matter > what the actual event variables are. The corresponding weightfile > would be equally boring :) > > guess you need to explain a bit more what you really mean :) > > Helge > > > On 18 December 2016 at 10:53, Divya D Nair <div...@gm...> wrote: > > Dear Experts, > > > > I have been using TMVA for solving a classification problem and this > > question arised in my mind . Is it possible to train the algorithm with > only > > 'background' and get the weight files ? > > -- > > > > Thanks and Regards > > Divya > > > > ----Don't be serious, be simple and sincere------ > > > > > > > > ------------------------------------------------------------ > ------------------ > > Check out the vibrant tech community on one of the world's most > > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > > _______________________________________________ > > TMVA-users mailing list > > TMV...@li... > > https://lists.sourceforge.net/lists/listinfo/tmva-users > > > -- Thanks and Regards Divya ----Don't be serious, be simple and sincere------ |
From: Helge V. <Hel...@ce...> - 2016-12-18 10:58:29
|
Hi Divya, hmm.. maybe I don't understand what you are doing, but in a 'classification problem', if you would train on 'background only' the classification is very simple: the MVA classifier would be very simple.. the algorithm would only learn: Every event you show it is a background and its output is simply a constant '1' no matter what the actual event variables are. The corresponding weightfile would be equally boring :) guess you need to explain a bit more what you really mean :) Helge On 18 December 2016 at 10:53, Divya D Nair <div...@gm...> wrote: > Dear Experts, > > I have been using TMVA for solving a classification problem and this > question arised in my mind . Is it possible to train the algorithm with only > 'background' and get the weight files ? > -- > > Thanks and Regards > Divya > > ----Don't be serious, be simple and sincere------ > > > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > _______________________________________________ > TMVA-users mailing list > TMV...@li... > https://lists.sourceforge.net/lists/listinfo/tmva-users > |
From: Divya D N. <div...@gm...> - 2016-12-18 09:53:21
|
Dear Experts, I have been using TMVA for solving a classification problem and this question arose in my mind. Is it possible to train the algorithm with only 'background' and get the weight files? -- Thanks and Regards Divya ----Don't be serious, be simple and sincere------ |
From: Helge V. <Hel...@ce...> - 2016-12-13 13:47:00
|
Hi, > Is it too hard to save the histograms as tree-leaf? well we simply cannot do 'everything' everyone ever possibly would like .. > Also, TestTree and TrainTree include all variables distribution, how can I understand they are from background or signal tree? I couldn’t figure it out The trees contain a variable (I think it is called 'type') which is 0 for signal and 1 for backgr (or the other way round) > Other question is: I run TMVAClassificationApplication.C and obtained mva-methods for signal and background. > Is it possible to have these histograms for all variables (such as var1_MVA_likelihood, var2_MVA_LD)? guess you need to think about 'what is a multivariate analysis output' .. or I didn't understand your question. the MVA (i.e. the LD or likelihood output) is the classifier output built from all your input variables, hence asking for var1_MVA_likelihood is simply 'nonsense' :) Helge > > > thanks, > ilknur > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > _______________________________________________ > TMVA-users mailing list > TMV...@li... > https://lists.sourceforge.net/lists/listinfo/tmva-users |
From: Ilknur K. <ilk...@gm...> - 2016-12-13 13:31:33
|
Dear Experts, I ran TMVAClassification.C and obtained the 'weights' folder and TMVA.root. There are histograms and two trees in TMVA.root, TestTree and TrainTree. For example Method_Likelihood has many histograms, and I want to obtain these histograms in tree format. Is it too hard to save the histograms as tree-leaf? Also, TestTree and TrainTree include all variable distributions; how can I understand whether they are from the background or signal tree? I couldn't figure it out. My other question is: I ran TMVAClassificationApplication.C and obtained mva-methods for signal and background. Is it possible to have these histograms for all variables (such as var1_MVA_likelihood, var2_MVA_LD)? thanks, ilknur |
From: Pikounis <pik...@in...> - 2016-12-05 15:50:38
|
Dear experts, I face a problem with the ranking of my variables in a BDT and any help would be very much appreciated! In some of the configurations I tried this is what I see [1] and I would like to understand why I get "nan". Please note that all the variables are doubles, except that Border_accepted, Sec_str_accepted and Extra_new_accepted have a boolean behavior, so they are either 1 or 0. The configuration of this BDT is [2]. thank you very much in advance, Kostas [1] --- BDT3 : Ranking result (top variable is best ranked) --- BDT3 : ---------------------------------------------------- --- BDT3 : Rank : Variable : Variable Importance --- BDT3 : ---------------------------------------------------- --- BDT3 : 1 : Border_accepted : -nan --- BDT3 : 2 : Sec_str_accepted : -nan --- BDT3 : 3 : Extra_new_accepted : -nan --- BDT3 : 4 : Oms3 : -nan --- BDT3 : 5 : Oms3_in_coinc : -nan --- BDT3 : 6 : Oms3_with_angular : -nan --- BDT3 : 7 : Tot_beh_vtx : -nan --- BDT3 : 8 : Oms3_beh_vtx : -nan --- BDT3 : 9 : Ratio_tot : -nan --- BDT3 : 10 : Ratio_pulses : -nan --- BDT3 : ---------------------------------------------------- [2] factory->BookMethod( TMVA::Types::kBDT, "BDT3", "!H:!V:NTrees=900:MinNodeSize=0.5%:MaxDepth=4:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.1:SeparationType=GiniIndex:nCuts=18" ); |
From: Duong N. <nhd...@gm...> - 2016-11-19 02:58:51
|
Dear Developer, It seems to me that TMVA cannot correctly handle variables that are indexed by another array (leaf) of the tree. For example, in the tree I have an array which is the collection of pT of all jets in an event, Jet_pt. I also have another array in the tree which stores the indices of jets passing the b-tag requirement, hJCidx. Therefore, I provide the variables like this Jet_pt[hJCidx[0]] Jet_pt[hJCidx[1]] However, I see that the two variables are exactly the same in TMVA. You can see this in (*). I also checked the distributions in the TMVA output and they are exactly the same. I tried to plot those variables directly in ROOT using TTree::Draw and they are two different distributions. I am not sure how TMVA parses the variable expressions, but if it is the same as TTree::Draw, things should work for TMVA as well. Is it possible to use a variable as an instance of an array which is indexed by another instance of another array (above example)? Thank you, Duong (*) --- Factory : --- Factory : current transformation string: 'I' --- Factory : Create Transformation "I" with events from all classes. --- Id : Transformation, Variable selection : --- Id : Input : variable 'Jet_pt[hJCidx[0]]' (index=0). <---> Output : variable 'Jet_pt[hJCidx[0]]' (index=0). --- Id : Input : variable 'Jet_pt[hJCidx[1]]' (index=1). <---> Output : variable 'Jet_pt[hJCidx[1]]' (index=1). --- Id : Preparing the Identity transformation... 
--- TFHandler_Factory : -------------------------------------------------------------------------------------------------------- --- TFHandler_Factory : Variable Mean RMS [ Min Max ] --- TFHandler_Factory : -------------------------------------------------------------------------------------------------------- --- TFHandler_Factory : Jet_pt[hJCidx[0]]: 193.83 83.860 [ 45.900 903.49 ] --- TFHandler_Factory : Jet_pt[hJCidx[1]]: 193.83 83.860 [ 45.900 903.49 ] --- TFHandler_Factory : -------------------------------------------------------------------------------------------------------- |
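One workaround that is often suggested for this kind of double indexing (an assumption on my part, not an official TMVA statement) is to resolve the indirection yourself and write plain scalar branches, so the DataLoader only ever sees simple variable names. The flattening step is sketched below with plain Python lists standing in for the tree branches; the names bjet1_pt/bjet2_pt and the values are entirely hypothetical:

```python
# Sketch of the flattening workaround: resolve Jet_pt[hJCidx[0]] and
# Jet_pt[hJCidx[1]] into plain scalars per event before training.
events = [
    {"Jet_pt": [90.0, 70.0, 50.0], "hJCidx": [2, 0]},  # toy event 1
    {"Jet_pt": [120.0, 40.0],      "hJCidx": [1, 0]},  # toy event 2
]

def flatten(event):
    """Resolve the double indexing into two scalar 'branches'."""
    pts = event["Jet_pt"]
    idx = event["hJCidx"]
    return {"bjet1_pt": pts[idx[0]], "bjet2_pt": pts[idx[1]]}

flat = [flatten(e) for e in events]
# flat[0] == {"bjet1_pt": 50.0, "bjet2_pt": 90.0}
```

In a real analysis one would fill such scalars into new branches of the tree (or a friend tree) in a small event loop, then register them with plain names via AddVariable; that way TMVA never has to parse the nested index expression.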
From: Betty C. <bet...@ce...> - 2016-11-03 14:37:19
|
Dear experts, I read the TMVAMulticlass.C tutorial but I need more detail. I wonder: - if I can use all the TMVA::Factory methods in TMVAMulticlass; in order to merge two signal samples, could I use [1]? If not, how can I merge two signal samples? [1] factory->AddSignalTree(s1, weight1); factory->AddSignalTree(s2, weight2); Regards |
From: Betty C. <bet...@ce...> - 2016-11-01 08:01:12
|
Dear experts, when I run TMVAMulticlass.C, I get the error message [1] after I open the TMVAMultiClassGui to draw the "input variables". Note that the other buttons (input variable correlation, test and training plots...) work fine. The training log file is here [2]; do you see what is wrong? The code I used is [3]. Regards [1] ERROR!!! couldn't find pt histogram for class Signal [2] http://calpas.web.cern.ch/calpas/log_multiclass [3] http://calpas.web.cern.ch/calpas/TMVAMulticlass.C -- Cheers, Betty |