From: Helge V. <Hel...@ce...> - 2017-03-09 12:42:40
|
Hi,

in the log file you find a printout of 'sum of weights' as they are seen by TMVA; there you can most easily check if you got the 'correct' factor (i.e. by seeing that the sum of the signal weights is 'about' the same as the sum of the background weights).

> Nev_back1*W1 + Nev_back2*W2 + Nev_back3*W3 = Nev_sig*Weight_signal

This formula seems 'right', but I don't see how it translates into:

> Weight_signal = (Xsec_back1 + Xsec_back2 + Xsec_back3)/Nsignal_events

Cheers,

Helge

On 9 March 2017 at 13:15, Ben Smith <ben...@gm...> wrote:
> Hi Helge,
>
> OK! I tried all sorts of ways to normalize signal and background and
> indeed, doing NormMode=None and normalizing signal by hand and using
> NormMode=EqualNumEvents seems to be *very* different... So, I wonder if
> I'm doing the normalization of the signal by hand correctly.
>
> You said "I suggested to weight the signal sample with an overall
> constant factor such that the total number of sum_of_weights for the
> signal sample is equal to the sum over background events"
>
> So, I did the following:
>
> Nev_back1*W1 + Nev_back2*W2 + Nev_back3*W3 = Nev_sig*Weight_signal
>
> So
>
> Weight_signal = (Xsec_back1 + Xsec_back2 + Xsec_back3)/Nsignal_events
>
> In other words, the signal weight "by hand" should be the sum of all
> background cross sections divided by the final number of events in the
> signal sample that is passed to TMVA.
>
> Did I get that right?
>
> Thank you,
>
> Ben
>
> [earlier messages in this thread are quoted in full in the next entry]
|
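The arithmetic behind this exchange can be checked outside TMVA. The sketch below is plain standalone C++ with invented cross sections, efficiencies, and event counts (the function names are ours for illustration, not TMVA API); it makes explicit why the "sum of cross sections" form only follows when the per-sample efficiencies are 1 or are already folded into the cross sections, which is the step Helge queries:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-sample background weight discussed in the thread:
//   W_i = Xsec_i * eff_i / Nev_i   (this equals 1/integrated-lumi of sample i)
double sample_weight(double xsec, double eff, long nev) {
    return xsec * eff / static_cast<double>(nev);
}

// Signal scale factor chosen so that
//   Nev_sig * Weight_signal == sum_i Nev_i * W_i  (sum of background weights).
// Since Nev_i * W_i collapses to Xsec_i * eff_i, this is
//   (sum_i Xsec_i * eff_i) / Nev_sig,
// which matches "(sum of cross sections)/Nsignal_events" only if every
// efficiency is 1 (or already folded into the quoted cross section).
double signal_weight(const std::vector<double>& xsec,
                     const std::vector<double>& eff,
                     const std::vector<long>& nev,
                     long nev_sig) {
    double sum_bkg = 0.0;
    for (std::size_t i = 0; i < xsec.size(); ++i)
        sum_bkg += nev[i] * sample_weight(xsec[i], eff[i], nev[i]);
    return sum_bkg / static_cast<double>(nev_sig);
}

// Example with invented numbers: xsec = {10, 5, 2}, eff = {0.5, 0.8, 1.0},
// nev = {10000, 20000, 5000}, nev_sig = 15000 gives
//   sum_i Xsec_i * eff_i = 10*0.5 + 5*0.8 + 2*1.0 = 11,
// so Weight_signal = 11/15000, whereas (10+5+2)/15000 would be wrong here
// because the efficiencies are not all 1.
```

With this scale factor, the "sum of weights" printed in the TMVA log should come out about equal for signal and background, which is exactly the check Helge suggests at the top of this message.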
From: Ben S. <ben...@gm...> - 2017-03-09 12:16:04
|
Hi Helge,

OK! I tried all sorts of ways to normalize signal and background and indeed, doing NormMode=None and normalizing signal by hand and using NormMode=EqualNumEvents seems to be *very* different... So, I wonder if I'm doing the normalization of the signal by hand correctly.

You said "I suggested to weight the signal sample with an overall constant factor such that the total number of sum_of_weights for the signal sample is equal to the sum over background events"

So, I did the following:

Nev_back1*W1 + Nev_back2*W2 + Nev_back3*W3 = Nev_sig*Weight_signal

So

Weight_signal = (Xsec_back1 + Xsec_back2 + Xsec_back3)/Nsignal_events

In other words, the signal weight "by hand" should be the sum of all background cross sections divided by the final number of events in the signal sample that is passed to TMVA.

Did I get that right?

Thank you,

Ben

On Sun, Feb 26, 2017 at 9:05 PM, Helge Voss <Hel...@ce...> wrote: > Hi, > > yes exactly, NormMode=EqualNumEvents would take care of the bit that > normalizes 'Signal' to 'background'. The relative weighting of the > various > background samples you still have to do yourself though. And as I > never remember how possible preselection cuts in the factory are > handled, > I simply like to suggest to do the normalization 'by hand' :) > > Cheers, > > helge > > > On 22 February 2017 at 11:09, Ben Smith <ben...@gm...> wrote: > > Hi Helge, > > > > Thanks a lot for taking the time to explain my confusions... > > > > I discovered that there is a "NormMode=EqualNumEvents" which should do > what > > you proposed automatically, unless I misunderstood it. > > > > You say the signal should be normalized to the "explicit number of > events" > > .. or if you you have weighted the events, the 'sum_of_event_weights' of > you > > total background sample" (I got it why now) > > > > To normalize to the same number of events as the background, I wonder > what > > is the best way to do this. 
Would it ork if I use > "NormMode=EqualNumEvents"? > > Or is it the case that if I use the background weights as we discussed, I > > should than have "NormMode=None", necessarily? Because if that is the > case, > > than I am not sure what to enter for Double_t signalWeight =?? , so that > > this is acomplished (I don't know the cross section of the signal in this > > particular case). In your first message I re-read that you said that this > > signal_weight should be > > "sum_over_background_weights/sum_over_signal_weights", so what was > missing > > from "signalWeight= Xsec1*eff1/Nevents_mc1 + Xsec1*eff2/Nevents_mc2" > > is the 1/(sum_over_signal_weights) but this I don't know (there are no > event > > weights for the sinal, as far as I understand, and probably no way to > > check). > > > > "so .. n_events should be the 'number of events' that corresponds to the > > 'eff' that you put in there." > > > > I'm thinking than I will just use Xsec/N_events_final, which should be > > equivalent to having the correct final efficiency. The thing is that the > > efficiency I quoted takes into account only the generator level cuts, not > > analysis cuts. I have the "integrated luminosity of the sample" but this > is > > before applying any analysis cuts - I thought I could use this, but what > you > > say does not corroborate that. So, if I take the actual number of events > I > > see in the tuple I pass to TMVA and the cross section, this should be > > enough. > > > > Thanks!!! 
> > > > Ben > > > > On Tue, Feb 21, 2017 at 8:57 PM, Helge Voss <Hel...@ce...> wrote: > >> > >> Hi Ben, > >> > >> There's still some misunderstanding, I'll try to explain below > >> > >> > >> > > >> > // global event weights per tree (see below for setting event-wise > >> > weights) > >> > Double_t backgroundWeight1 = Xsec1*eff1/Nevents_mc1; > >> > Double_t backgroundWeight2 = Xsec1*eff2/Nevents_mc2; > >> > Double_t signalWeight = Xsec1*eff1/Nevents_mc1 + > >> > Xsec1*eff2/Nevents_mc2; > >> > >> For the 'signal' scaling, you really don't care about 'lumi' (when I say > >> lumi, > >> I mean 'integrated luminosity' obviously) of your background > >> monte carlo events, but the "explicit number of events" .. or if you you > >> have > >> weighted the events, the 'sum_of_event_weights' of you total background > >> sample > >> > >> As I tried to explain in the previous mail, the signal sample should > >> NOT be normalized > >> to the same lumi as the background, but the the same "number of > >> events". And typically > >> for "signal" that is a much much larger lumi than for background. > >> (maybe if you read the > >> previous mail again, you understand "WHY" I said this should be the > case) > >> > >> > >> > > >> > // You can add an arbitrary number of signal or background trees > >> > dataloader->AddBackgroundTree( background1, backgroundWeightSample1 > >> > ); > >> > dataloader->AddBackgroundTree( background2, backgroundWeightSample2 > >> > ); > >> > dataloader->AddSignalTree ( signal, signalWeight ); > >> > > >> > I confess I would have thought taking the largest 1/Lumi for the > >> > background > >> > would have been enough. Say I collect 10 fb-1 of background1 and 20 > fb-1 > >> > of > >> > background2 simultaneously. I would expect not to be able to collect > >> > more > >> > than 20 fb-1 of signal than. But I guess you're being very > conservative > >> > to > >> > be on the safe side. > >> > >> So.. 
after what I wrote above, it should now hopefully be clear that > >> this is also > >> wrong. I wasn't 'conservative' or 'on the safe side' by taking the sum > >> of their integrated > >> luminosities, I simply meant a different 'scaling', based on actual > >> number of events (sum of > >> event weightes) in the respecteive signal and background sample. > >> > >> > > >> > I have one very last question, if you would not mind... I suddenly > >> > realize > >> > I don't know how exactly to take Nevent_mc. When the samples are > >> > prepared > >> > and are available to use they have a certain number of events. But > when > >> > I > >> > prepare the tuple to pass to TMVA, a few cuts are applied and I have > >> > less > >> > events. Which one does TMVA want? > >> > >> Again, TMVA want's nothing ;) You WANT to give it a background sample > >> that is as close > >> to that which you have in the data (i.e. the event distributions that > >> TMVA sees and > >> tries to discriminate you signal against, should be as similar as > >> possible to what the > >> trained classifier will be exposed to when it is in the end applied to > >> your data. Hence > >> you can always use this in order to determine how you want to 'scale' > >> your various event > >> samples. That's why I said: "scale your different background samples > >> such that they all > >> > >> xsec * eff / n_events = 1/(integrated lumi) > >> > >> so .. n_events should be the 'number of events' that corresponds to > >> the 'eff' that you put > >> in there. So if you have some cuts, eff should take into account those > >> cuts AND of course > >> all cuts that your event generator might have applied etc.. 
> >> > >> Cheers, > >> > >> Helge > >> > >> > >> > > >> > Many thanks, > >> > > >> > Ben > >> > > >> > On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...> > wrote: > >> >> > >> >> Hi Ben, > >> >> > >> >> > normalize to 1/lumi(sample_i) than my impression that I should pass > >> >> > the > >> >> > number of events of each sample as well was correct. For my > samples > >> >> > I > >> >> > would > >> >> > have > >> >> > > >> >> > lumi(sample_i) = N_events_mc/Xsec*eff > >> >> > > >> >> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc > >> >> > > >> >> > the "lumi" I was using was just a global constant that would not > >> >> > change > >> >> > the > >> >> > normalization between the samples so it can be omitted (like, I > could > >> >> > multiply the lumi of all background samples by 10 and this should > not > >> >> > make a > >> >> > difference, as far as I understand). > >> >> > >> >> Yes exactly so far! > >> >> > >> >> > > >> >> > In order to pass this, I understand I should do: > >> >> > > >> >> > factory->SetBackgroundWeightExpression( "weight_bkg" ); > >> >> > > >> >> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read > directly > >> >> > from > >> >> > the ntuple) > >> >> > >> >> as this weight would be the same for 'every' event in a particular > >> >> sample, rather > >> >> than haveing to write this into the N-tuple, you can much easier > use: > >> >> > >> >> // global event weights per tree (see below for setting event-wise > >> >> weights) > >> >> Double_t backgroundWeightSample1 = > >> >> <theNumberYouCalculatedForSample1>; > >> >> Double_t backgroundWeightSample2 = > >> >> <theNumberYouCalculatedForSample2>; > >> >> etc.. 
> >> >> > >> >> dataloader->AddBackgroundTree( background1, > backgroundWeightSample1 > >> >> ); > >> >> dataloader->AddBackgroundTree( background2, > backgroundWeightSample2 > >> >> ); > >> >> > >> >> (or 'factory" instead of "dataloader" for older root/tmva versions, > >> >> like root 5.xx) > >> >> > >> >> the "SetBackgroundWeightExpression" is meant if your monte carlo > >> >> generator > >> >> creates event weights rather than 'events', or if you train using > >> >> 'sWeights' for > >> >> example, where each event gets a particular weight in order to end up > >> >> with > >> >> the > >> >> correct 'average distribution' of events. > >> >> > >> >> > For the signal, I'm not sure I get what you said... should I not > >> >> > simply > >> >> > have: > >> >> > > >> >> > factory->SetSignalWeightExpression("weight_sg" ); and have > weight_sg > >> >> > = > >> >> > 1? > >> >> > >> >> No, as obviously that makes 'nothing' :) > >> >> > >> >> > > >> >> > I have only one signal sample and several background, not several > >> >> > signal. Or > >> >> > are you saying > >> >> > > >> >> > weight_sg = total sum of 1/lumi(sample_i)? (why?) > >> >> > >> >> This is also not 'recommended', but in general it is the best > >> >> 'default' to have the > >> >> same number of (weighted) events in the signal sample as in the > >> >> background, even > >> >> if in the real data, your signal sample is typically much smaller > >> >> than the background. > >> >> This is, because in the extreme case of a very rare signal, the > >> >> simplest classifier which > >> >> just 'call everything background' already has a very good 'overall' > >> >> perfromance, as it is > >> >> correct in 'almost all cases'. (as most events are background). But > of > >> >> course, that classifier > >> >> is not what you want. 
Therefore I suggested to weight the signal > >> >> sample with an overall > >> >> constant factor such that the total number of sum_of_weights for the > >> >> signal sample is > >> >> equal to the sum over background events. > >> >> > >> >> Cheers, > >> >> > >> >> Helge > >> >> > >> >> > >> >> > > >> >> > Thanks really a lot!!! > >> >> > > >> >> > Ben > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> > >> >> > wrote: > >> >> >> > >> >> >> Hi Ben, > >> >> >> > >> >> >> Maybe I didn't understand you, as I don't see at all why you use a > >> >> >> factor Xsec*eff*lumi. > >> >> >> > >> >> >> TMVA just concatenates the different background source files > >> >> >> together > >> >> >> without doing anything... > >> >> >> hence you should apply the factor 1/lumi(sample_i) to each MC > sample > >> >> >> (i=1,2,3) to normalize the > >> >> >> various samples to the same integrated luminosity. Doing this, > TMVA > >> >> >> sees a background sample that > >> >> >> has the same distribution as it would be in the data. Then > >> >> >> afterwards, > >> >> >> you should use "NormMode=None" > >> >> >> (NormMode takes care of how the total Signal is weighted w.r.t. > the > >> >> >> total background). And if you choose "None" > >> >> >> here, again, TMVA does nothing and you can normalize easily your > >> >> >> signal sample to the background sample, > >> >> >> buy multiplying as signal weight > >> >> >> "sum_over_background_weights/sum_over_signal_weights") > >> >> >> Where here sum goes over the events and 'background weight' for > >> >> >> example would be the weights you > >> >> >> caculated above for the relative background weighting, multiplied > >> >> >> with > >> >> >> eventual event weights from the monte carlo. 
> >> >> >> For the signal, it would be simply the 'event_weights' if the MC > you > >> >> >> used produces weighted events rather than > >> >> >> 'just events' > >> >> >> > >> >> >> Cheers, > >> >> >> > >> >> >> Helge > >> >> >> > >> >> >> > >> >> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> > >> >> >> wrote: > >> >> >> > Hello! > >> >> >> > > >> >> >> > I have a hopefully simple question regarding how weights are > >> >> >> > passed > >> >> >> > to > >> >> >> > TMVA. > >> >> >> > > >> >> >> > I have one signal sample and 3 background samples that I want to > >> >> >> > pass > >> >> >> > to > >> >> >> > TMVA. In ROOT, the background samples would be normalized in an > >> >> >> > historgam h > >> >> >> > as: > >> >> >> > > >> >> >> > h->Fill(var, weight); where weight = N_events/(h->Integral()) > >> >> >> > > >> >> >> > with N_events = Xsec*eff*lumi; > >> >> >> > > >> >> >> > var is a variable that will be used in TMVA, and N_events is the > >> >> >> > number > >> >> >> > of > >> >> >> > events I want to normalize to. In case of my samples this number > >> >> >> > depends > >> >> >> > on > >> >> >> > the cross-section (Xsec), on the efficiency of the sample (eff), > >> >> >> > and > >> >> >> > on > >> >> >> > the > >> >> >> > luminosity. Note that the weight I actually use depends on > >> >> >> > h->integral, > >> >> >> > because each sample has a different number of events and this > must > >> >> >> > be > >> >> >> > taken > >> >> >> > into account. > >> >> >> > > >> >> >> > I need to pass the correct weights to TMVA. The question is, > >> >> >> > should I > >> >> >> > pass > >> >> >> > simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent > of > >> >> >> > 1/h->integral by default) or should I pass actually > >> >> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original > >> >> >> > number > >> >> >> > of > >> >> >> > events in each sample? 
Note that passing N_events_mc is not > really > >> >> >> > ideal, as > >> >> >> > there a few cuts involved. Alternatively, how would I do the > >> >> >> > equivalent > >> >> >> > of > >> >> >> > h->Integral() at the TMVA level? > >> >> >> > > >> >> >> > Thanks a lot in advance for any help, and apologies if something > >> >> >> > is > >> >> >> > not > >> >> >> > very > >> >> >> > well explained or confusing! > >> >> >> > > >> >> >> > Ben > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > ------------------------------------------------------------ > ------------------ > >> >> >> > Check out the vibrant tech community on one of the world's most > >> >> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > >> >> >> > _______________________________________________ > >> >> >> > TMVA-users mailing list > >> >> >> > TMV...@li... > >> >> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users > >> >> >> > > >> >> > > >> >> > > >> > > >> > > > > > > |
From: Helge V. <Hel...@ce...> - 2017-03-02 07:27:45
|
Sure, it's implemented in 'MethodBase', hence inherited by all Methods. Helge On 2 March 2017 at 07:55, Divya D Nair <div...@gm...> wrote: > Dear Experts, > > May I know if the ROC integral function is implemented for all MVA methods > in TMVA? > > -- > > Thanks and Regards > Divya > > ----Don't be serious, be simple and sincere------ > > > > ------------------------------------------------------------------------------ > Check out the vibrant tech community on one of the world's most > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > _______________________________________________ > TMVA-users mailing list > TMV...@li... > https://lists.sourceforge.net/lists/listinfo/tmva-users > |
From: Divya D N. <div...@gm...> - 2017-03-02 06:55:31
|
Dear Experts, May I know if the ROC integral function is implemented for all MVA methods in TMVA? -- Thanks and Regards Divya ----Don't be serious, be simple and sincere------ |
From: Olivia J. <oli...@b2...> - 2017-02-27 18:05:49
|
Hi, Hope you doing well! We've released a potential client's list for Business information and media industry professionals for the year 2017. Would you be interested in acquiring client's list to sell or market your products in your location and worldwide? And, please let me know can I consider the below criteria as your target market? Accordingly, will get back to you with Data counts, Accuracy, Update and much more information Job titles/categories:- * Publishers, CEOs, Chief Revenue Officers * Business development and licensing executives * Marketers * Investors and deal makers * Human capital practitioners * Data strategists, systems engineers, online managers * Many more If you would like to reach any other Industries/Titles. You can just fill in below and revert back on the same email. Target Criteria:- Target Industries: ______? (We maintain data across all Business to Business (b2b) Industries across North America, UK and Europe) Job Titles: ______? Geography: ______? Would be pleased to hear back from you Regards, Olivia Jones Marketing Executive _____ Instructions to remove from this mailing: Reply with subject line as "Leave out" and indicate your email address to be removed from our database. |
From: Razvan M. <RX...@st...> - 2017-02-27 16:56:05
|
Hi, I was referring to another branch of the experimental framework I work with, not of ROOT. But I agree it can be some compatibility problem. I've decided to work around this by preparing another network of similar architecture but a larger training sample and the file sizes this time were more sensible. It might've just been a fluke on my side which caused the file to be written incorrectly. Thank you for all the advice and help! Dan ________________________________ From: hel...@gm... <hel...@gm...> on behalf of Helge Voss <Hel...@ce...> Sent: 26 February 2017 19:09:45 To: Razvan Moise Cc: Christopher Jones; tmv...@li... Subject: Re: [TMVA-users] Reader expression wrong name or order Hi, well, 30kB really doesn't sound terribly large at all, for this size I'm sure the problem must be something else, weird .. I don't have a good idea though. If you say you have used the same weights on a different branch (probably TMVA/ROOT vesion), maybe it's a compatibility problem. While we tried to make sure we are always 'backward compatible', it could of course be that some things might not be 'foward compabible' .. (so if it is trained on a later version, it might not necessarily run on an 'older' version..) Although I don't really remember big changes 'recently' on the MLP that would hint to that as a possible cause of your problem. Helge On 26 February 2017 at 19:38, <rx...@st...> wrote: > Hi Helge, > > > > It’s fairly large, 30KB compared to my other weight files. I’ve used these > weights before on a different branch of the framework I work on and I > haven’t seen this error before. The network is not very big: a single hidden > layer with only 30ish nodes. For now I’m using a similar one which has a > smaller xml file > > So from what I understand I have to change this hardcoded limit. In what > file can I find it and edit it? Or is there another way to solve the issue? 
> > > > Thank you, > > Dan > > > > From: Helge Voss > Sent: 26 February 2017 18:10 > To: rx...@st... > Cc: Christopher Jones; tmv...@li... > > > Subject: Re: [TMVA-users] Reader expression wrong name or order > > > > Hi Christopher, > > > > I remember this error appearing when the 'xml' file became too large. > > There's a hardcoded limit in ROOT (well, I'm not sure if > > it's still hardcoded) and it got increased at some point (from root > > 5.29 onwards) > > > > #if ROOT_VERSION_CODE >= ROOT_VERSION(5,29,0) > > void* doc = > > gTools().xmlengine().ParseFile(tfname,gTools().xmlenginebuffersize()); > > // the default buffer size in TXMLEngine::ParseFile is 100k. Starting > > with ROOT 5.29 one can set the buffer size, see: > > http://savannah.cern.ch/bugs/?78864. This might be necessary for large > > XML files > > #else > > void* doc = gTools().xmlengine().ParseFile(tfname); > > #endif > > > > haha..unforunatly, the 'xmlenginebuffersize()' still returns a > > 'hardcoded' limit: > > > > int xmlenginebuffersize() { return 10000000; } (in TMVA::Tools.h) > > > > so I guess you'll have to change and 'recompile' if that's the problem. > > > > Is your xml file particularly large? Are you training a huge neural network > ? ( > > > > Helge > > > > > > On 26 February 2017 at 13:48, <rx...@st...> wrote: > >> Hi all, > >> > >> > >> > >> The spaces were there just as examples, the actual names are identical and > >> contain no spaces. I also checked the encoding and it seems to be standard > >> Linux UTF-8. > >> > >> I found out that the previous error was caused by the fact that I was >> giving > >> the reader the .C file rather than the .xml. 
So that error disappeared but > >> now I have a new one: > >> > >> > >> > >> Error in <TXMLEngine::ParseFile>: XML syntax error at line 51 > >> > >> --- <FATAL> Tools : Trying to read non-existing >> attribute > >> 'Method' from xml node ' > >> > >> ***> abort program execution > >> > >> > >> > >> I’ve searched for the string ‘Method’ in my files (I thought maybe I was > >> doing something like BookMVA(“MLP Method”...) but that was not the case. >> How > >> can I fix this new error? > >> > >> > >> > >> Thank you, > >> > >> Dan > >> > >> > >> > >> From: Christopher Jones > >> Sent: 26 February 2017 10:02 > >> To: rx...@st... > >> Cc: tmv...@li... > >> Subject: Re: [TMVA-users] Reader expression wrong name or order > >> > >> > >> > >> Hi, > >> > >> > >> > >> This is a guess, but try removing the space from the MVA name… TMVA might > >> not be handling that well. > >> > >> > >> > >> Chris > >> > >> > >> > >> On 26 Feb 2017, at 1:50 am, rx...@st... wrote: > >> > >> > >> > >> Dear experts, > >> > >> > >> > >> I’ve been trying to write a class which among other things performs an MLP > >> MVA using an already existing weights file. > >> > >> I declare the reader in the header: > >> > >> > >> > >> TMVA::Reader *reader; > >> > >> > >> > >> In the constructor I do the following: > >> > >> > >> > >> reader = new TMVA::Reader(); > >> > >> reader->AddVariable("variable 1", &var1); > >> > >> ... > >> > >> reader->AddVariable("variable n", &varn); > >> > >> reader->BookMVA("MLP method", fWeights); > >> > >> > >> > >> The evaluation is then done in a separate method: > >> > >> > >> > >> return reader->EvaluateMVA("MLP method"); > >> > >> > >> > >> I’ve been very careful to keep the order in which I add the variables the > >> same as the one in which they feature in the weights file. 
However at > >> runtime I keep getting the following error: > >> > >> > >> > >> --- <FATAL> MLP : The expression declared to the >> Reader > >> needs to be checked (name or order are wrong) > >> > >> ***> abort program execution > >> > >> > >> > >> Is there anything that I am missing? I am using ROOT version 6.06.02 and > >> C++11. > >> > >> > >> > >> Kind Regards, > >> > >> Dan > >> > >> > >> > >> > >> > >> >> ------------------------------------------------------------------------------ > >> Check out the vibrant tech community on one of the world's most > >> engaging tech sites, SlashDot.org! > >> http://sdm.link/slashdot_______________________________________________ > >> TMVA-users mailing list > >> TMV...@li... > >> https://lists.sourceforge.net/lists/listinfo/tmva-users > >> > >> > >> > >> > >> > >> > >> >> ------------------------------------------------------------------------------ > >> Check out the vibrant tech community on one of the world's most > >> engaging tech sites, SlashDot.org! http://sdm.link/slashdot > >> _______________________________________________ > >> TMVA-users mailing list > >> TMV...@li... > >> https://lists.sourceforge.net/lists/listinfo/tmva-users > >> > > |
From: Helge V. <Hel...@ce...> - 2017-02-26 20:06:38
|
Hi, yes exactly, NormMode=EqualNumEvents would take car of the bit that normalizes 'Signal' to 'background'. The relative weighting of the various background samples you still have to do yourself though. And as I never remember how possible preselection cuts in the factory are handled, I simply like to suggest to do the normalization 'by hand' :) Cheers, helge On 22 February 2017 at 11:09, Ben Smith <ben...@gm...> wrote: > Hi Helge, > > Hi Helge, > > Thanks a lot for taking the time to explain my confusions... > > I discovered that there is a "NormMode=EqualNumEvents" which should do what > you proposed automatically, unless I misunderstood it. > > You say the signal should be normalized to the "explicit number of events" > .. or if you you have weighted the events, the 'sum_of_event_weights' of you > total background sample" (I got it why now) > > To normalize to the same number of events as the background, I wonder what > is the best way to do this. Would it ork if I use "NormMode=EqualNumEvents"? > Or is it the case that if I use the background weights as we discussed, I > should than have "NormMode=None", necessarily? Because if that is the case, > than I am not sure what to enter for Double_t signalWeight =?? , so that > this is acomplished (I don't know the cross section of the signal in this > particular case). In your first message I re-read that you said that this > signal_weight should be > "sum_over_background_weights/sum_over_signal_weights", so what was missing > from "signalWeight= Xsec1*eff1/Nevents_mc1 + Xsec1*eff2/Nevents_mc2" > is the 1/(sum_over_signal_weights) but this I don't know (there are no event > weights for the sinal, as far as I understand, and probably no way to > check). > > "so .. n_events should be the 'number of events' that corresponds to the > 'eff' that you put in there." > > I'm thinking than I will just use Xsec/N_events_final, which should be > equivalent to having the correct final efficiency. 
The thing is that the > efficiency I quoted takes into account only the generator level cuts, not > analysis cuts. I have the "integrated luminosity of the sample" but this is > before applying any analysis cuts - I thought I could use this, but what you > say does not corroborate that. So, if I take the actual number of events I > see in the tuple I pass to TMVA and the cross section, this should be > enough. > > Thanks!!! > > Ben > > On Tue, Feb 21, 2017 at 8:57 PM, Helge Voss <Hel...@ce...> wrote: >> >> Hi Ben, >> >> There's still some misunderstanding, I'll try to explain below >> >> >> > >> > // global event weights per tree (see below for setting event-wise >> > weights) >> > Double_t backgroundWeight1 = Xsec1*eff1/Nevents_mc1; >> > Double_t backgroundWeight2 = Xsec1*eff2/Nevents_mc2; >> > Double_t signalWeight = Xsec1*eff1/Nevents_mc1 + >> > Xsec1*eff2/Nevents_mc2; >> >> For the 'signal' scaling, you really don't care about 'lumi' (when I say >> lumi, >> I mean 'integrated luminosity' obviously) of your background >> monte carlo events, but the "explicit number of events" .. or if you you >> have >> weighted the events, the 'sum_of_event_weights' of you total background >> sample >> >> As I tried to explain in the previous mail, the signal sample should >> NOT be normalized >> to the same lumi as the background, but the the same "number of >> events". And typically >> for "signal" that is a much much larger lumi than for background. >> (maybe if you read the >> previous mail again, you understand "WHY" I said this should be the case) >> >> >> > >> > // You can add an arbitrary number of signal or background trees >> > dataloader->AddBackgroundTree( background1, backgroundWeightSample1 >> > ); >> > dataloader->AddBackgroundTree( background2, backgroundWeightSample2 >> > ); >> > dataloader->AddSignalTree ( signal, signalWeight ); >> > >> > I confess I would have thought taking the largest 1/Lumi for the >> > background >> > would have been enough. 
Say I collect 10 fb-1 of background1 and 20 fb-1 >> > of >> > background2 simultaneously. I would expect not to be able to collect >> > more >> > than 20 fb-1 of signal than. But I guess you're being very conservative >> > to >> > be on the safe side. >> >> So.. after what I wrote above, it should now hopefully be clear that >> this is also >> wrong. I wasn't 'conservative' or 'on the safe side' by taking the sum >> of their integrated >> luminosities, I simply meant a different 'scaling', based on actual >> number of events (sum of >> event weightes) in the respecteive signal and background sample. >> >> > >> > I have one very last question, if you would not mind... I suddenly >> > realize >> > I don't know how exactly to take Nevent_mc. When the samples are >> > prepared >> > and are available to use they have a certain number of events. But when >> > I >> > prepare the tuple to pass to TMVA, a few cuts are applied and I have >> > less >> > events. Which one does TMVA want? >> >> Again, TMVA want's nothing ;) You WANT to give it a background sample >> that is as close >> to that which you have in the data (i.e. the event distributions that >> TMVA sees and >> tries to discriminate you signal against, should be as similar as >> possible to what the >> trained classifier will be exposed to when it is in the end applied to >> your data. Hence >> you can always use this in order to determine how you want to 'scale' >> your various event >> samples. That's why I said: "scale your different background samples >> such that they all >> >> xsec * eff / n_events = 1/(integrated lumi) >> >> so .. n_events should be the 'number of events' that corresponds to >> the 'eff' that you put >> in there. So if you have some cuts, eff should take into account those >> cuts AND of course >> all cuts that your event generator might have applied etc.. 
>> >> Cheers, >> >> Helge >> >> >> > >> > Many thanks, >> > >> > Ben >> > >> > On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...> wrote: >> >> >> >> Hi Ben, >> >> >> >> > normalize to 1/lumi(sample_i) than my impression that I should pass >> >> > the >> >> > number of events of each sample as well was correct. For my samples >> >> > I >> >> > would >> >> > have >> >> > >> >> > lumi(sample_i) = N_events_mc/Xsec*eff >> >> > >> >> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc >> >> > >> >> > the "lumi" I was using was just a global constant that would not >> >> > change >> >> > the >> >> > normalization between the samples so it can be omitted (like, I could >> >> > multiply the lumi of all background samples by 10 and this should not >> >> > make a >> >> > difference, as far as I understand). >> >> >> >> Yes exactly so far! >> >> >> >> > >> >> > In order to pass this, I understand I should do: >> >> > >> >> > factory->SetBackgroundWeightExpression( "weight_bkg" ); >> >> > >> >> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly >> >> > from >> >> > the ntuple) >> >> >> >> as this weight would be the same for 'every' event in a particular >> >> sample, rather >> >> than haveing to write this into the N-tuple, you can much easier use: >> >> >> >> // global event weights per tree (see below for setting event-wise >> >> weights) >> >> Double_t backgroundWeightSample1 = >> >> <theNumberYouCalculatedForSample1>; >> >> Double_t backgroundWeightSample2 = >> >> <theNumberYouCalculatedForSample2>; >> >> etc.. 
>> >> >> >> dataloader->AddBackgroundTree( background1, backgroundWeightSample1 >> >> ); >> >> dataloader->AddBackgroundTree( background2, backgroundWeightSample2 >> >> ); >> >> >> >> (or 'factory" instead of "dataloader" for older root/tmva versions, >> >> like root 5.xx) >> >> >> >> the "SetBackgroundWeightExpression" is meant if your monte carlo >> >> generator >> >> creates event weights rather than 'events', or if you train using >> >> 'sWeights' for >> >> example, where each event gets a particular weight in order to end up >> >> with >> >> the >> >> correct 'average distribution' of events. >> >> >> >> > For the signal, I'm not sure I get what you said... should I not >> >> > simply >> >> > have: >> >> > >> >> > factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg >> >> > = >> >> > 1? >> >> >> >> No, as obviously that makes 'nothing' :) >> >> >> >> > >> >> > I have only one signal sample and several background, not several >> >> > signal. Or >> >> > are you saying >> >> > >> >> > weight_sg = total sum of 1/lumi(sample_i)? (why?) >> >> >> >> This is also not 'recommended', but in general it is the best >> >> 'default' to have the >> >> same number of (weighted) events in the signal sample as in the >> >> background, even >> >> if in the real data, your signal sample is typically much smaller >> >> than the background. >> >> This is, because in the extreme case of a very rare signal, the >> >> simplest classifier which >> >> just 'call everything background' already has a very good 'overall' >> >> perfromance, as it is >> >> correct in 'almost all cases'. (as most events are background). But of >> >> course, that classifier >> >> is not what you want. Therefore I suggested to weight the signal >> >> sample with an overall >> >> constant factor such that the total number of sum_of_weights for the >> >> signal sample is >> >> equal to the sum over background events. 
>> >> >> >> Cheers, >> >> >> >> Helge >> >> >> >> >> >> > >> >> > Thanks really a lot!!! >> >> > >> >> > Ben >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> >> >> > wrote: >> >> >> >> >> >> Hi Ben, >> >> >> >> >> >> Maybe I didn't understand you, as I don't see at all why you use a >> >> >> factor Xsec*eff*lumi. >> >> >> >> >> >> TMVA just concatenates the different background source files >> >> >> together >> >> >> without doing anything... >> >> >> hence you should apply the factor 1/lumi(sample_i) to each MC sample >> >> >> (i=1,2,3) to normalize the >> >> >> various samples to the same integrated luminosity. Doing this, TMVA >> >> >> sees a background sample that >> >> >> has the same distribution as it would be in the data. Then >> >> >> afterwards, >> >> >> you should use "NormMode=None" >> >> >> (NormMode takes care of how the total Signal is weighted w.r.t. the >> >> >> total background). And if you choose "None" >> >> >> here, again, TMVA does nothing and you can normalize easily your >> >> >> signal sample to the background sample, >> >> >> buy multiplying as signal weight >> >> >> "sum_over_background_weights/sum_over_signal_weights") >> >> >> Where here sum goes over the events and 'background weight' for >> >> >> example would be the weights you >> >> >> caculated above for the relative background weighting, multiplied >> >> >> with >> >> >> eventual event weights from the monte carlo. >> >> >> For the signal, it would be simply the 'event_weights' if the MC you >> >> >> used produces weighted events rather than >> >> >> 'just events' >> >> >> >> >> >> Cheers, >> >> >> >> >> >> Helge >> >> >> >> >> >> >> >> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> >> >> >> wrote: >> >> >> > Hello! >> >> >> > >> >> >> > I have a hopefully simple question regarding how weights are >> >> >> > passed >> >> >> > to >> >> >> > TMVA. 
>> >> >> > >> >> >> > I have one signal sample and 3 background samples that I want to >> >> >> > pass >> >> >> > to >> >> >> > TMVA. In ROOT, the background samples would be normalized in an >> >> >> > historgam h >> >> >> > as: >> >> >> > >> >> >> > h->Fill(var, weight); where weight = N_events/(h->Integral()) >> >> >> > >> >> >> > with N_events = Xsec*eff*lumi; >> >> >> > >> >> >> > var is a variable that will be used in TMVA, and N_events is the >> >> >> > number >> >> >> > of >> >> >> > events I want to normalize to. In case of my samples this number >> >> >> > depends >> >> >> > on >> >> >> > the cross-section (Xsec), on the efficiency of the sample (eff), >> >> >> > and >> >> >> > on >> >> >> > the >> >> >> > luminosity. Note that the weight I actually use depends on >> >> >> > h->integral, >> >> >> > because each sample has a different number of events and this must >> >> >> > be >> >> >> > taken >> >> >> > into account. >> >> >> > >> >> >> > I need to pass the correct weights to TMVA. The question is, >> >> >> > should I >> >> >> > pass >> >> >> > simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of >> >> >> > 1/h->integral by default) or should I pass actually >> >> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original >> >> >> > number >> >> >> > of >> >> >> > events in each sample? Note that passing N_events_mc is not really >> >> >> > ideal, as >> >> >> > there a few cuts involved. Alternatively, how would I do the >> >> >> > equivalent >> >> >> > of >> >> >> > h->Integral() at the TMVA level? >> >> >> > >> >> >> > Thanks a lot in advance for any help, and apologies if something >> >> >> > is >> >> >> > not >> >> >> > very >> >> >> > well explained or confusing! 
>> >> >> > >> >> >> > Ben >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > ------------------------------------------------------------------------------ >> >> >> > Check out the vibrant tech community on one of the world's most >> >> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot >> >> >> > _______________________________________________ >> >> >> > TMVA-users mailing list >> >> >> > TMV...@li... >> >> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users >> >> >> > >> >> > >> >> > >> > >> > > > |
From: Helge V. <Hel...@ce...> - 2017-02-26 19:10:33
|
Hi, well, 30kB really doesn't sound terribly large at all, for this size I'm sure the problem must be something else, weird .. I don't have a good idea though. If you say you have used the same weights on a different branch (probably TMVA/ROOT vesion), maybe it's a compatibility problem. While we tried to make sure we are always 'backward compatible', it could of course be that some things might not be 'foward compabible' .. (so if it is trained on a later version, it might not necessarily run on an 'older' version..) Although I don't really remember big changes 'recently' on the MLP that would hint to that as a possible cause of your problem. Helge On 26 February 2017 at 19:38, <rx...@st...> wrote: > Hi Helge, > > > > It’s fairly large, 30KB compared to my other weight files. I’ve used these > weights before on a different branch of the framework I work on and I > haven’t seen this error before. The network is not very big: a single hidden > layer with only 30ish nodes. For now I’m using a similar one which has a > smaller xml file > > So from what I understand I have to change this hardcoded limit. In what > file can I find it and edit it? Or is there another way to solve the issue? > > > > Thank you, > > Dan > > > > From: Helge Voss > Sent: 26 February 2017 18:10 > To: rx...@st... > Cc: Christopher Jones; tmv...@li... > > > Subject: Re: [TMVA-users] Reader expression wrong name or order > > > > Hi Christopher, > > > > I remember this error appearing when the 'xml' file became too large. > > There's a hardcoded limit in ROOT (well, I'm not sure if > > it's still hardcoded) and it got increased at some point (from root > > 5.29 onwards) > > > > #if ROOT_VERSION_CODE >= ROOT_VERSION(5,29,0) > > void* doc = > > gTools().xmlengine().ParseFile(tfname,gTools().xmlenginebuffersize()); > > // the default buffer size in TXMLEngine::ParseFile is 100k. Starting > > with ROOT 5.29 one can set the buffer size, see: > > http://savannah.cern.ch/bugs/?78864. 
This might be necessary for large > > XML files > > #else > > void* doc = gTools().xmlengine().ParseFile(tfname); > > #endif > > > > haha..unforunatly, the 'xmlenginebuffersize()' still returns a > > 'hardcoded' limit: > > > > int xmlenginebuffersize() { return 10000000; } (in TMVA::Tools.h) > > > > so I guess you'll have to change and 'recompile' if that's the problem. > > > > Is your xml file particularly large? Are you training a huge neural network > ? ( > > > > Helge > > > > > > On 26 February 2017 at 13:48, <rx...@st...> wrote: > >> Hi all, > >> > >> > >> > >> The spaces were there just as examples, the actual names are identical and > >> contain no spaces. I also checked the encoding and it seems to be standard > >> Linux UTF-8. > >> > >> I found out that the previous error was caused by the fact that I was >> giving > >> the reader the .C file rather than the .xml. So that error disappeared but > >> now I have a new one: > >> > >> > >> > >> Error in <TXMLEngine::ParseFile>: XML syntax error at line 51 > >> > >> --- <FATAL> Tools : Trying to read non-existing >> attribute > >> 'Method' from xml node ' > >> > >> ***> abort program execution > >> > >> > >> > >> I’ve searched for the string ‘Method’ in my files (I thought maybe I was > >> doing something like BookMVA(“MLP Method”...) but that was not the case. >> How > >> can I fix this new error? > >> > >> > >> > >> Thank you, > >> > >> Dan > >> > >> > >> > >> From: Christopher Jones > >> Sent: 26 February 2017 10:02 > >> To: rx...@st... > >> Cc: tmv...@li... > >> Subject: Re: [TMVA-users] Reader expression wrong name or order > >> > >> > >> > >> Hi, > >> > >> > >> > >> This is a guess, but try removing the space from the MVA name… TMVA might > >> not be handling that well. > >> > >> > >> > >> Chris > >> > >> > >> > >> On 26 Feb 2017, at 1:50 am, rx...@st... 
wrote: > >> > >> > >> > >> Dear experts, > >> > >> > >> > >> I’ve been trying to write a class which among other things performs an MLP > >> MVA using an already existing weights file. > >> > >> I declare the reader in the header: > >> > >> > >> > >> TMVA::Reader *reader; > >> > >> > >> > >> In the constructor I do the following: > >> > >> > >> > >> reader = new TMVA::Reader(); > >> > >> reader->AddVariable("variable 1", &var1); > >> > >> ... > >> > >> reader->AddVariable("variable n", &varn); > >> > >> reader->BookMVA("MLP method", fWeights); > >> > >> > >> > >> The evaluation is then done in a separate method: > >> > >> > >> > >> return reader->EvaluateMVA("MLP method"); > >> > >> > >> > >> I’ve been very careful to keep the order in which I add the variables the > >> same as the one in which they feature in the weights file. However at > >> runtime I keep getting the following error: > >> > >> > >> > >> --- <FATAL> MLP : The expression declared to the >> Reader > >> needs to be checked (name or order are wrong) > >> > >> ***> abort program execution > >> > >> > >> > >> Is there anything that I am missing? I am using ROOT version 6.06.02 and > >> C++11. > >> > >> > >> > >> Kind Regards, > >> > >> Dan > >> > >> > >> > >> > >> > >> >> ------------------------------------------------------------------------------ > >> Check out the vibrant tech community on one of the world's most > >> engaging tech sites, SlashDot.org! > >> http://sdm.link/slashdot_______________________________________________ > >> TMVA-users mailing list > >> TMV...@li... > >> https://lists.sourceforge.net/lists/listinfo/tmva-users > >> > >> > >> > >> > >> > >> > >> >> ------------------------------------------------------------------------------ > >> Check out the vibrant tech community on one of the world's most > >> engaging tech sites, SlashDot.org! http://sdm.link/slashdot > >> _______________________________________________ > >> TMVA-users mailing list > >> TMV...@li... 
> >> https://lists.sourceforge.net/lists/listinfo/tmva-users > >> > > |
From: <rx...@st...> - 2017-02-26 18:39:10
|
Hi Helge,

It’s fairly large, 30 KB, compared to my other weight files. I’ve used these weights before on a different branch of the framework I work on and I haven’t seen this error before. The network is not very big: a single hidden layer with only 30ish nodes. For now I’m using a similar one which has a smaller XML file.

So from what I understand I have to change this hardcoded limit. In what file can I find it and edit it? Or is there another way to solve the issue?

Thank you,
Dan

From: Helge Voss
Sent: 26 February 2017 18:10
To: rx...@st...
Cc: Christopher Jones; tmv...@li...
Subject: Re: [TMVA-users] Reader expression wrong name or order
From: Helge V. <Hel...@ce...> - 2017-02-26 18:10:46
Hi Christopher,

I remember this error appearing when the XML file became too large. There's a hardcoded limit in ROOT (well, I'm not sure if it's still hardcoded) and it got increased at some point (from ROOT 5.29 onwards):

   #if ROOT_VERSION_CODE >= ROOT_VERSION(5,29,0)
      // The default buffer size in TXMLEngine::ParseFile is 100k. Starting with
      // ROOT 5.29 one can set the buffer size, see http://savannah.cern.ch/bugs/?78864.
      // This might be necessary for large XML files.
      void* doc = gTools().xmlengine().ParseFile(tfname, gTools().xmlenginebuffersize());
   #else
      void* doc = gTools().xmlengine().ParseFile(tfname);
   #endif

Haha... unfortunately, 'xmlenginebuffersize()' still returns a 'hardcoded' limit:

   int xmlenginebuffersize() { return 10000000; }

(in TMVA::Tools.h), so I guess you'll have to change it and recompile if that's the problem. Is your XML file particularly large? Are you training a huge neural network?

Helge
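The size check Helge suggests doing by eye can be automated. The sketch below is Python rather than ROOT C++ so it runs without a ROOT build; the two buffer figures are simply the numbers quoted in this thread (the 100k `TXMLEngine::ParseFile` default before ROOT 5.29, and the 10000000 returned by `xmlenginebuffersize()` afterwards), and the function name is hypothetical:

```python
import os
import xml.etree.ElementTree as ET

# Buffer limits quoted in Helge's mail (assumptions taken from the thread):
OLD_DEFAULT_BUFFER = 100_000      # TXMLEngine::ParseFile default before ROOT 5.29
TMVA_BUFFER        = 10_000_000   # TMVA::Tools::xmlenginebuffersize() afterwards

def check_weights_file(path):
    """Report whether a weights XML file is near either parser limit,
    and whether it is well-formed XML at all."""
    size = os.path.getsize(path)
    report = {
        "size_bytes": size,
        "exceeds_old_default": size > OLD_DEFAULT_BUFFER,
        "exceeds_tmva_buffer": size > TMVA_BUFFER,
    }
    try:
        ET.parse(path)  # well-formedness check, independent of ROOT
        report["well_formed"] = True
    except ET.ParseError as err:
        report["well_formed"] = False
        report["parse_error"] = str(err)
    return report
```

For a 30 KB file such as Dan's, both size checks pass, which points away from the buffer limit and toward a malformed input (for instance the generated .C class being passed instead of the .xml).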
From: <rx...@st...> - 2017-02-26 12:48:18
Hi all,

The spaces were there just as examples; the actual names are identical and contain no spaces. I also checked the encoding and it seems to be standard Linux UTF-8.

I found out that the previous error was caused by the fact that I was giving the reader the .C file rather than the .xml. So that error disappeared, but now I have a new one:

   Error in <TXMLEngine::ParseFile>: XML syntax error at line 51
   --- <FATAL> Tools : Trying to read non-existing attribute 'Method' from xml node '
   ***> abort program execution

I’ve searched for the string 'Method' in my files (I thought maybe I was doing something like BookMVA("MLP Method"...)), but that was not the case. How can I fix this new error?

Thank you,
Dan

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
TMVA-users mailing list
TMV...@li...
https://lists.sourceforge.net/lists/listinfo/tmva-users
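The 'Method' attribute in this FATAL message lives, as far as I can tell, on the root <MethodSetup> node of a TMVA weights file. A standalone sanity check could look like the Python sketch below (independent of ROOT; the element and attribute names, and the function name, are assumptions to verify against your own file):

```python
import xml.etree.ElementTree as ET

def booked_method_from_weights(path):
    """Return the Method attribute of the root node of a TMVA weights file,
    or raise with a readable message. Assumes the usual TMVA layout where
    the root element is <MethodSetup Method="MLP::MLP method">; check your
    own file if the layout differs."""
    if not path.endswith(".xml"):
        # This is exactly the mistake in the thread: the generated .C class
        # was passed to the reader instead of the .xml weights file.
        raise ValueError(f"{path}: BookMVA expects the .xml weights file, "
                         "not the generated .C class")
    root = ET.parse(path).getroot()
    method = root.get("Method")
    if method is None:
        raise ValueError(f"{path}: root node <{root.tag}> has no 'Method' "
                         "attribute; is this really a TMVA weights file?")
    return method
```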
From: Wolf B. <wol...@gm...> - 2017-02-26 12:42:18
Hi there,

in TMVA you can train on any valid expression (TTreeFormula). For example, you could do

   dataloader->AddVariable("sin(a) + 2*(1 - exp(object.something))");

I am a bit surprised that you were able to train with a name like "variable 1". It doesn't look like a valid formula to me.

On names: TMVA does some name mangling using the gTools->ReplaceRegularExpressions function. For example, it replaces + with _P_ and spaces with underscores.

To solve your problem:
* do what Chris said: don't use spaces in names;
* double-check that the names you pass to the reader match exactly the names in the weight file.

Suggestions for TMVA:
* Let's make TMVA output a better error message. Instead of "(name or order are wrong)", let's make it output both the expected names and the names given by the user. That makes it much easier to find the problem. I think I could provide a patch for this.
* Recently I ran into a problem with a long formula myself. Passed to "AddVariable" was a formula with "branch1.leaf1 + ...". However, the "." is left alone by the ReplaceRegularExpressions function. In the output tree I then got a branch containing a "." in its name, which leads to a lot of problems (e.g. tree->FindBranch("br.something") does not work). So shouldn't ReplaceRegularExpressions also replace the "."? Or should it be even stronger and replace everything except [A-Za-z0-9_]?

Wolf
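The mangling Wolf describes, and his stricter proposal, can be sketched as follows (Python for brevity; only the two substitutions named in the thread are implemented here, while the real table in TMVA::Tools::ReplaceRegularExpressions contains more entries, so treat this as an illustration, not the actual TMVA behaviour):

```python
import re

def mangle_like_tmva(name):
    """Illustrative subset of the mangling described in the thread:
    '+' becomes '_P_' and spaces become underscores."""
    return name.replace("+", "_P_").replace(" ", "_")

def mangle_strict(name):
    """Wolf's stricter proposal: keep only [A-Za-z0-9_], replace the rest.
    This would also catch the '.' that currently leaks into branch names."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```

For example, mangle_like_tmva("variable 1") gives "variable_1", while mangle_strict("branch1.leaf1") gives "branch1_leaf1", avoiding the FindBranch problem Wolf mentions.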
From: Christopher J. <jo...@he...> - 2017-02-26 10:19:35
Hi,

This is a guess, but try removing the space from the MVA name… TMVA might not be handling that well.

Chris
From: <rx...@st...> - 2017-02-26 02:15:25
Dear experts,

I’ve been trying to write a class which, among other things, performs an MLP MVA using an already existing weights file. I declare the reader in the header:

   TMVA::Reader *reader;

In the constructor I do the following:

   reader = new TMVA::Reader();
   reader->AddVariable("variable 1", &var1);
   ...
   reader->AddVariable("variable n", &varn);
   reader->BookMVA("MLP method", fWeights);

The evaluation is then done in a separate method:

   return reader->EvaluateMVA("MLP method");

I’ve been very careful to keep the order in which I add the variables the same as the one in which they feature in the weights file. However, at runtime I keep getting the following error:

   --- <FATAL> MLP : The expression declared to the Reader needs to be checked (name or order are wrong)
   ***> abort program execution

Is there anything that I am missing? I am using ROOT version 6.06.02 and C++11.

Kind Regards,
Dan
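The "name or order" condition can also be checked by hand before booking the reader. The sketch below (Python, outside ROOT) assumes the common TMVA weight-file layout with <Variable VarIndex="..." Expression="..."> entries inside a <Variables> block; the function name is hypothetical and the attribute names should be verified against your own weights file:

```python
import xml.etree.ElementTree as ET

def reader_variables_match(weights_path, declared_names):
    """Compare the variable names declared via AddVariable, in order,
    against the <Variables> block of a TMVA weights file. Returns
    (matches, expected_names) so a mismatch can be printed out, which is
    exactly the better error message Wolf asks for in this thread."""
    root = ET.parse(weights_path).getroot()
    variables = sorted(root.iter("Variable"),
                       key=lambda v: int(v.get("VarIndex")))
    expected = [v.get("Expression") for v in variables]
    return expected == list(declared_names), expected
```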
From: kailey s. <kai...@b2...> - 2017-02-23 18:28:07
Hi, Found your company listed as Sponsors and Exhibitors in Cloud Expo Europe Attendees list and thought you would Be Interested in Attendees Contact information for your ROI. List Contains: Name, Company's Name, Phone Number, Fax Number, Job Title, Email address, Complete Mailing Address, SIC code, Company revenue, size, Web address etc. We offer: Complete list with Email address in an Excel Sheet for unlimited usage. Do an email blast endorsing your product/services and providing your contact information. Email appending, multiple contacts appending, Data appending which will append or add the missing information to your existing database. Let me know your thoughts or pass on the message to the right person in your company. Thanks & regards, Kailey Spencer |
From: Helge V. <Hel...@ce...> - 2017-02-23 09:33:18
Hi,

well, in MethodBase you actually find two versions:

   MethodBase.h: virtual Double_t GetROCIntegral(TH1D *histS, TH1D *histB) const;
   MethodBase.h: virtual Double_t GetROCIntegral(PDF *pdfS=0, PDF *pdfB=0) const;

The method that uses histograms simply makes PDFs (splines) out of them and then does exactly the same as the other method that uses PDFs as input.

(Note: the transformation into splines is meant to smooth out the histogram and to somewhat eliminate the effect of different binnings. That works well AS LONG AS the binning is reasonable, i.e. the histogram is reasonably smooth to start with, and otherwise fails miserably. So be careful. I think the ROC curves in TMVA are now calculated from the 'events' rather than the histograms, in order not to have to worry about binning anymore.)

Cheers,
Helge
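The histogram-based calculation Helge describes can also be approximated directly from bin contents, without the spline step. A minimal sketch, with plain Python lists standing in for TH1 bin contents (TMVA's spline-based GetROCIntegral will generally give a slightly different number for coarse binnings):

```python
def roc_integral(sig_bins, bkg_bins):
    """Approximate the ROC integral from two binned distributions over the
    same axis: scan cuts at bin edges and integrate background rejection
    vs. signal efficiency with the trapezoidal rule. A binned stand-in for
    TMVA's GetROCIntegral, not a reimplementation of it."""
    ns, nb = float(sum(sig_bins)), float(sum(bkg_bins))
    # Efficiency/rejection for a cut placed after each bin, assuming the
    # signal accumulates at high values of the discriminant.
    eff_s, rej_b, cum_s, cum_b = [1.0], [0.0], 0.0, 0.0
    for s, b in zip(sig_bins, bkg_bins):
        cum_s += s
        cum_b += b
        eff_s.append(1.0 - cum_s / ns)  # signal kept above the cut
        rej_b.append(cum_b / nb)        # background removed below it
    area = 0.0
    for i in range(1, len(eff_s)):
        area += 0.5 * (rej_b[i] + rej_b[i - 1]) * (eff_s[i - 1] - eff_s[i])
    return area
```

Perfectly separated histograms give 1.0 and identical histograms give 0.5, matching the usual ROC-integral conventions; the finer the binning, the closer this gets to the event-based result Helge mentions.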
From: Shreya S. <shr...@ce...> - 2017-02-23 03:01:51
Hello,

I am trying to calculate the ROC integral of two histogram distributions using

   virtual Double_t GetROCIntegral(TMVA::PDF* pdfS = 0, TMVA::PDF* pdfB = 0) const

I am not sure if I can apply it to histograms, as I do not see TH1* in the argument. Is there any way I can apply such a figure of merit for histograms?

Thank you for your help!

Best,
Shreya
From: Ben S. <ben...@gm...> - 2017-02-22 10:10:08
Hi Helge,

Thanks a lot for taking the time to explain my confusions. I discovered that there is a "NormMode=EqualNumEvents" option which should do what you proposed automatically, unless I misunderstood it.

You say the signal should be normalized to the "explicit number of events", or, if you have weighted the events, the 'sum_of_event_weights' of the total background sample (I got why now). To normalize to the same number of events as the background, I wonder what is the best way to do this. Would it work if I use "NormMode=EqualNumEvents"? Or is it the case that if I use the background weights as we discussed, I should then have "NormMode=None", necessarily? Because if that is the case, then I am not sure what to enter for Double_t signalWeight = ?? so that this is accomplished (I don't know the cross section of the signal in this particular case).

In your first message I re-read that you said that this signal weight should be "sum_over_background_weights/sum_over_signal_weights", so what was missing from "signalWeight = Xsec1*eff1/Nevents_mc1 + Xsec1*eff2/Nevents_mc2" is the 1/(sum_over_signal_weights), but this I don't know (there are no event weights for the signal, as far as I understand, and probably no way to check).

"so .. n_events should be the 'number of events' that corresponds to the 'eff' that you put in there."

I'm thinking then I will just use Xsec/N_events_final, which should be equivalent to having the correct final efficiency. The thing is that the efficiency I quoted takes into account only the generator-level cuts, not the analysis cuts. I have the "integrated luminosity of the sample", but this is before applying any analysis cuts; I thought I could use this, but what you say does not corroborate that. So, if I take the actual number of events I see in the tuple I pass to TMVA and the cross section, this should be enough.

Thanks!!!
Ben On Tue, Feb 21, 2017 at 8:57 PM, Helge Voss <Hel...@ce...> wrote: > Hi Ben, > > There's still some misunderstanding, I'll try to explain below > > > > > > // global event weights per tree (see below for setting event-wise > > weights) > > Double_t backgroundWeight1 = Xsec1*eff1/Nevents_mc1; > > Double_t backgroundWeight2 = Xsec1*eff2/Nevents_mc2; > > Double_t signalWeight = Xsec1*eff1/Nevents_mc1 + > > Xsec1*eff2/Nevents_mc2; > > For the 'signal' scaling, you really don't care about 'lumi' (when I say > lumi, > I mean 'integrated luminosity' obviously) of your background > monte carlo events, but the "explicit number of events" .. or if you you > have > weighted the events, the 'sum_of_event_weights' of you total background > sample > > As I tried to explain in the previous mail, the signal sample should > NOT be normalized > to the same lumi as the background, but the the same "number of > events". And typically > for "signal" that is a much much larger lumi than for background. > (maybe if you read the > previous mail again, you understand "WHY" I said this should be the case) > > > > > > // You can add an arbitrary number of signal or background trees > > dataloader->AddBackgroundTree( background1, backgroundWeightSample1 ); > > dataloader->AddBackgroundTree( background2, backgroundWeightSample2 ); > > dataloader->AddSignalTree ( signal, signalWeight ); > > > > I confess I would have thought taking the largest 1/Lumi for the > background > > would have been enough. Say I collect 10 fb-1 of background1 and 20 fb-1 > of > > background2 simultaneously. I would expect not to be able to collect more > > than 20 fb-1 of signal than. But I guess you're being very conservative > to > > be on the safe side. > > So.. after what I wrote above, it should now hopefully be clear that > this is also > wrong. 
I wasn't 'conservative' or 'on the safe side' by taking the sum > of their integrated > luminosities, I simply meant a different 'scaling', based on actual > number of events (sum of > event weightes) in the respecteive signal and background sample. > > > > > I have one very last question, if you would not mind... I suddenly > realize > > I don't know how exactly to take Nevent_mc. When the samples are prepared > > and are available to use they have a certain number of events. But when I > > prepare the tuple to pass to TMVA, a few cuts are applied and I have less > > events. Which one does TMVA want? > > Again, TMVA want's nothing ;) You WANT to give it a background sample > that is as close > to that which you have in the data (i.e. the event distributions that > TMVA sees and > tries to discriminate you signal against, should be as similar as > possible to what the > trained classifier will be exposed to when it is in the end applied to > your data. Hence > you can always use this in order to determine how you want to 'scale' > your various event > samples. That's why I said: "scale your different background samples > such that they all > > xsec * eff / n_events = 1/(integrated lumi) > > so .. n_events should be the 'number of events' that corresponds to > the 'eff' that you put > in there. So if you have some cuts, eff should take into account those > cuts AND of course > all cuts that your event generator might have applied etc.. > > Cheers, > > Helge > > > > > > Many thanks, > > > > Ben > > > > On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...> wrote: > >> > >> Hi Ben, > >> > >> > normalize to 1/lumi(sample_i) than my impression that I should pass > the > >> > number of events of each sample as well was correct. 
For my samples I > >> > would > >> > have > >> > > >> > lumi(sample_i) = N_events_mc/Xsec*eff > >> > > >> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc > >> > > >> > the "lumi" I was using was just a global constant that would not > change > >> > the > >> > normalization between the samples so it can be omitted (like, I could > >> > multiply the lumi of all background samples by 10 and this should not > >> > make a > >> > difference, as far as I understand). > >> > >> Yes exactly so far! > >> > >> > > >> > In order to pass this, I understand I should do: > >> > > >> > factory->SetBackgroundWeightExpression( "weight_bkg" ); > >> > > >> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly > >> > from > >> > the ntuple) > >> > >> as this weight would be the same for 'every' event in a particular > >> sample, rather > >> than haveing to write this into the N-tuple, you can much easier use: > >> > >> // global event weights per tree (see below for setting event-wise > >> weights) > >> Double_t backgroundWeightSample1 = <theNumberYouCalculatedForSampl > e1>; > >> Double_t backgroundWeightSample2 = <theNumberYouCalculatedForSampl > e2>; > >> etc.. > >> > >> dataloader->AddBackgroundTree( background1, backgroundWeightSample1 > ); > >> dataloader->AddBackgroundTree( background2, backgroundWeightSample2 > ); > >> > >> (or 'factory" instead of "dataloader" for older root/tmva versions, > >> like root 5.xx) > >> > >> the "SetBackgroundWeightExpression" is meant if your monte carlo > generator > >> creates event weights rather than 'events', or if you train using > >> 'sWeights' for > >> example, where each event gets a particular weight in order to end up > with > >> the > >> correct 'average distribution' of events. > >> > >> > For the signal, I'm not sure I get what you said... should I not > simply > >> > have: > >> > > >> > factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg > = > >> > 1? 
> >> > >> No, as obviously that makes 'nothing' :) > >> > >> > > >> > I have only one signal sample and several background, not several > >> > signal. Or > >> > are you saying > >> > > >> > weight_sg = total sum of 1/lumi(sample_i)? (why?) > >> > >> This is also not 'recommended', but in general it is the best > >> 'default' to have the > >> same number of (weighted) events in the signal sample as in the > >> background, even > >> if in the real data, your signal sample is typically much smaller > >> than the background. > >> This is, because in the extreme case of a very rare signal, the > >> simplest classifier which > >> just 'call everything background' already has a very good 'overall' > >> perfromance, as it is > >> correct in 'almost all cases'. (as most events are background). But of > >> course, that classifier > >> is not what you want. Therefore I suggested to weight the signal > >> sample with an overall > >> constant factor such that the total number of sum_of_weights for the > >> signal sample is > >> equal to the sum over background events. > >> > >> Cheers, > >> > >> Helge > >> > >> > >> > > >> > Thanks really a lot!!! > >> > > >> > Ben > >> > > >> > > >> > > >> > > >> > > >> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> > wrote: > >> >> > >> >> Hi Ben, > >> >> > >> >> Maybe I didn't understand you, as I don't see at all why you use a > >> >> factor Xsec*eff*lumi. > >> >> > >> >> TMVA just concatenates the different background source files together > >> >> without doing anything... > >> >> hence you should apply the factor 1/lumi(sample_i) to each MC sample > >> >> (i=1,2,3) to normalize the > >> >> various samples to the same integrated luminosity. Doing this, TMVA > >> >> sees a background sample that > >> >> has the same distribution as it would be in the data. Then > afterwards, > >> >> you should use "NormMode=None" > >> >> (NormMode takes care of how the total Signal is weighted w.r.t. the > >> >> total background). 
And if you choose "None" > >> >> here, again, TMVA does nothing and you can normalize easily your > >> >> signal sample to the background sample, > >> >> buy multiplying as signal weight > >> >> "sum_over_background_weights/sum_over_signal_weights") > >> >> Where here sum goes over the events and 'background weight' for > >> >> example would be the weights you > >> >> caculated above for the relative background weighting, multiplied > with > >> >> eventual event weights from the monte carlo. > >> >> For the signal, it would be simply the 'event_weights' if the MC you > >> >> used produces weighted events rather than > >> >> 'just events' > >> >> > >> >> Cheers, > >> >> > >> >> Helge > >> >> > >> >> > >> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> > wrote: > >> >> > Hello! > >> >> > > >> >> > I have a hopefully simple question regarding how weights are passed > >> >> > to > >> >> > TMVA. > >> >> > > >> >> > I have one signal sample and 3 background samples that I want to > pass > >> >> > to > >> >> > TMVA. In ROOT, the background samples would be normalized in an > >> >> > historgam h > >> >> > as: > >> >> > > >> >> > h->Fill(var, weight); where weight = N_events/(h->Integral()) > >> >> > > >> >> > with N_events = Xsec*eff*lumi; > >> >> > > >> >> > var is a variable that will be used in TMVA, and N_events is the > >> >> > number > >> >> > of > >> >> > events I want to normalize to. In case of my samples this number > >> >> > depends > >> >> > on > >> >> > the cross-section (Xsec), on the efficiency of the sample (eff), > and > >> >> > on > >> >> > the > >> >> > luminosity. Note that the weight I actually use depends on > >> >> > h->integral, > >> >> > because each sample has a different number of events and this must > be > >> >> > taken > >> >> > into account. > >> >> > > >> >> > I need to pass the correct weights to TMVA. 
The question is, > should I > >> >> > pass > >> >> > simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of > >> >> > 1/h->integral by default) or should I pass actually > >> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number > >> >> > of > >> >> > events in each sample? Note that passing N_events_mc is not really > >> >> > ideal, as > >> >> > there a few cuts involved. Alternatively, how would I do the > >> >> > equivalent > >> >> > of > >> >> > h->Integral() at the TMVA level? > >> >> > > >> >> > Thanks a lot in advance for any help, and apologies if something is > >> >> > not > >> >> > very > >> >> > well explained or confusing! > >> >> > > >> >> > Ben > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > ------------------------------------------------------------ > ------------------ > >> >> > Check out the vibrant tech community on one of the world's most > >> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot > >> >> > _______________________________________________ > >> >> > TMVA-users mailing list > >> >> > TMV...@li... > >> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users > >> >> > > >> > > >> > > > > > > |
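The scaling discussed in this thread (each background sample weighted by Xsec*eff/N_mc = 1/(integrated lumi), and the signal then scaled so that its sum of weights matches the background's, as Helge suggests) can be written out as a small sketch. Python with placeholder numbers; it assumes unweighted signal events, as in Ben's case:

```python
def per_sample_weights(backgrounds, n_signal_events):
    """Compute the per-tree weights discussed in the thread.
    `backgrounds` maps sample name -> (xsec, eff, n_mc_events); each sample
    gets weight xsec*eff/n_mc = 1/(integrated lumi), so all backgrounds are
    normalised to the same luminosity. The signal is then scaled so that
    its total sum of weights equals the background's. All numbers passed in
    are placeholders for the analysis' real cross sections and yields."""
    bkg_weights = {name: xsec * eff / n_mc
                   for name, (xsec, eff, n_mc) in backgrounds.items()}
    # Total background sum-of-weights: per-event weight times event count.
    sum_bkg = sum(w * backgrounds[name][2] for name, w in bkg_weights.items())
    # For unweighted signal events, sum_over_signal_weights == n_signal_events.
    signal_weight = sum_bkg / n_signal_events
    return bkg_weights, signal_weight
```

These per-tree numbers would then go into AddBackgroundTree/AddSignalTree with "NormMode=None", so TMVA applies no further rescaling on top.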
From: Helge V. <Hel...@ce...> - 2017-02-21 19:58:21
Hi Ben,

There's still some misunderstanding; I'll try to explain below.

> // global event weights per tree (see below for setting event-wise weights)
> Double_t backgroundWeight1 = Xsec1*eff1/Nevents_mc1;
> Double_t backgroundWeight2 = Xsec1*eff2/Nevents_mc2;
> Double_t signalWeight = Xsec1*eff1/Nevents_mc1 + Xsec1*eff2/Nevents_mc2;

For the 'signal' scaling, you really don't care about the 'lumi' (when I say lumi, I mean 'integrated luminosity', obviously) of your background Monte Carlo events, but about the "explicit number of events", or, if you have weighted the events, the 'sum_of_event_weights' of your total background sample.

As I tried to explain in the previous mail, the signal sample should NOT be normalized to the same lumi as the background, but to the same "number of events". And typically for "signal" that is a much, much larger lumi than for background. (Maybe if you read the previous mail again, you understand WHY I said this should be the case.)

> // You can add an arbitrary number of signal or background trees
> dataloader->AddBackgroundTree( background1, backgroundWeightSample1 );
> dataloader->AddBackgroundTree( background2, backgroundWeightSample2 );
> dataloader->AddSignalTree ( signal, signalWeight );
>
> I confess I would have thought taking the largest 1/Lumi for the background
> would have been enough. Say I collect 10 fb-1 of background1 and 20 fb-1 of
> background2 simultaneously. I would expect not to be able to collect more
> than 20 fb-1 of signal then. But I guess you're being very conservative to
> be on the safe side.

So, after what I wrote above, it should now hopefully be clear that this is also wrong. I wasn't being 'conservative' or 'on the safe side' by taking the sum of their integrated luminosities; I simply meant a different 'scaling', based on the actual number of events (sum of event weights) in the respective signal and background samples.

> I have one very last question, if you would not mind... I suddenly realize
> I don't know how exactly to take Nevent_mc. When the samples are prepared
> and are available to use they have a certain number of events. But when I
> prepare the tuple to pass to TMVA, a few cuts are applied and I have fewer
> events. Which one does TMVA want?

Again, TMVA wants nothing ;) You WANT to give it a background sample that is as close as possible to the one you have in the data (i.e. the event distributions that TMVA sees and tries to discriminate your signal against should be as similar as possible to what the trained classifier will be exposed to when it is in the end applied to your data). Hence you can always use this in order to determine how you want to 'scale' your various event samples. That's why I said: scale your different background samples such that they all satisfy

   xsec * eff / n_events = 1/(integrated lumi)

So n_events should be the 'number of events' that corresponds to the 'eff' that you put in there. So if you have some cuts, eff should take into account those cuts AND of course all cuts that your event generator might have applied, etc.

Cheers,

Helge

> Many thanks,
>
> Ben
>
> On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...> wrote:
>> Hi Ben,
>>
>> > normalize to 1/lumi(sample_i) than my impression that I should pass the
>> > number of events of each sample as well was correct. For my samples I
>> > would have
>> >
>> > lumi(sample_i) = N_events_mc/Xsec*eff
>> >
>> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc
>> >
>> > the "lumi" I was using was just a global constant that would not change
>> > the normalization between the samples so it can be omitted (like, I
>> > could multiply the lumi of all background samples by 10 and this should
>> > not make a difference, as far as I understand).
>>
>> Yes exactly so far!
>> >> > >> > In order to pass this, I understand I should do: >> > >> > factory->SetBackgroundWeightExpression( "weight_bkg" ); >> > >> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly >> > from >> > the ntuple) >> >> as this weight would be the same for 'every' event in a particular >> sample, rather >> than haveing to write this into the N-tuple, you can much easier use: >> >> // global event weights per tree (see below for setting event-wise >> weights) >> Double_t backgroundWeightSample1 = <theNumberYouCalculatedForSample1>; >> Double_t backgroundWeightSample2 = <theNumberYouCalculatedForSample2>; >> etc.. >> >> dataloader->AddBackgroundTree( background1, backgroundWeightSample1 ); >> dataloader->AddBackgroundTree( background2, backgroundWeightSample2 ); >> >> (or 'factory" instead of "dataloader" for older root/tmva versions, >> like root 5.xx) >> >> the "SetBackgroundWeightExpression" is meant if your monte carlo generator >> creates event weights rather than 'events', or if you train using >> 'sWeights' for >> example, where each event gets a particular weight in order to end up with >> the >> correct 'average distribution' of events. >> >> > For the signal, I'm not sure I get what you said... should I not simply >> > have: >> > >> > factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg = >> > 1? >> >> No, as obviously that makes 'nothing' :) >> >> > >> > I have only one signal sample and several background, not several >> > signal. Or >> > are you saying >> > >> > weight_sg = total sum of 1/lumi(sample_i)? (why?) >> >> This is also not 'recommended', but in general it is the best >> 'default' to have the >> same number of (weighted) events in the signal sample as in the >> background, even >> if in the real data, your signal sample is typically much smaller >> than the background. 
>> This is, because in the extreme case of a very rare signal, the >> simplest classifier which >> just 'call everything background' already has a very good 'overall' >> perfromance, as it is >> correct in 'almost all cases'. (as most events are background). But of >> course, that classifier >> is not what you want. Therefore I suggested to weight the signal >> sample with an overall >> constant factor such that the total number of sum_of_weights for the >> signal sample is >> equal to the sum over background events. >> >> Cheers, >> >> Helge >> >> >> > >> > Thanks really a lot!!! >> > >> > Ben >> > >> > >> > >> > >> > >> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> wrote: >> >> >> >> Hi Ben, >> >> >> >> Maybe I didn't understand you, as I don't see at all why you use a >> >> factor Xsec*eff*lumi. >> >> >> >> TMVA just concatenates the different background source files together >> >> without doing anything... >> >> hence you should apply the factor 1/lumi(sample_i) to each MC sample >> >> (i=1,2,3) to normalize the >> >> various samples to the same integrated luminosity. Doing this, TMVA >> >> sees a background sample that >> >> has the same distribution as it would be in the data. Then afterwards, >> >> you should use "NormMode=None" >> >> (NormMode takes care of how the total Signal is weighted w.r.t. the >> >> total background). And if you choose "None" >> >> here, again, TMVA does nothing and you can normalize easily your >> >> signal sample to the background sample, >> >> buy multiplying as signal weight >> >> "sum_over_background_weights/sum_over_signal_weights") >> >> Where here sum goes over the events and 'background weight' for >> >> example would be the weights you >> >> caculated above for the relative background weighting, multiplied with >> >> eventual event weights from the monte carlo. 
>> >> For the signal, it would be simply the 'event_weights' if the MC you >> >> used produces weighted events rather than >> >> 'just events' >> >> >> >> Cheers, >> >> >> >> Helge >> >> >> >> >> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> wrote: >> >> > Hello! >> >> > >> >> > I have a hopefully simple question regarding how weights are passed >> >> > to >> >> > TMVA. >> >> > >> >> > I have one signal sample and 3 background samples that I want to pass >> >> > to >> >> > TMVA. In ROOT, the background samples would be normalized in an >> >> > historgam h >> >> > as: >> >> > >> >> > h->Fill(var, weight); where weight = N_events/(h->Integral()) >> >> > >> >> > with N_events = Xsec*eff*lumi; >> >> > >> >> > var is a variable that will be used in TMVA, and N_events is the >> >> > number >> >> > of >> >> > events I want to normalize to. In case of my samples this number >> >> > depends >> >> > on >> >> > the cross-section (Xsec), on the efficiency of the sample (eff), and >> >> > on >> >> > the >> >> > luminosity. Note that the weight I actually use depends on >> >> > h->integral, >> >> > because each sample has a different number of events and this must be >> >> > taken >> >> > into account. >> >> > >> >> > I need to pass the correct weights to TMVA. The question is, should I >> >> > pass >> >> > simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of >> >> > 1/h->integral by default) or should I pass actually >> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number >> >> > of >> >> > events in each sample? Note that passing N_events_mc is not really >> >> > ideal, as >> >> > there a few cuts involved. Alternatively, how would I do the >> >> > equivalent >> >> > of >> >> > h->Integral() at the TMVA level? >> >> > >> >> > Thanks a lot in advance for any help, and apologies if something is >> >> > not >> >> > very >> >> > well explained or confusing! 
>> >> > >> >> > Ben >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > ------------------------------------------------------------------------------ >> >> > Check out the vibrant tech community on one of the world's most >> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot >> >> > _______________________________________________ >> >> > TMVA-users mailing list >> >> > TMV...@li... >> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users >> >> > >> > >> > > > |
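Helge's prescription above — weight each background sample by xsec * eff / n_events = 1/(integrated lumi), then scale the signal so that its sum of weights matches the background's — can be sketched in plain C++. The cross-sections, efficiencies and event counts below are made up purely for illustration; none of these numbers come from the thread:

```cpp
#include <cassert>
#include <cmath>

// Per-event weight for one background MC sample:
// xsec * eff / n_mc  ==  1 / (integrated luminosity of that sample).
inline double sampleWeight(double xsec, double eff, long nMc) {
    return xsec * eff / static_cast<double>(nMc);
}

// Overall per-event signal weight, chosen so that the weighted signal
// sample has the same sum of weights as the total weighted background.
inline double signalWeight(double sumBkgWeights, long nSigMc) {
    return sumBkgWeights / static_cast<double>(nSigMc);
}

// Worked example (hypothetical numbers):
//   background 1: xsec = 100 pb, eff = 0.10, 50000 MC events -> w1 = 2e-4
//   background 2: xsec =  40 pb, eff = 0.25, 20000 MC events -> w2 = 5e-4
//   sum of background weights = 50000*w1 + 20000*w2 = 10 + 10 = 20
//   signal with 30000 MC events -> wSig = 20/30000,
//   so that 30000*wSig equals the background sum, as recommended.
```

These two numbers would then be the per-tree constants handed to `AddBackgroundTree` and `AddSignalTree` in the macro quoted above.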
From: Ben S. <ben...@gm...> - 2017-02-21 13:50:41
|
Dear Helge,

Ah, OK, that makes good sense, thanks a lot! I wrote down everything you said in my macro, in a nutshell:

TFile *input_signal = TFile::Open( "signal.root" );
TFile *input_bkg1 = TFile::Open( "background1.root" );
TFile *input_bkg2 = TFile::Open( "background2.root" );

// --- Register the training and test trees
TTree *signal = (TTree*)input_signal->Get("T_S");
TTree *background1 = (TTree*)input_bkg1->Get("T_B");
TTree *background2 = (TTree*)input_bkg2->Get("T_B");

// global event weights per tree (see below for setting event-wise weights)
Double_t backgroundWeightSample1 = Xsec1*eff1/Nevents_mc1;
Double_t backgroundWeightSample2 = Xsec2*eff2/Nevents_mc2;
Double_t signalWeight = Xsec1*eff1/Nevents_mc1 + Xsec2*eff2/Nevents_mc2;

// You can add an arbitrary number of signal or background trees
dataloader->AddBackgroundTree( background1, backgroundWeightSample1 );
dataloader->AddBackgroundTree( background2, backgroundWeightSample2 );
dataloader->AddSignalTree ( signal, signalWeight );

I confess I would have thought taking the largest 1/Lumi for the background would have been enough. Say I collect 10 fb-1 of background1 and 20 fb-1 of background2 *simultaneously*. I would expect not to be able to collect more than 20 fb-1 of signal then. But I guess you're being very conservative to be on the safe side.

I have one very last question, if you would not mind... I suddenly realize I don't know how exactly to take Nevent_mc. When the samples are prepared and are available to use they have a certain number of events. But when I prepare the tuple to pass to TMVA, a few cuts are applied and I have fewer events. Which one does TMVA want?

Many thanks,

Ben

On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...> wrote:
> Hi Ben,
>
> > normalize to 1/lumi(sample_i) then my impression that I should pass the
> > number of events of each sample as well was correct. For my samples I
> > would have
> >
> > lumi(sample_i) = N_events_mc/(Xsec*eff)
> >
> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc
> >
> > the "lumi" I was using was just a global constant that would not change
> > the normalization between the samples so it can be omitted (like, I could
> > multiply the lumi of all background samples by 10 and this should not
> > make a difference, as far as I understand).
>
> Yes, exactly so far!
>
> > In order to pass this, I understand I should do:
> >
> > factory->SetBackgroundWeightExpression( "weight_bkg" );
> >
> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly from
> > the ntuple)
>
> as this weight would be the same for 'every' event in a particular sample,
> rather than having to write this into the N-tuple, you can more easily use:
>
> // global event weights per tree (see below for setting event-wise weights)
> Double_t backgroundWeightSample1 = <theNumberYouCalculatedForSample1>;
> Double_t backgroundWeightSample2 = <theNumberYouCalculatedForSample2>;
> etc..
>
> dataloader->AddBackgroundTree( background1, backgroundWeightSample1 );
> dataloader->AddBackgroundTree( background2, backgroundWeightSample2 );
>
> (or 'factory' instead of 'dataloader' for older root/tmva versions,
> like root 5.xx)
>
> the "SetBackgroundWeightExpression" is meant for when your Monte Carlo
> generator creates event weights rather than 'events', or when you train
> using 'sWeights' for example, where each event gets a particular weight in
> order to end up with the correct 'average distribution' of events.
>
> > For the signal, I'm not sure I get what you said... should I not simply
> > have:
> >
> > factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg = 1?
>
> No, as obviously that does 'nothing' :)
>
> > I have only one signal sample and several background, not several signal.
> > Or are you saying
> >
> > weight_sg = total sum of 1/lumi(sample_i)? (why?)
>
> This is also not 'recommended', but in general it is the best 'default' to
> have the same number of (weighted) events in the signal sample as in the
> background, even if in the real data your signal sample is typically much
> smaller than the background. This is because, in the extreme case of a very
> rare signal, the simplest classifier, which just 'calls everything
> background', already has a very good 'overall' performance, as it is
> correct in 'almost all cases' (since most events are background). But of
> course, that classifier is not what you want. Therefore I suggested
> weighting the signal sample with an overall constant factor such that the
> sum_of_weights for the signal sample is equal to the sum of weights over
> the background events.
>
> Cheers,
>
> Helge
>
> > Thanks really a lot!!!
> >
> > Ben
> >
> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> wrote:
> >>
> >> Hi Ben,
> >>
> >> Maybe I didn't understand you, as I don't see at all why you use a
> >> factor Xsec*eff*lumi.
> >>
> >> TMVA just concatenates the different background source files together
> >> without doing anything... hence you should apply the factor
> >> 1/lumi(sample_i) to each MC sample (i=1,2,3) to normalize the various
> >> samples to the same integrated luminosity. Doing this, TMVA sees a
> >> background sample that has the same distribution as it would have in the
> >> data. Then afterwards, you should use "NormMode=None" (NormMode takes
> >> care of how the total Signal is weighted w.r.t. the total background).
> >> And if you choose "None" here, again, TMVA does nothing and you can
> >> easily normalize your signal sample to the background sample, by
> >> multiplying as signal weight
> >> "sum_over_background_weights/sum_over_signal_weights".
> >> Here the sum goes over the events, and 'background weight' for example
> >> would be the weights you calculated above for the relative background
> >> weighting, multiplied with any event weights from the Monte Carlo.
> >> For the signal, it would be simply the 'event_weights' if the MC you
> >> used produces weighted events rather than 'just events'.
> >>
> >> Cheers,
> >>
> >> Helge
> >>
> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> wrote:
> >> > Hello!
> >> >
> >> > I have a hopefully simple question regarding how weights are passed to
> >> > TMVA.
> >> >
> >> > I have one signal sample and 3 background samples that I want to pass
> >> > to TMVA. In ROOT, the background samples would be normalized in a
> >> > histogram h as:
> >> >
> >> > h->Fill(var, weight); where weight = N_events/(h->Integral())
> >> >
> >> > with N_events = Xsec*eff*lumi;
> >> >
> >> > var is a variable that will be used in TMVA, and N_events is the
> >> > number of events I want to normalize to. In the case of my samples
> >> > this number depends on the cross-section (Xsec), on the efficiency of
> >> > the sample (eff), and on the luminosity. Note that the weight I
> >> > actually use depends on h->Integral(), because each sample has a
> >> > different number of events and this must be taken into account.
> >> >
> >> > I need to pass the correct weights to TMVA. The question is, should I
> >> > pass simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent
> >> > of 1/h->Integral() by default) or should I pass actually
> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number of
> >> > events in each sample? Note that passing N_events_mc is not really
> >> > ideal, as there are a few cuts involved. Alternatively, how would I do
> >> > the equivalent of h->Integral() at the TMVA level?
> >> >
> >> > Thanks a lot in advance for any help, and apologies if something is
> >> > not very well explained or confusing!
> >> >
> >> > Ben
> >> >
> >> > ------------------------------------------------------------------------------
> >> > Check out the vibrant tech community on one of the world's most
> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> >> > _______________________________________________
> >> > TMVA-users mailing list
> >> > TMV...@li...
> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users
|
From: Helge V. <Hel...@ce...> - 2017-02-20 16:10:27
|
Hi Ben,

> normalize to 1/lumi(sample_i) then my impression that I should pass the
> number of events of each sample as well was correct. For my samples I would
> have
>
> lumi(sample_i) = N_events_mc/(Xsec*eff)
>
> So 1/Lumi_sample_i = Xsec*eff/Nevents_mc
>
> the "lumi" I was using was just a global constant that would not change the
> normalization between the samples so it can be omitted (like, I could
> multiply the lumi of all background samples by 10 and this should not make a
> difference, as far as I understand).

Yes, exactly so far!

> In order to pass this, I understand I should do:
>
> factory->SetBackgroundWeightExpression( "weight_bkg" );
>
> and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly from
> the ntuple)

as this weight would be the same for 'every' event in a particular sample, rather than having to write this into the N-tuple, you can more easily use:

// global event weights per tree (see below for setting event-wise weights)
Double_t backgroundWeightSample1 = <theNumberYouCalculatedForSample1>;
Double_t backgroundWeightSample2 = <theNumberYouCalculatedForSample2>;
etc..

dataloader->AddBackgroundTree( background1, backgroundWeightSample1 );
dataloader->AddBackgroundTree( background2, backgroundWeightSample2 );

(or 'factory' instead of 'dataloader' for older root/tmva versions, like root 5.xx)

the "SetBackgroundWeightExpression" is meant for when your Monte Carlo generator creates event weights rather than 'events', or when you train using 'sWeights' for example, where each event gets a particular weight in order to end up with the correct 'average distribution' of events.

> For the signal, I'm not sure I get what you said... should I not simply
> have:
>
> factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg = 1?

No, as obviously that does 'nothing' :)

> I have only one signal sample and several background, not several signal.
> Or are you saying
>
> weight_sg = total sum of 1/lumi(sample_i)? (why?)

This is also not 'recommended', but in general it is the best 'default' to have the same number of (weighted) events in the signal sample as in the background, even if in the real data your signal sample is typically much smaller than the background. This is because, in the extreme case of a very rare signal, the simplest classifier, which just 'calls everything background', already has a very good 'overall' performance, as it is correct in 'almost all cases' (since most events are background). But of course, that classifier is not what you want. Therefore I suggested weighting the signal sample with an overall constant factor such that the sum_of_weights for the signal sample is equal to the sum of weights over the background events.

Cheers,

Helge

> Thanks really a lot!!!
>
> Ben
>
> On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> wrote:
>>
>> Hi Ben,
>>
>> Maybe I didn't understand you, as I don't see at all why you use a
>> factor Xsec*eff*lumi.
>>
>> TMVA just concatenates the different background source files together
>> without doing anything... hence you should apply the factor
>> 1/lumi(sample_i) to each MC sample (i=1,2,3) to normalize the various
>> samples to the same integrated luminosity. Doing this, TMVA sees a
>> background sample that has the same distribution as it would have in the
>> data. Then afterwards, you should use "NormMode=None" (NormMode takes care
>> of how the total Signal is weighted w.r.t. the total background). And if
>> you choose "None" here, again, TMVA does nothing and you can easily
>> normalize your signal sample to the background sample, by multiplying as
>> signal weight "sum_over_background_weights/sum_over_signal_weights".
>> Here the sum goes over the events, and 'background weight' for example
>> would be the weights you calculated above for the relative background
>> weighting, multiplied with any event weights from the Monte Carlo.
>> For the signal, it would be simply the 'event_weights' if the MC you used
>> produces weighted events rather than 'just events'.
>>
>> Cheers,
>>
>> Helge
>>
>> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> wrote:
>> > Hello!
>> >
>> > I have a hopefully simple question regarding how weights are passed to
>> > TMVA.
>> >
>> > I have one signal sample and 3 background samples that I want to pass to
>> > TMVA. In ROOT, the background samples would be normalized in a
>> > histogram h as:
>> >
>> > h->Fill(var, weight); where weight = N_events/(h->Integral())
>> >
>> > with N_events = Xsec*eff*lumi;
>> >
>> > var is a variable that will be used in TMVA, and N_events is the number
>> > of events I want to normalize to. In the case of my samples this number
>> > depends on the cross-section (Xsec), on the efficiency of the sample
>> > (eff), and on the luminosity. Note that the weight I actually use
>> > depends on h->Integral(), because each sample has a different number of
>> > events and this must be taken into account.
>> >
>> > I need to pass the correct weights to TMVA. The question is, should I
>> > pass simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of
>> > 1/h->Integral() by default) or should I pass actually
>> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number of
>> > events in each sample? Note that passing N_events_mc is not really
>> > ideal, as there are a few cuts involved. Alternatively, how would I do
>> > the equivalent of h->Integral() at the TMVA level?
>> >
>> > Thanks a lot in advance for any help, and apologies if something is not
>> > very well explained or confusing!
>> >
>> > Ben
>> >
>> > ------------------------------------------------------------------------------
>> > Check out the vibrant tech community on one of the world's most
>> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> > _______________________________________________
>> > TMVA-users mailing list
>> > TMV...@li...
>> > https://lists.sourceforge.net/lists/listinfo/tmva-users
|
From: Ben S. <ben...@gm...> - 2017-02-20 14:34:43
|
Hi Helge,

thanks a lot! I think I got it and you answered my question. If I should normalize to 1/lumi(sample_i) then my impression that I should pass the number of events of each sample as well was correct. For my samples I would have

lumi(sample_i) = N_events_mc/(Xsec*eff)

So 1/Lumi_sample_i = Xsec*eff/Nevents_mc

the "lumi" I was using was just a global constant that would not change the normalization between the samples, so it can be omitted (like, I could multiply the lumi of all background samples by 10 and this should not make a difference, as far as I understand).

In order to pass this, I understand I should do:

factory->SetBackgroundWeightExpression( "weight_bkg" );

and have the variable "weight_bkg = 1/lumi(sample_i)" (read directly from the ntuple)

For the signal, I'm not sure I get what you said... should I not simply have:

factory->SetSignalWeightExpression("weight_sg" ); and have weight_sg = 1?

I have only one signal sample and several background, not several signal. Or are you saying

weight_sg = total sum of 1/lumi(sample_i)? (why?)

Thanks really a lot!!!

Ben

On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...> wrote:
> Hi Ben,
>
> Maybe I didn't understand you, as I don't see at all why you use a
> factor Xsec*eff*lumi.
>
> TMVA just concatenates the different background source files together
> without doing anything... hence you should apply the factor
> 1/lumi(sample_i) to each MC sample (i=1,2,3) to normalize the various
> samples to the same integrated luminosity. Doing this, TMVA sees a
> background sample that has the same distribution as it would have in the
> data. Then afterwards, you should use "NormMode=None" (NormMode takes care
> of how the total Signal is weighted w.r.t. the total background). And if
> you choose "None" here, again, TMVA does nothing and you can easily
> normalize your signal sample to the background sample, by multiplying as
> signal weight "sum_over_background_weights/sum_over_signal_weights".
> Here the sum goes over the events, and 'background weight' for example
> would be the weights you calculated above for the relative background
> weighting, multiplied with any event weights from the Monte Carlo.
> For the signal, it would be simply the 'event_weights' if the MC you used
> produces weighted events rather than 'just events'.
>
> Cheers,
>
> Helge
>
> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> wrote:
> > Hello!
> >
> > I have a hopefully simple question regarding how weights are passed to
> > TMVA.
> >
> > I have one signal sample and 3 background samples that I want to pass to
> > TMVA. In ROOT, the background samples would be normalized in a
> > histogram h as:
> >
> > h->Fill(var, weight); where weight = N_events/(h->Integral())
> >
> > with N_events = Xsec*eff*lumi;
> >
> > var is a variable that will be used in TMVA, and N_events is the number
> > of events I want to normalize to. In the case of my samples this number
> > depends on the cross-section (Xsec), on the efficiency of the sample
> > (eff), and on the luminosity. Note that the weight I actually use
> > depends on h->Integral(), because each sample has a different number of
> > events and this must be taken into account.
> >
> > I need to pass the correct weights to TMVA. The question is, should I
> > pass simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of
> > 1/h->Integral() by default) or should I pass actually
> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number of
> > events in each sample? Note that passing N_events_mc is not really
> > ideal, as there are a few cuts involved. Alternatively, how would I do
> > the equivalent of h->Integral() at the TMVA level?
> >
> > Thanks a lot in advance for any help, and apologies if something is not
> > very well explained or confusing!
> >
> > Ben
> >
> > ------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> > _______________________________________________
> > TMVA-users mailing list
> > TMV...@li...
> > https://lists.sourceforge.net/lists/listinfo/tmva-users
|
From: Helge V. <Hel...@ce...> - 2017-02-20 14:05:40
|
Hi Ben,

Maybe I didn't understand you, as I don't see at all why you use a factor Xsec*eff*lumi.

TMVA just concatenates the different background source files together without doing anything... hence you should apply the factor 1/lumi(sample_i) to each MC sample (i=1,2,3) to normalize the various samples to the same integrated luminosity. Doing this, TMVA sees a background sample that has the same distribution as it would have in the data. Then afterwards, you should use "NormMode=None" (NormMode takes care of how the total Signal is weighted w.r.t. the total background). And if you choose "None" here, again, TMVA does nothing and you can easily normalize your signal sample to the background sample, by multiplying as signal weight "sum_over_background_weights/sum_over_signal_weights". Here the sum goes over the events, and 'background weight' for example would be the weights you calculated above for the relative background weighting, multiplied with any event weights from the Monte Carlo. For the signal, it would be simply the 'event_weights' if the MC you used produces weighted events rather than 'just events'.

Cheers,

Helge

On 20 February 2017 at 15:19, Ben Smith <ben...@gm...> wrote:
> Hello!
>
> I have a hopefully simple question regarding how weights are passed to TMVA.
>
> I have one signal sample and 3 background samples that I want to pass to
> TMVA. In ROOT, the background samples would be normalized in a histogram h
> as:
>
> h->Fill(var, weight); where weight = N_events/(h->Integral())
>
> with N_events = Xsec*eff*lumi;
>
> var is a variable that will be used in TMVA, and N_events is the number of
> events I want to normalize to. In the case of my samples this number depends
> on the cross-section (Xsec), on the efficiency of the sample (eff), and on
> the luminosity. Note that the weight I actually use depends on
> h->Integral(), because each sample has a different number of events and this
> must be taken into account.
>
> I need to pass the correct weights to TMVA. The question is, should I pass
> simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of
> 1/h->Integral() by default) or should I pass actually
> Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number of
> events in each sample? Note that passing N_events_mc is not really ideal, as
> there are a few cuts involved. Alternatively, how would I do the equivalent
> of h->Integral() at the TMVA level?
>
> Thanks a lot in advance for any help, and apologies if something is not very
> well explained or confusing!
>
> Ben
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> TMVA-users mailing list
> TMV...@li...
> https://lists.sourceforge.net/lists/listinfo/tmva-users
|
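The "sum_over_background_weights/sum_over_signal_weights" rescaling described above can be sketched as a small helper. This is an illustration with invented per-event weights, not code from the thread; the per-event weights stand for the tree weight multiplied by any MC event weight:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// With NormMode=None, TMVA applies no extra signal-vs-background
// normalization, so the user supplies it: the constant factor to put on
// every signal event is sum_over_background_weights / sum_over_signal_weights.
inline double signalScale(const std::vector<double>& bkgWeights,
                          const std::vector<double>& sigWeights) {
    const double sumBkg =
        std::accumulate(bkgWeights.begin(), bkgWeights.end(), 0.0);
    const double sumSig =
        std::accumulate(sigWeights.begin(), sigWeights.end(), 0.0);
    return sumBkg / sumSig;
}
```

After multiplying every signal weight by this factor, the two samples have equal sums of weights, which is the "same number of (weighted) events" condition discussed in the replies.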
From: Ben S. <ben...@gm...> - 2017-02-20 13:19:32
|
Hello!

I have a hopefully simple question regarding how weights are passed to TMVA.

I have one signal sample and 3 background samples that I want to pass to TMVA. In ROOT, the background samples would be normalized in a histogram h as:

h->Fill(var, weight); where weight = N_events/(h->Integral())

with N_events = Xsec*eff*lumi;

var is a variable that will be used in TMVA, and N_events is the number of events I want to normalize to. In the case of my samples this number depends on the cross-section (Xsec), on the efficiency of the sample (eff), and on the luminosity. Note that the weight I actually use depends on h->Integral(), because each sample has a different number of events and this must be taken into account.

I need to pass the correct weights to TMVA. The question is, should I pass simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent of 1/h->Integral() by default) or should I pass actually Xsec*eff*lumi/N_events_mc, where N_events_mc is the original number of events in each sample? Note that passing N_events_mc is not really ideal, as there are a few cuts involved. Alternatively, how would I do the equivalent of h->Integral() at the TMVA level?

Thanks a lot in advance for any help, and apologies if something is not very well explained or confusing!

Ben
|
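The arithmetic behind the question — a per-entry weight chosen so the filled histogram integrates to N_events = Xsec*eff*lumi — reduces to a division by the MC entry count, which can be sketched as follows (the numbers in the comments are invented for illustration):

```cpp
#include <cassert>
#include <cmath>

// Histogram normalization as in the question: every one of the n_mc MC
// entries gets the same weight, chosen so that the filled histogram
// integrates to N_events = xsec * eff * lumi.  Since an unweighted
// h->Integral() would just count the n_mc entries, the "1/h->Integral()"
// step is a division by n_mc:
inline double histWeight(double xsec, double eff, double lumi, long nMc) {
    return xsec * eff * lumi / static_cast<double>(nMc);
}

// Dropping the lumi factor (common to all samples) leaves xsec * eff / n_mc,
// the relative per-sample weight that the replies in this thread recommend.
// E.g. xsec = 100, eff = 0.10, lumi = 5 -> N_events = 50; spread over
// 50000 MC entries this gives a per-entry weight of 1e-3.
```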
From: GMAIL <a.m...@gm...> - 2017-02-14 16:50:59
|
Dear all,

a test TreeS and TreeB were used to train TMVA with a BDT, and the weight files were produced. I then used these weights at the application stage on the same samples (TreeS once and TreeB another time), and the BDT response is identical for all events: the BDT response distribution is a histogram in which every event falls into the same bin. The variable names and the way I add them to the reader look OK. Do you have any suggestions on how to debug this further?

Thanks in advance.

Annalisa
|