tmva-users Mailing List for TMVA Toolkit for Multi Variate Analysis (Page 5)

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Hi,

in the log file you find a printout of 'sum of weights' as they are
seen by TMVA, there you can most easily check if you got the 'correct'
factor (i.e. by seeing that the sum of the signal weights is 'about'
the same as the sum of the background weights.

>Nev_back1*W1 + Nev_back2*W2 + Nev_back3*W3 = Nev_sig*Weight_signal
this formula seems 'right'
but
I don't see how this translate into:

Weight_signal = (Xsec_back1 + Xsec_back_2 + Xsec_back3)/Nsignal_ events

Cheers,
Helge

On 9 March 2017 at 13:15, Ben Smith <ben...@gm...> wrote:
> Hi Helge,
>
> OK!  I tried all sorts of ways to normalize signal and background and
> indeed, doing NormMode = None and normalizing signal by hand and using
> NormMode=EqualNumEvents, seems to be *very* different.... So, I wonder if
> I'm doing the normalization of the signal by hand correctly.
>
> You said "I suggested to weight the signal sample with an overall
> constant factor such that the total number of sum_of_weights for the
> signal sample is equal to the sum over background events"
>
> So, I did the following:
>
> Nev_back1*W1 + Nev_back2*W2 + Nev_back3*W3 = Nev_sig*Weight_signal
>
> So
>
> Weight_signal = (Xsec_back1 + Xsec_back_2 + Xsec_back3)/Nsignal_ events
>
> In other words, the signal weight "by hand" should be the sum of all
> background cross sections divided by the final number of events in the
> signal sample that is passed to TMVA.
>
> Did I get that right?
>
> Thank you,
>
> Ben
>
>
>
> I
>
>
>
>
>
> On Sun, Feb 26, 2017 at 9:05 PM, Helge Voss <Hel...@ce...> wrote:
>>
>> Hi,
>>
>> yes exactly, NormMode=EqualNumEvents would take car of the bit that
>> normalizes 'Signal' to 'background'. The relative weighting of the
>> various
>> background samples you still have to do yourself though. And as I
>> never remember how possible preselection cuts in the factory are
>> handled,
>> I simply like to suggest to do the normalization 'by hand' :)
>>
>> Cheers,
>>
>> helge
>>
>>
>> On 22 February 2017 at 11:09, Ben Smith <ben...@gm...> wrote:
>> > Hi Helge,
>> >
>> > Hi Helge,
>> >
>> > Thanks a lot for taking the time to explain my confusions...
>> >
>> > I discovered that there is a "NormMode=EqualNumEvents" which should do
>> > what
>> > you proposed automatically, unless I misunderstood it.
>> >
>> > You say the signal should be normalized to the "explicit number of
>> > events"
>> > .. or if you you have weighted the events, the 'sum_of_event_weights' of
>> > you
>> > total background sample" (I got it why now)
>> >
>> > To normalize to the same number of events as the background, I wonder
>> > what
>> > is the best way to do this. Would it ork if I use
>> > "NormMode=EqualNumEvents"?
>> > Or is it the case that if I use the background weights as we discussed,
>> > I
>> > should than have "NormMode=None", necessarily? Because if that is the
>> > case,
>> > than I am not sure what to enter for Double_t signalWeight =?? , so that
>> > this is acomplished (I don't know the cross section of the signal in
>> > this
>> > particular case). In your first message I re-read that you said that
>> > this
>> > signal_weight should be
>> > "sum_over_background_weights/sum_over_signal_weights", so what was
>> > missing
>> > from "signalWeight= Xsec1*eff1/Nevents_mc1 + Xsec1*eff2/Nevents_mc2"
>> > is the 1/(sum_over_signal_weights) but this I don't know (there are no
>> > event
>> > weights for the sinal, as far as I understand, and probably no way to
>> > check).
>> >
>> > "so ..  n_events should be the 'number of events' that corresponds to
>> > the
>> > 'eff' that you put in there."
>> >
>> > I'm thinking than I will just use Xsec/N_events_final, which should be
>> > equivalent to having the correct final efficiency. The thing is that the
>> > efficiency I quoted takes into account only the generator level cuts,
>> > not
>> > analysis cuts. I have the "integrated luminosity of the sample" but this
>> > is
>> > before applying any analysis cuts - I thought I could use this, but what
>> > you
>> > say does not corroborate that. So, if I take the actual number of events
>> > I
>> > see in the tuple I pass to TMVA and the cross section, this should be
>> > enough.
>> >
>> > Thanks!!!
>> >
>> > Ben
>> >
>> > On Tue, Feb 21, 2017 at 8:57 PM, Helge Voss <Hel...@ce...> wrote:
>> >>
>> >> Hi Ben,
>> >>
>> >> There's still some misunderstanding, I'll try to explain below
>> >>
>> >>
>> >> >
>> >> >    // global event weights per tree (see below for setting event-wise
>> >> > weights)
>> >> >    Double_t backgroundWeight1 =  Xsec1*eff1/Nevents_mc1;
>> >> >    Double_t backgroundWeight2 =  Xsec1*eff2/Nevents_mc2;
>> >> >    Double_t signalWeight      = Xsec1*eff1/Nevents_mc1 +
>> >> > Xsec1*eff2/Nevents_mc2;
>> >>
>> >> For the 'signal' scaling, you really don't care about 'lumi' (when I
>> >> say
>> >> lumi,
>> >> I mean 'integrated luminosity' obviously)  of your background
>> >> monte carlo events, but the "explicit number of events" .. or if you
>> >> you
>> >> have
>> >> weighted the events, the 'sum_of_event_weights' of you total background
>> >> sample
>> >>
>> >> As I tried to explain in the previous mail, the signal sample should
>> >> NOT be normalized
>> >> to the same lumi as the background, but the the same "number of
>> >> events". And typically
>> >> for "signal" that is a much much larger lumi than for background.
>> >> (maybe if you read the
>> >> previous mail again, you understand "WHY" I said this should be the
>> >> case)
>> >>
>> >>
>> >> >
>> >> >    // You can add an arbitrary number of signal or background trees
>> >> >    dataloader->AddBackgroundTree( background1,
>> >> > backgroundWeightSample1
>> >> > );
>> >> >    dataloader->AddBackgroundTree( background2,
>> >> > backgroundWeightSample2
>> >> > );
>> >> >    dataloader->AddSignalTree    ( signal,     signalWeight );
>> >> >
>> >> > I confess I would have thought taking the largest 1/Lumi for the
>> >> > background
>> >> > would have been enough. Say I collect 10 fb-1 of background1 and 20
>> >> > fb-1
>> >> > of
>> >> > background2 simultaneously. I would expect not to be able to collect
>> >> > more
>> >> > than 20 fb-1 of signal than. But I guess you're being very
>> >> > conservative
>> >> > to
>> >> > be on the safe side.
>> >>
>> >> So.. after what I wrote above, it should now hopefully be clear that
>> >> this is also
>> >> wrong. I wasn't 'conservative' or 'on the safe side' by taking the sum
>> >> of their integrated
>> >> luminosities, I simply meant a different 'scaling', based on actual
>> >> number of events (sum of
>> >> event weightes)  in the respecteive signal and background sample.
>> >>
>> >> >
>> >> > I have one very last question, if you would not mind...  I suddenly
>> >> > realize
>> >> > I don't know how exactly to take Nevent_mc. When the samples are
>> >> > prepared
>> >> > and are available to use they have a certain number of events. But
>> >> > when
>> >> > I
>> >> > prepare the tuple to pass to TMVA, a few cuts are applied and I have
>> >> > less
>> >> > events. Which one does TMVA want?
>> >>
>> >> Again, TMVA want's nothing ;) You WANT  to give it a background sample
>> >> that is as close
>> >> to that which you have in the data (i.e. the event distributions that
>> >> TMVA sees and
>> >> tries to discriminate you signal against, should be as similar as
>> >> possible to what the
>> >> trained classifier will be exposed to when it is in the end applied to
>> >> your data. Hence
>> >> you can always use this in order to determine how you want to 'scale'
>> >> your various event
>> >> samples. That's why I said: "scale your different background samples
>> >> such that they all
>> >>
>> >> xsec * eff / n_events  =  1/(integrated lumi)
>> >>
>> >> so ..  n_events should be the 'number of events' that corresponds to
>> >> the 'eff' that you put
>> >> in there. So if you have some cuts, eff should take into account those
>> >> cuts AND of course
>> >> all cuts that your event generator might have applied etc..
>> >>
>> >> Cheers,
>> >>
>> >> Helge
>> >>
>> >>
>> >> >
>> >> > Many thanks,
>> >> >
>> >> > Ben
>> >> >
>> >> > On Mon, Feb 20, 2017 at 5:09 PM, Helge Voss <Hel...@ce...>
>> >> > wrote:
>> >> >>
>> >> >> Hi Ben,
>> >> >>
>> >> >> > normalize to 1/lumi(sample_i) than my impression that I should
>> >> >> > pass
>> >> >> > the
>> >> >> > number of events of each sample as  well was correct. For my
>> >> >> > samples
>> >> >> > I
>> >> >> > would
>> >> >> > have
>> >> >> >
>> >> >> > lumi(sample_i) = N_events_mc/Xsec*eff
>> >> >> >
>> >> >> > So 1/Lumi_sample_i = Xsec*eff/Nevents_mc
>> >> >> >
>> >> >> > the "lumi" I was using was just a global constant that would not
>> >> >> > change
>> >> >> > the
>> >> >> > normalization between the samples so it can be omitted (like, I
>> >> >> > could
>> >> >> > multiply the lumi of all background samples by 10 and this should
>> >> >> > not
>> >> >> > make a
>> >> >> > difference, as far as I understand).
>> >> >>
>> >> >> Yes exactly so far!
>> >> >>
>> >> >> >
>> >> >> > In order to pass this, I understand I should do:
>> >> >> >
>> >> >> > factory->SetBackgroundWeightExpression( "weight_bkg" );
>> >> >> >
>> >> >> > and have the variable "weight_bkg = 1/lumi(sample_i)" (read
>> >> >> > directly
>> >> >> > from
>> >> >> > the ntuple)
>> >> >>
>> >> >> as this weight would be the same for 'every' event in a particular
>> >> >> sample, rather
>> >> >> than haveing to write this into the N-tuple,  you can much easier
>> >> >> use:
>> >> >>
>> >> >>    // global event weights per tree (see below for setting
>> >> >> event-wise
>> >> >> weights)
>> >> >>    Double_t backgroundWeightSample1 =
>> >> >> <theNumberYouCalculatedForSample1>;
>> >> >>    Double_t backgroundWeightSample2 =
>> >> >> <theNumberYouCalculatedForSample2>;
>> >> >>    etc..
>> >> >>
>> >> >>    dataloader->AddBackgroundTree( background1,
>> >> >> backgroundWeightSample1
>> >> >> );
>> >> >>    dataloader->AddBackgroundTree( background2,
>> >> >> backgroundWeightSample2
>> >> >> );
>> >> >>
>> >> >> (or 'factory" instead of "dataloader" for older root/tmva versions,
>> >> >> like root 5.xx)
>> >> >>
>> >> >> the "SetBackgroundWeightExpression" is meant if your monte carlo
>> >> >> generator
>> >> >> creates event weights rather than 'events', or if you train using
>> >> >> 'sWeights' for
>> >> >> example, where each event gets a particular weight in order to end
>> >> >> up
>> >> >> with
>> >> >> the
>> >> >> correct 'average distribution' of events.
>> >> >>
>> >> >> > For the signal, I'm not sure I get what you said... should I not
>> >> >> > simply
>> >> >> > have:
>> >> >> >
>> >> >> > factory->SetSignalWeightExpression("weight_sg" ); and have
>> >> >> > weight_sg
>> >> >> > =
>> >> >> > 1?
>> >> >>
>> >> >> No, as obviously that makes 'nothing' :)
>> >> >>
>> >> >> >
>> >> >> > I have only one signal sample and several background, not several
>> >> >> > signal. Or
>> >> >> > are you saying
>> >> >> >
>> >> >> > weight_sg = total sum of 1/lumi(sample_i)? (why?)
>> >> >>
>> >> >> This is also not 'recommended', but in general it is the best
>> >> >> 'default' to have the
>> >> >> same number of (weighted) events in the signal sample as in the
>> >> >> background, even
>> >> >> if in the real data,  your signal sample is typically much smaller
>> >> >> than the background.
>> >> >> This is, because in the extreme case of a very rare signal, the
>> >> >> simplest classifier which
>> >> >> just 'call everything background' already has a very good 'overall'
>> >> >> perfromance, as it is
>> >> >> correct in 'almost all cases'. (as most events are background). But
>> >> >> of
>> >> >> course, that classifier
>> >> >> is not what you want. Therefore I suggested to weight the signal
>> >> >> sample with an overall
>> >> >> constant factor such that the total number of sum_of_weights for the
>> >> >> signal sample is
>> >> >> equal to the sum over background events.
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Helge
>> >> >>
>> >> >>
>> >> >> >
>> >> >> > Thanks really a lot!!!
>> >> >> >
>> >> >> > Ben
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Mon, Feb 20, 2017 at 3:04 PM, Helge Voss <Hel...@ce...>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Ben,
>> >> >> >>
>> >> >> >> Maybe I didn't understand you, as I don't see at all why you use
>> >> >> >> a
>> >> >> >> factor Xsec*eff*lumi.
>> >> >> >>
>> >> >> >> TMVA just concatenates the different background source files
>> >> >> >> together
>> >> >> >> without doing anything...
>> >> >> >> hence you should apply the factor 1/lumi(sample_i) to each MC
>> >> >> >> sample
>> >> >> >> (i=1,2,3) to normalize the
>> >> >> >> various samples to the same integrated luminosity. Doing this,
>> >> >> >> TMVA
>> >> >> >> sees a background sample that
>> >> >> >> has the same distribution as it would be in the data. Then
>> >> >> >> afterwards,
>> >> >> >> you should use "NormMode=None"
>> >> >> >> (NormMode takes care of how the total Signal is weighted w.r.t.
>> >> >> >> the
>> >> >> >> total background). And if you choose "None"
>> >> >> >> here, again, TMVA does nothing and you can normalize easily your
>> >> >> >> signal sample to the background sample,
>> >> >> >> buy multiplying as signal weight
>> >> >> >> "sum_over_background_weights/sum_over_signal_weights")
>> >> >> >> Where here sum goes over the events and 'background weight' for
>> >> >> >> example would be the weights you
>> >> >> >> caculated above for the relative background weighting, multiplied
>> >> >> >> with
>> >> >> >> eventual event weights from the monte carlo.
>> >> >> >> For the signal, it would be simply the 'event_weights' if the MC
>> >> >> >> you
>> >> >> >> used produces weighted events rather than
>> >> >> >> 'just events'
>> >> >> >>
>> >> >> >> Cheers,
>> >> >> >>
>> >> >> >> Helge
>> >> >> >>
>> >> >> >>
>> >> >> >> On 20 February 2017 at 15:19, Ben Smith <ben...@gm...>
>> >> >> >> wrote:
>> >> >> >> > Hello!
>> >> >> >> >
>> >> >> >> > I have a hopefully simple question regarding how weights are
>> >> >> >> > passed
>> >> >> >> > to
>> >> >> >> > TMVA.
>> >> >> >> >
>> >> >> >> > I have one signal sample and 3 background samples that I want
>> >> >> >> > to
>> >> >> >> > pass
>> >> >> >> > to
>> >> >> >> > TMVA. In ROOT, the background samples would be normalized in an
>> >> >> >> > historgam h
>> >> >> >> > as:
>> >> >> >> >
>> >> >> >> > h->Fill(var, weight); where weight = N_events/(h->Integral())
>> >> >> >> >
>> >> >> >> > with N_events = Xsec*eff*lumi;
>> >> >> >> >
>> >> >> >> > var is a variable that will be used in TMVA, and N_events is
>> >> >> >> > the
>> >> >> >> > number
>> >> >> >> > of
>> >> >> >> > events I want to normalize to. In case of my samples this
>> >> >> >> > number
>> >> >> >> > depends
>> >> >> >> > on
>> >> >> >> > the cross-section (Xsec), on the efficiency of the sample
>> >> >> >> > (eff),
>> >> >> >> > and
>> >> >> >> > on
>> >> >> >> > the
>> >> >> >> > luminosity. Note that the weight I actually use depends on
>> >> >> >> > h->integral,
>> >> >> >> > because each sample has a different number of events and this
>> >> >> >> > must
>> >> >> >> > be
>> >> >> >> > taken
>> >> >> >> > into account.
>> >> >> >> >
>> >> >> >> > I need to pass the correct weights to TMVA. The question is,
>> >> >> >> > should I
>> >> >> >> > pass
>> >> >> >> > simply "Xsec*eff*lumi" (in other words, TMVA does an equivalent
>> >> >> >> > of
>> >> >> >> > 1/h->integral by default) or should I pass actually
>> >> >> >> > Xsec*eff*lumi/N_events_mc, where N_events_mc is the original
>> >> >> >> > number
>> >> >> >> > of
>> >> >> >> > events in each sample? Note that passing N_events_mc is not
>> >> >> >> > really
>> >> >> >> > ideal, as
>> >> >> >> > there a few cuts involved. Alternatively, how would I do the
>> >> >> >> > equivalent
>> >> >> >> > of
>> >> >> >> > h->Integral() at the TMVA level?
>> >> >> >> >
>> >> >> >> > Thanks a lot in advance for any help, and apologies if
>> >> >> >> > something
>> >> >> >> > is
>> >> >> >> > not
>> >> >> >> > very
>> >> >> >> > well explained or confusing!
>> >> >> >> >
>> >> >> >> > Ben
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ------------------------------------------------------------------------------
>> >> >> >> > Check out the vibrant tech community on one of the world's most
>> >> >> >> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> >> >> >> > _______________________________________________
>> >> >> >> > TMVA-users mailing list
>> >> >> >> > TMV...@li...
>> >> >> >> > https://lists.sourceforge.net/lists/listinfo/tmva-users
>> >> >> >> >
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>

2006	Jan	Feb	Mar (1)	Apr (4)	May (1)	Jun (1)	Jul	Aug	Sep	Oct	Nov	Dec (1)
2007	Jan	Feb	Mar (2)	Apr (10)	May (1)	Jun (13)	Jul (69)	Aug (40)	Sep (45)	Oct (21)	Nov (15)	Dec (2)
2008	Jan (44)	Feb (21)	Mar (28)	Apr (33)	May (35)	Jun (16)	Jul (12)	Aug (29)	Sep (12)	Oct (24)	Nov (36)	Dec (22)
2009	Jan (25)	Feb (19)	Mar (47)	Apr (23)	May (39)	Jun (14)	Jul (33)	Aug (12)	Sep (31)	Oct (31)	Nov (19)	Dec (13)
2010	Jan (7)	Feb (27)	Mar (26)	Apr (17)	May (10)	Jun (11)	Jul (17)	Aug (20)	Sep (31)	Oct (13)	Nov (19)	Dec (6)
2011	Jan (13)	Feb (17)	Mar (36)	Apr (19)	May (4)	Jun (14)	Jul (24)	Aug (22)	Sep (47)	Oct (35)	Nov (24)	Dec (18)
2012	Jan (28)	Feb (19)	Mar (23)	Apr (36)	May (27)	Jun (39)	Jul (29)	Aug (23)	Sep (17)	Oct (36)	Nov (60)	Dec (28)
2013	Jan (34)	Feb (23)	Mar (44)	Apr (39)	May (89)	Jun (55)	Jul (31)	Aug (47)	Sep (6)	Oct (21)	Nov (21)	Dec (10)
2014	Jan (19)	Feb (32)	Mar (11)	Apr (33)	May (22)	Jun (7)	Jul (16)	Aug (4)	Sep (20)	Oct (17)	Nov (12)	Dec (6)
2015	Jan (9)	Feb (7)	Mar (16)	Apr (5)	May (13)	Jun (27)	Jul (25)	Aug (11)	Sep (10)	Oct (7)	Nov (47)	Dec (2)
2016	Jan (9)	Feb (2)	Mar (4)	Apr (18)	May (2)	Jun (8)	Jul	Aug (27)	Sep (47)	Oct (28)	Nov (3)	Dec (9)
2017	Jan (11)	Feb (23)	Mar (7)	Apr (7)	May (20)	Jun	Jul (6)	Aug (1)	Sep	Oct (3)	Nov (11)	Dec (8)
2018	Jan (9)	Feb (8)	Mar (2)	Apr (2)	May (2)	Jun	Jul (2)	Aug (1)	Sep (2)	Oct	Nov	Dec
2020	Jan	Feb	Mar (2)	Apr	May	Jun	Jul	Aug	Sep	Oct (2)	Nov	Dec
2021	Jan	Feb (1)	Mar	Apr	May	Jun	Jul	Aug	Sep (1)	Oct (2)	Nov	Dec
2022	Jan	Feb	Mar	Apr	May	Jun (1)	Jul	Aug (1)	Sep	Oct	Nov	Dec
2023	Jan	Feb	Mar (1)	Apr (1)	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2024	Jan	Feb	Mar	Apr	May	Jun (1)	Jul	Aug	Sep	Oct	Nov (1)	Dec

tmva-users Mailing List for TMVA Toolkit for Multi Variate Analysis (Page 5)

A ROOT-integrated toolkit for multivariate analysis

tmva-users — Open mailing list for TMVA users