Re: [TMVA-users] Signal/Background target responses inverted.

Hi all,

Just to follow up on this, the DNN implementation has been updated so 
the workaround should no longer be necessary.

Cheers,
   Kim

Kim Albertsson wrote:
> Hi Chris!
>> Thanks, I wondered if it was a ‘feature’, but I failed to find any
>> good documentation explaining the new loader class. Is there anything
>> I can read explaining how to go about using it? I’ve found the
>> doxygen docs, but that’s not really what I am looking for.
> There is supposed to be information about the dataloader in the User's
> Guide
> <https://root.cern.ch/gitweb/?p=root.git;a=blob;f=documentation/tmva/UsersGuide/TMVAUsersGuide.pdf>
> but it has unfortunately not been updated with this yet. It is a
> simple transformation, however: the methods that dealt with loading and
> preparing input were moved to a separate class. It should work the same
> as the factory did before.
>
> Digging a little further into this, I see now that this behaviour has
> been in TMVA since Jun 22, 2009, and I realise that it is possibly a
> bug in the DNN. Could you check whether the output of the MLP is as
> you expect? I will look into a proper fix. For now I can only provide
> the workarounds previously discussed :/.
>> Specifically, on your suggestion below, it’s not clear to me how I go
>> about ‘ensuring the expected order’ as you describe. Is it just a case
>> of adding the Signal first, then the background, with
>> 'tmvaLoader->GetDataSetInfo().AddClass("ClassName")'?
> I think you want
> tmvaLoader->GetDataSetInfo().AddClass("Background"); // adds class 0
> tmvaLoader->GetDataSetInfo().AddClass("Signal");     // adds class 1
> to get the expected output signal(background) => 1(0).
>
> Thanks for reporting this to us!
>
> Cheers,
>   Kim
>>
>> cheers Chris
>>
>>>
>>> On 12 Jul 2017, at 4:18 pm, Kim Albertsson <kia...@ce...
>>> <mailto:kia...@ce...>> wrote:
>>>
>>> Hi Chris,
>>>
>>> This is a feature of the DataLoader: it creates the class indices
>>> dynamically, making the order in which classes are added important.
>>> This is to allow more than two classes and custom class names. One can
>>> check what index the signal class has by querying the DataSetInfo
>>> method GetSignalClassIndex. In your case this would be
>>> tmvaLoader->GetDataSetInfo().GetSignalClassIndex().
>>>
>>> Another approach would be to add the classes first and ensure the
>>> expected order through
>>> tmvaLoader->GetDataSetInfo().AddClass("ClassName"). If you use this
>>> second approach the names must be "Background" and "Signal", as you
>>> use AddSignalTrainingEvent and friends, which expect these classes to
>>> exist.
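
As a rough illustration of why the call order matters, here is a toy model in plain C++. It is not TMVA code: `ClassRegistry`, `signal_index` and `check_orderings` are hypothetical names made up for this sketch, and the only behaviour assumed is the one described above, namely that a class receives the next free index the first time it is added. It says nothing about which response value a given classifier assigns to each index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only, NOT the TMVA implementation: a class name receives the
// next free index the first time it is added.
class ClassRegistry {
public:
    // Register `name` if it is new and return its index.
    int add(const std::string& name) {
        const int existing = index_of(name);
        if (existing >= 0) return existing;
        names_.push_back(name);
        return static_cast<int>(names_.size()) - 1;
    }

    // Index of a registered class, or -1 if it was never added.
    int index_of(const std::string& name) const {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == name) return static_cast<int>(i);
        return -1;
    }

private:
    std::vector<std::string> names_;
};

// Index that "Signal" ends up with when entries arrive in the given order
// (true = signal entry, false = background entry). With `preRegister`,
// both classes are added up front in a fixed order.
int signal_index(const std::vector<bool>& entries, bool preRegister) {
    ClassRegistry registry;
    if (preRegister) {
        registry.add("Background");  // index 0
        registry.add("Signal");      // index 1
    }
    for (bool isSignal : entries)
        registry.add(isSignal ? "Signal" : "Background");
    return registry.index_of("Signal");
}

// Usage: without pre-registration the index follows the first entry;
// with pre-registration it is the same for both input orders.
void check_orderings() {
    assert(signal_index({true, false, true}, false) == 0);
    assert(signal_index({false, true, true}, false) == 1);
    assert(signal_index({true, false, true}, true) == 1);
    assert(signal_index({false, true, true}, true) == 1);
}
```

In this sketch, adding both classes before any entries plays the role of the AddClass approach: the index of "Signal" no longer depends on which kind of entry happens to come first.
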
>>>
>>> Cheers,
>>> Kim
>>>>
>>>> Begin forwarded message:
>>>>
>>>> *From: *Chris Jones <jo...@he...
>>>> <mailto:jo...@he...>>
>>>> *Subject: **[TMVA-users] Signal/Background target responses inverted.*
>>>> *Date: *12 July 2017 at 15:15:31 GMT+2
>>>> *To: *"tmv...@li...
>>>> <mailto:tmv...@li...>"
>>>> <tmv...@li...
>>>> <mailto:tmv...@li...>>
>>>>
>>>> Hi,
>>>>
>>>> After a while away from running TMVA I am back looking at the new
>>>> DNN MVA in ROOT 6.10.02. I have noticed what appears to me to be
>>>> slightly odd behavior: in some of my trainings the target responses
>>>> (1 or 0) for signal and background are inverted. By which I mean
>>>> signal(background) is trained to give 0(1), instead of the expected
>>>> 1(0).
>>>>
>>>> In the end I think I have tracked this down to the fact I use the
>>>> following logic to fill my training and testing samples.
>>>>
>>>> for ( 'some loop over data entries' ) {
>>>>
>>>>   if ( target )
>>>>   {
>>>>     if ( !useForTesting )
>>>>     {
>>>>       tmvaLoader->AddSignalTrainingEvent( InputDoubles, 1.0 );
>>>>     }
>>>>     else
>>>>     {
>>>>       tmvaLoader->AddSignalTestEvent    ( InputDoubles, 1.0 );
>>>>     }
>>>>   }
>>>>   else
>>>>   {
>>>>     if ( !useForTesting )
>>>>     {
>>>>       tmvaLoader->AddBackgroundTrainingEvent( InputDoubles, 1.0 );
>>>>     }
>>>>     else
>>>>     {
>>>>       tmvaLoader->AddBackgroundTestEvent    ( InputDoubles, 1.0 );
>>>>     }
>>>>   }
>>>>
>>>> }
>>>>
>>>> where 'target' is a boolean that indicates if the data entry is
>>>> signal or background, and 'useForTesting' another boolean to
>>>> indicate if the entry should be used for training or testing.
>>>> InputDoubles is an array with all the input parameters for the given
>>>> data entry.
>>>>
>>>> tmvaLoader is an instance of TMVA::DataLoader.
>>>>
>>>> The issue is that the order in which the above calls are first made
>>>> is not always the same. It depends on the conditionals, and on whether
>>>> the first data entry is declared to be signal or not. What I have
>>>> found is that if the first entry is signal, so
>>>> 'AddSignalTrainingEvent' is called first, then TMVA trains the
>>>> network to give signal the expected response of 1, and background 0.
>>>> However, if the first data entry is background, so
>>>> AddBackgroundTrainingEvent is called first, then the logic is for
>>>> some reason inverted, and signal is trained to give a response of 0....
>>>>
>>>> Note I have used the above logic many times in the past, with
>>>> previous ROOT versions (using the MLP classifier). So this issue is
>>>> new to the new ROOT version (6.10.02).
>>>>
>>>> The use of TMVA::DataLoader is also new, so I am not clear whether
>>>> the issue is related to this or to the use of the DNN classifier.
>>>>
>>>> I have a workaround, which is just to make sure
>>>> AddSignalTrainingEvent is called first (I skip entries until I get
>>>> to the first training signal entry) and this seems to do the job.
>>>> However, I am curious as to what people think about the above
>>>> behavior. I somehow doubt it’s intentional, so it looks to me like a
>>>> bug somewhere in TMVA, either in TMVA::DataLoader or perhaps specific
>>>> to the DNN MVA?
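
The "skip entries until the first training signal entry" workaround can be sketched in plain C++ as follows. This is only an assumption about how such a skip might be written, not the code actually used in this thread: `Entry`, `first_signal_training` and `check_first_signal_training` are hypothetical names, and the two flags simply mirror 'target' and 'useForTesting' from the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-entry flags, mirroring 'target' and 'useForTesting'.
struct Entry {
    bool isSignal;
    bool useForTesting;
};

// Position of the first entry that is both signal and used for training;
// returns entries.size() if there is no such entry.
std::size_t first_signal_training(const std::vector<Entry>& entries) {
    for (std::size_t i = 0; i < entries.size(); ++i)
        if (entries[i].isSignal && !entries[i].useForTesting) return i;
    return entries.size();
}

// Usage: filling would start at the returned position.
void check_first_signal_training() {
    const std::vector<Entry> entries = {
        {false, false},  // background, training (skipped)
        {true, true},    // signal, testing (skipped)
        {true, false},   // signal, training: filling starts here
        {false, false},  // background, training
    };
    assert(first_signal_training(entries) == 2);
}
```

Note that starting the fill loop at this position discards every earlier entry, so registering the classes explicitly before filling avoids losing data.
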
>>>>
>>>> cheers Chris
>>>>
>>>
>>
>
>
