The idea behind basically any framework is that users can take advantage of the features it offers without having to touch the framework's own code. SFrame follows the same scheme: while editing the files under SFrame/user is fine for the first tests, it is not the way to develop a full-blown analysis code.
This page lists the "official" suggestions for creating a user analysis package, writing the code for it, and using some of the basic features of SFrame. The page only serves as a tutorial; detailed explanations of the listed features can be found on other pages. For more advanced features, have a look at page [AdvancedFeatures].
Let's assume that you checked out the SFrame code into a directory called $(HOME)/Analysis/SFrame/. This means that the compiled libraries and executables from the package ended up in the $(HOME)/Analysis/SFrame/lib and $(HOME)/Analysis/SFrame/bin directories, respectively.
Now you should create your own "SFrame package" to hold your analysis code. To make it easier, a script called sframe_new_package.sh is provided. To use it, go to the $(HOME)/Analysis/ directory and type:
sframe_new_package.sh MyAnalysis
This will create a new directory called $(HOME)/Analysis/MyAnalysis/ and fill it with the directories/files that serve as the skeleton of the new package. Notice that the new package has the same layout as the SFrame/user directory. To easily add a new skeleton cycle to the package, the script sframe_create_cycle.py is provided. Go to the directory $(HOME)/Analysis/MyAnalysis/ and execute the following:
sframe_create_cycle.py -n MyCycle -l include/MyAnalysis_LinkDef.h
This will create the files $(HOME)/Analysis/MyAnalysis/include/MyCycle.h and $(HOME)/Analysis/MyAnalysis/src/MyCycle.cxx. It also adds a line to $(HOME)/Analysis/MyAnalysis/include/MyAnalysis_LinkDef.h, instructing CINT to generate a dictionary for this cycle.
At this point the package is basically ready for compilation. Go to the main directory of the package ($(HOME)/Analysis/MyAnalysis/) and simply execute:
make
The created library will be put alongside the other SFrame libraries, in the directory $(HOME)/Analysis/SFrame/lib/. This might seem counter-intuitive at first, but this organization has served us well so far. Notice that you are not restricted to using only one package. A full-grown analysis code area could, for instance, look something like this:
$(HOME)/Analysis/SFrame/
$(HOME)/Analysis/CommonTools/
$(HOME)/Analysis/SelectionCycles/
$(HOME)/Analysis/AnalysisCycles/
The packages are even allowed to use code from each other. The only thing to keep in mind in this case is that the SFrame jobs have to load all the needed libraries/packages, usually in the correct order: if Pkg2 uses Pkg1, then Pkg1 has to be declared first in the configuration XML.
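Finding a valid declaration order is a topological sort of the "uses" relation between packages. As a rough sketch, assuming you describe which package uses which in a plain std::map, a valid order could be computed as below. The helper loadOrder and the DependencyMap alias are hypothetical (they are not part of SFrame), and the package names are simply the ones from the example layout above.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// deps[pkg] lists the packages that pkg uses (hypothetical representation).
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: a package is appended to 'order' only after all the
// packages it uses have been appended.
static void visitPackage(const std::string& pkg, const DependencyMap& deps,
                         std::set<std::string>& done,
                         std::set<std::string>& inProgress,
                         std::vector<std::string>& order) {
  if (done.count(pkg)) return;
  if (inProgress.count(pkg)) {
    throw std::runtime_error("circular dependency involving " + pkg);
  }
  inProgress.insert(pkg);
  const auto it = deps.find(pkg);
  if (it != deps.end()) {
    for (const auto& used : it->second) {
      visitPackage(used, deps, done, inProgress, order);
    }
  }
  inProgress.erase(pkg);
  done.insert(pkg);
  order.push_back(pkg);
}

// Hypothetical helper: returns the packages in an order in which they could be
// declared in the configuration XML (every package after the ones it uses).
std::vector<std::string> loadOrder(const DependencyMap& deps) {
  std::vector<std::string> order;
  std::set<std::string> done;
  std::set<std::string> inProgress;
  for (const auto& entry : deps) {
    visitPackage(entry.first, deps, done, inProgress, order);
  }
  return order;
}
```

For AnalysisCycles using SelectionCycles and CommonTools, and SelectionCycles using CommonTools, the sketch returns CommonTools, SelectionCycles, AnalysisCycles. Two packages using each other have no valid order, so the sketch throws in that case.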
Here we briefly explain how to add code to the cycle MyCycle created in the last step. All analysis cycles have to implement the ISCycleBase interface. The easiest way to do this is to make the user cycle inherit from SCycleBase. Notice that since SFrame only requires the user cycle to implement the ISCycleBase interface and not the SCycleBase one, it's possible to extend the functionality of SFrame by introducing new base classes that add more features. For an example of this, have a look at the [SFrameARA] page.
The helper script created the user cycle so that it inherits from SCycleBase. First off, let's review what the different virtual functions of the SCycleBase class do:
sframe_main process, even when running in PROOF mode.
The branches of the input and output ntuples are handled individually. This means that for each input branch that you want to use in your analysis, you have to declare a variable (a "simple" variable in case of primitives like Int_t or Double_t, or a pointer in case of STL containers) and connect this variable to the appropriate branch in the BeginInputFile(...) function. The function for connecting variables to an input branch is:
template< typename T >
void ConnectVariable( const char* treeName, const char* branchName, T& variable ) throw( SError )
Let's say you have a branch in your input tree which is of type std::vector< double >. You can use this branch in your analysis by creating a member variable in your cycle that is a pointer to such an object, and connecting it to the branch like this:
std::vector< double >* m_variable; // member variable of the cycle class
ConnectVariable( "Reco0", "vec_var", m_variable ); // in BeginInputFile(...)
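A pointer member works here because, conceptually, the framework remembers where your variable lives and updates what it points to whenever a new entry is read. A minimal, self-contained toy model of that idea follows. ToyTree and its methods are hypothetical and much simpler than what SFrame and ROOT actually do; for example, ownership of the connected object is simplified here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy tree with a single std::vector<double> branch.
class ToyTree {
 public:
  explicit ToyTree(std::vector<std::vector<double>> entries)
      : entries_(std::move(entries)) {}

  // "Connect" the user's pointer: from now on it points to the buffer that
  // the tree refills for every entry (the tree keeps ownership).
  void Connect(std::vector<double>*& userPointer) { userPointer = &buffer_; }

  // Read entry i: the connected pointer sees the new content automatically.
  void GetEntry(std::size_t i) { buffer_ = entries_.at(i); }

  std::size_t GetEntries() const { return entries_.size(); }

 private:
  std::vector<std::vector<double>> entries_;
  std::vector<double> buffer_;
};
```

After connecting once, the user code only dereferences m_variable in each event; it never has to ask the tree for the branch again.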
Output variables are handled similarly. For each output primitive or object, you have to create the object as a member of your cycle class; then you can declare it to be written to the output ntuple with the function:
template< typename T >
TBranch* DeclareVariable( T& obj, const char* name, const char* treeName = 0 ) throw( SError )
To write out a simple Double_t variable to the output TTree, you have to do the following:
Double_t m_out_var; // member variable of the cycle class
DeclareVariable( m_out_var, "out_var" );
Note: If you declared only one output TTree in your XML, then you don't have to specify the tree name for the function.
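Conceptually, DeclareVariable(...) works the other way around from ConnectVariable(...): the framework keeps the address of your member and copies its current value whenever an output entry is written. The toy model below (ToyOutputTree is hypothetical, not SFrame code) illustrates this. It also suggests why the output variables should be members of the cycle: the stored address has to stay valid for as long as entries are written.

```cpp
#include <string>
#include <vector>

// Hypothetical toy output tree with one double-valued column.
class ToyOutputTree {
 public:
  // Remember the address of the user's variable; the variable must stay
  // alive for as long as Fill() may be called.
  void Declare(const double& variable, const std::string& name) {
    source_ = &variable;
    name_ = name;
  }

  // Called by the toy framework after the user code has processed an event:
  // copies the variable's current value into the column.
  void Fill() {
    if (source_ != nullptr) column_.push_back(*source_);
  }

  const std::string& name() const { return name_; }
  const std::vector<double>& column() const { return column_; }

 private:
  const double* source_ = nullptr;
  std::string name_;
  std::vector<double> column_;
};
```

In this model the user code only assigns a value to m_out_var in each event; whatever value it holds when Fill() runs ends up in the output.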
Note: For all the data types that you want to read from or write to a TTree, you have to load the appropriate dictionary. For the basic STL classes (std::vector< double >, std::vector< int >, ...) ROOT has built-in dictionaries. But if you want to write out a custom object, for instance, you have to create a dictionary for this object and load it in your SFrame job.
More detailed documentation on these functions can be found in the Doxygen pages.
You can put basically any kind of ROOT object (inheriting from TObject) into the output ROOT file. There are three functions that you can use for putting ROOT objects into the output file and accessing them afterwards:
template< typename T >
T* Book( const T& obj, const char* directory = 0 ) throw( SError )
template< typename T >
T* Retrieve( const char* name, const char* directory = 0 ) throw( SError )
TH1* Hist( const char* name, const char* directory = 0 ) throw( SError )
You can use the first one in the following way to declare a one-dimensional output histogram:
TH1* hist = Book( TH1D( "hist", "Histogram", 100, 0.0, 100.0 ) );
To access this histogram somewhere else in your code, you could do:
TH1* hist = Retrieve< TH1 >( "hist" );
or, for one-dimensional histograms, it's much better to use:
TH1* hist = Hist( "hist" );
Note: The Book(...) and Retrieve(...) functions are quite slow (because of the underlying ROOT implementations). So it's good practice to store the pointers to the output histograms in your cycle, and to avoid Retrieve(...) where possible. The Hist(...) function is quicker, since it caches the pointers to the histograms itself. If you run in PROOF mode, make sure you understand where each SCycleBase function is called; otherwise you might end up trying to access histogram pointers which have not been initialized in a specific cycle instance yet. For more details, have a look at page [SFramePROOF].
The functions can also be used to put non-ROOT-native objects into the output. A good example of this is the SH1 class ([AdvancedFeatures]).
More detailed documentation on these functions can be found in the Doxygen pages.
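The advice about keeping histogram pointers in your cycle can be illustrated with a self-contained toy model. ToyHist, ToyOutputFile and ToyCycle are hypothetical stand-ins (they are not ROOT or SFrame classes), and the name lookup in Retrieve only represents the comparatively slow name-based access. The point is that the event loop fills through a stored pointer and never looks the histogram up by name.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical toy histogram: only accumulates the sum of weights.
struct ToyHist {
  std::string name;
  double sumOfWeights = 0.0;
  void Fill(double /*x*/, double weight) { sumOfWeights += weight; }
};

// Hypothetical toy output file. Retrieve() performs a name lookup, standing
// in for slow name-based access, and counts how often it is called.
class ToyOutputFile {
 public:
  ToyHist* Book(const std::string& name) {
    std::unique_ptr<ToyHist>& slot = hists_[name];
    if (!slot) {
      slot.reset(new ToyHist());
      slot->name = name;
    }
    return slot.get();
  }
  ToyHist* Retrieve(const std::string& name) {
    ++retrieveCalls_;
    const auto it = hists_.find(name);
    return it == hists_.end() ? nullptr : it->second.get();
  }
  int retrieveCalls() const { return retrieveCalls_; }

 private:
  std::map<std::string, std::unique_ptr<ToyHist>> hists_;
  int retrieveCalls_ = 0;
};

// Toy user cycle following the recommended pattern: book once, keep the
// pointer as a member, and fill through it in the event loop.
class ToyCycle {
 public:
  void Setup(ToyOutputFile& out) { m_hist = out.Book("hist"); }
  void ExecuteEvent(double x, double weight) { m_hist->Fill(x, weight); }

 private:
  ToyHist* m_hist = nullptr;
};
```

In this model, processing any number of events performs no name lookups at all; the only cost is paid once, when the histogram is booked.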
This section lists all the configuration options available in the XML files. The example file (FirstCycle_config.xml) documents most of the features fairly well, so it can in principle serve as a template for any other configuration. The basic layout of the file is demonstrated in that example; only the meaning of the configuration options is explained here.
Each analysis cycle can be defined just by its name, thanks to ROOT's dictionary generation capability. This means that users can implement their own analysis cycles in their own shared libraries, and sframe_main will be able to load these libraries and find the cycle implementation just from a string name. The following properties can be specified for each <Cycle ...> block:
sframe_main is started.
The remainder of the options specify how/where the cycle should be run:
sframe_main). It is usually some sort of network drive. An example could be "root://username@machine.institute.org//workdir/". In PROOF-Lite mode it should just be left as an empty string.
An InputData is regarded as a homogeneous set of events, which have to be handled in the same way by the analysis code. First and foremost, events belonging to the same InputData will be added to the result with the same weight. This means, for instance, that if your Monte Carlo data is composed of different datasets, each with different generator-level cuts, then you will have to process these datasets in separate InputData definitions.
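As a worked illustration of why such datasets must be kept apart, assume the common luminosity-based normalization in which every event of an InputData gets the weight w = L_target / L_InputData, where L_InputData is the integrated luminosity of the sample (for Monte Carlo, roughly N_generated / σ, with σ the cross section after the generator-level cuts). This formula is an assumption made for the sketch; check how your own configuration defines the luminosities. ToyInputData and eventWeight are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical description of one InputData: a label and its integrated
// luminosity, in the same units as the target luminosity.
struct ToyInputData {
  std::string type;
  double lumi;
};

// Assumed normalization: every event of one InputData gets the same weight,
// the target luminosity divided by the InputData's luminosity.
double eventWeight(const ToyInputData& data, double targetLumi) {
  if (data.lumi <= 0.0) {
    throw std::invalid_argument("InputData luminosity must be positive");
  }
  return targetLumi / data.lumi;
}
```

With a target luminosity of 20 (in the samples' units), two hypothetical samples with luminosities 10 and 40 get weights 2.0 and 0.5. Putting both into one InputData would force a single weight on both samples, which is why different generator-level cuts call for separate InputData definitions.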
The following properties can/must be specified for the InputData object:
.sframe.[Type].[Version].idcache.root
where [Type] and [Version] stand for the type and version specified for the InputData object. Notice that you can change the composition of the InputData (add/remove files) as long as the files themselves are unchanged, since files are identified uniquely by their full path names. When a change in the InputData composition is detected, only the previously unknown files are investigated, and the cache is updated with the new information for the subsequent executions.
The following objects can be defined within an InputData definition:
Properties for the cycle can be defined in the <UserConfig> block. (Again, see the example XML.) The format is very simple: each property is configured with one line like:
<Item Name="PropertyName" Value="PropertyValue" />
Properties can be declared to the SCycleBase base class with the function DeclareProperty(...). For more information on the supported types of configurable properties, have a look at the Doxygen documentation.
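The mechanism behind this can be pictured as a registry that binds each property name to a member variable and converts the Value string of the <Item .../> line to that member's type. The sketch below is a self-contained toy: ToyPropertyRegistry and the property names JetPtCut and NJetsMin are hypothetical, and the real DeclareProperty(...) supports its own set of types, as listed in the Doxygen documentation.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical toy property registry: binds property names to member
// variables and converts the configured string value to the member's type.
class ToyPropertyRegistry {
 public:
  template <typename T>
  void Declare(const std::string& name, T& member) {
    setters_[name] = [&member, name](const std::string& value) {
      std::istringstream in(value);
      T parsed{};
      // operator>> reads a single whitespace-separated token in this sketch
      if (!(in >> parsed)) {
        throw std::invalid_argument("cannot parse value of property " + name);
      }
      member = parsed;
    };
  }

  // Apply one Name/Value pair, as read from an <Item .../> line.
  void Set(const std::string& name, const std::string& value) {
    const auto it = setters_.find(name);
    if (it == setters_.end()) {
      throw std::out_of_range("unknown property " + name);
    }
    it->second(value);
  }

 private:
  std::map<std::string, std::function<void(const std::string&)>> setters_;
};
```

In this model, a line like <Item Name="JetPtCut" Value="25.5" /> corresponds to Set("JetPtCut", "25.5"), which updates the double member previously declared under that name, while an unknown name is rejected.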