General Idea
MATLAB is a high-level scientific computation environment with its own programming language. Many projects use MATLAB to build data analysis tools and libraries. However, using MATLAB code from different projects in one data analysis stream often requires substantial programming effort. This may not be too difficult for experienced programmers, but it is a real obstacle for scientists who want to analyse their data in MATLAB without being expert programmers. This batch system tries to address these issues in a generic way:
- To end-users, it provides a convenient way to create templates for recurring data processing tasks either using a GUI or MATLAB command line syntax. The templates can be re-used efficiently to analyse large datasets in a consistent way. All settings and parameters are documented within the templates for future reference or to re-run an analysis.
- To programmers, it provides a structured language to describe inputs to their data processing programs. This language is encoded in MATLAB objects. Programmers can add constraints and consistency checks for these inputs. The batch system will only run an algorithm if all inputs are available and meet the consistency criteria. Therefore, the algorithms themselves can be freed from code that checks input completeness or consistency.
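From the end-user side, a template is an ordinary MATLAB variable that can be copied, filled in and run from a script. The following is a minimal sketch, not a definitive recipe. It assumes that matlabbatch is on the MATLAB path and that a hypothetical application with tag `myapp`, containing a module `div` with two scalar inputs `a` and `b`, has already been added to the configuration. These tags are illustrative and not part of matlabbatch. It also assumes that `cfg_util('run', job)` accepts a job variable directly.

```matlab
% Template job with one fixed and one open input.
% 'myapp', 'div', 'a' and 'b' are hypothetical tags used for illustration.
template = {};
template{1}.myapp.div.a = 10;             % identical for every run
template{1}.myapp.div.b = '<UNDEFINED>';  % marker for an input that is not yet set

divisors = [2 4 5];                       % one value per dataset (made-up data)

for k = 1:numel(divisors)
    job = template;                       % start from the unchanged template
    job{1}.myapp.div.b = divisors(k);     % fill the open input for this run
    cfg_util('run', job);                 % assumed to initialise and run the job
end
```

Because every run starts from the same template, all settings other than the per-dataset input stay identical across the analysis.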
This Wiki describes three aspects of the batch system, which reflect the different levels at which one may want to interact with the system:
- User interface
  - Running the GUI
  - Creation of batch jobs for applications using matlabbatch
  - Batch management, batch execution, including use of MATLAB scripts to run batch jobs on multiple datasets with very little user interaction
- Application development
  - Requirements on code structure of application
  - Introduction to internal representation of batch configurations
  - Introduction to writing batch configuration scripts
  - Integration of an application into configuration management and GUI
- Implementation
  - Classes used
  - Methods
  - Details about job management, runtime etc.
Terminology
- An application is a collection of program modules that are related to each other.
- A program module is a MATLAB function that performs computations on its structured input data and returns some output.
- The batch interface describes the input to each module in terms of configuration items (cfg_items).
- In a less formal way, the outputs of a module are declared by virtual outputs (vout). They are called virtual because the output items have not yet been computed at batch configuration time.
- The configuration items are organised in a tree-like structure, the configuration tree.
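To make these terms concrete, here is a hedged sketch of a configuration file for a hypothetical module `div` that divides one number by another. The function name `cfg_myapp_div`, the tags and the helper functions are invented for illustration. The classes `cfg_entry`, `cfg_exbranch` and `cfg_dep` and the helper `cfg_findspec` are taken from matlabbatch, but the exact field usage shown here should be checked against the toy example shipped in the "examples" folder.

```matlab
function cfg = cfg_myapp_div
% Configuration sketch for a hypothetical module computing a / b.
% All tags and names are illustrative and not part of matlabbatch.

% Configuration item for input a: a single evaluated value
a         = cfg_entry;
a.tag     = 'a';
a.name    = 'Numerator';
a.strtype = 'e';          % evaluated MATLAB expression
a.num     = [1 1];        % constraint: exactly one value
a.help    = {'Numerator of the division.'};

% Configuration item for input b, with the same constraints
b         = cfg_entry;
b.tag     = 'b';
b.name    = 'Denominator';
b.strtype = 'e';
b.num     = [1 1];
b.help    = {'Denominator of the division.'};

% The module: a small configuration tree with two leaves
cfg       = cfg_exbranch;
cfg.tag   = 'div';
cfg.name  = 'Divide';
cfg.val   = {a b};        % children of this node
cfg.prog  = @run_div;     % program module, called at run time
cfg.vout  = @vout_div;    % virtual outputs, called at configuration time
cfg.check = @check_div;   % consistency check on the collected inputs
cfg.help  = {'Divide a by b.'};
end

function out = run_div(job)
% Program module: receives the inputs as a struct with fields a and b.
out = job.a / job.b;
end

function dep = vout_div(job) %#ok<INUSD>
% Declares the output without computing it.
dep            = cfg_dep;
dep.sname      = 'Divide: a / b';
dep.src_output = substruct('()', {1});
dep.tgt_spec   = cfg_findspec({{'strtype', 'e'}});
end

function msg = check_div(job)
% Returns an empty string if the inputs are consistent.
if isnumeric(job.b) && isequal(job.b, 0)
    msg = 'Input b must be non-zero.';
else
    msg = '';
end
end
```

With the checks declared in the configuration, `run_div` itself needs no code for validating its inputs.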
Getting Started
Configuration File Examples
Some examples of configuration files are included in matlabbatch itself:
- There is a small toy example that does some simple calculations but is powerful enough to demonstrate both batch creation and configuration creation. The files can be found in the "examples" folder.
- There is a collection of useful input modules, which provide a shared data source to application-specific computation modules, and a collection of output modules, which collect data from computations and save them to disk or to the MATLAB workspace. This application is added to the configuration tree by default through the included cfg_mlbatch_appcfg.m file. The code lives in the "cfg_basicio" folder.
- The configuration data structures themselves have been described in terms of batch modules. This makes it possible to create batch configurations for new applications without writing configuration files by hand, simply by plugging the input structuring items together in the batch GUI. This application can be found in the cfg_confgui folder. It is not added to the configuration by default. To add it, either run "Add application" from the GUI and point to cfg_confgui/cfg_confgui.m, or include the cfg_confgui folder in the MATLAB path and then run cfg_util('addapp', cfg_confgui).
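The command line route for adding the configuration-building application can be sketched as follows. This assumes that matlabbatch itself is already on the MATLAB path, that the cfg_confgui folder sits directly inside the matlabbatch folder, and that the GUI is started with `cfg_ui`. The variable name `mlbatch_dir` is only illustrative.

```matlab
% Locate the matlabbatch folder via one of its core functions.
mlbatch_dir = fileparts(which('cfg_util'));

% Make the configuration application visible to MATLAB.
addpath(fullfile(mlbatch_dir, 'cfg_confgui'));

% Initialise the batch system, then add the application.
% cfg_confgui is evaluated here and returns its configuration tree.
cfg_util('initcfg');
cfg_util('addapp', cfg_confgui);

% Open the batch GUI; the new application should now appear in its menus.
cfg_ui;
```

The order matters in this sketch: initialising the configuration after adding the application would discard it again, since only applications found through cfg_mlbatch_appcfg.m files are added by default.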