MATLAB Batch System Wiki

Brought to you by: glauche

ApplicationDevelopmentIntroduction

Introduction

This guide gives programmers general advice on how to adapt existing MATLAB code to the batch system. Development of the matlabbatch system itself is discussed separately. For a detailed discussion of the issues mentioned here, see the referenced subtopics and especially the code of the Toy Example, together with its explanations and MATLAB help texts. A quick tour through the user guide documents may also give a first impression of how the batch system is supposed to work. This user experience may help in deciding how the existing application can be adapted to the batch system.

General Considerations

The existing code should be reviewed to decide which MATLAB functions will become separate batch modules. Some of the criteria for this decision might be:

A module should compute results from its input data that are useful in themselves or can serve as inputs to several other modules.
If the same processing step is shared by different parts of the code, it might be worth providing a separate module for it.
The module structure should reflect a good description of the data processing pipeline from the original data to the final result.

If the existing code is already divided into several MATLAB functions or scripts, decisions should be based on this structure. Not all functions need to become separate modules, only those that implement a significant step in data processing. If the existing code is not modularised yet, the benefits of keeping it monolithic should be weighed against the benefits of a more modular structure.

Input Structure

Application specific Input Description

Each application has to provide a set of configuration files which describe the modules and their allowed inputs. The batch system enforces two main constraints on the inputs to each module:

The inputs to each module have to be described in a tree-like structure.
During data entry, there is no way to change the tree structure based on input data.

In other words, the input data structure is described by a context-free grammar. There is one simple reason for this: in principle, any input to a module might depend on outputs of other modules. These outputs are not yet available at the time of data entry, so they cannot be used to decide whether an input option should be offered.
This may seem to be a very strong constraint, but in almost every case there is a natural way to ask for the correct inputs. For a more detailed discussion, see the section on input structure and the code in the examples directory (described in more detail here).
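
One possible way to work within both constraints is to put every alternative into the tree from the start and let the user pick one, instead of trying to add or remove options while data is entered. The following is a minimal sketch of that idea, loosely modelled on the Toy Example. It assumes matlabbatch is on the MATLAB path; the function name `cfg_myapp_method` and all tags, names and help texts are hypothetical.

```matlab
function method = cfg_myapp_method
% Sketch of a fixed input tree that offers two alternatives up front.
% Assumes matlabbatch is on the MATLAB path. The function name and all
% tags, names and help texts are hypothetical.

% Leaf item: one real number, needed only by the 'scale' alternative
factor         = cfg_entry;
factor.name    = 'Scaling factor';
factor.tag     = 'factor';
factor.strtype = 'r';     % restrict input to real numbers
factor.num     = [1 1];   % exactly one value
factor.help    = {'Factor applied to the result.'};

% Alternative 1: a branch that asks for the scaling factor
scale      = cfg_branch;
scale.name = 'Scale result';
scale.tag  = 'scale';
scale.val  = {factor};
scale.help = {'Multiply the result by a user-defined factor.'};

% Alternative 2: a constant, so no further input is requested
none      = cfg_const;
none.name = 'No post-processing';
none.tag  = 'none';
none.val  = {true};
none.help = {'Leave the result unchanged.'};

% Choice node: both alternatives are part of the tree from the start.
% The user picks one during data entry; the tree itself never changes.
method        = cfg_choice;
method.name   = 'Post-processing';
method.tag    = 'method';
method.values = {scale none};
method.help   = {'Select how the result should be post-processed.'};
```

Because the alternatives are fixed in the configuration, the selection can be made before any upstream module has produced its outputs.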

Structure of a Job extracted from the Configuration Tree

Data entered in a configuration tree are passed to the application modules in a single variable. This variable is a struct/cell construct that reflects the tree structure of the job unambiguously. Its structure is described in more detail in the section on input structure, and its use can be studied in the examples/*run*.m code.
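
As a rough illustration, consider a hypothetical module whose configuration tree has two numeric leaves tagged `a` and `b` and a choice node tagged `method` with an alternative tagged `scale`. The job variable passed to the module might then look like the sketch below. All tags and values are made up for this example; the actual layout for a given application follows from its own configuration files.

```matlab
% Hypothetical job variable for a module with two numeric leaves
% (tags 'a' and 'b') and a choice node (tag 'method') for which the
% alternative tagged 'scale' was selected. All tags are assumptions.
job = struct();
job.a = 3;                     % value entered for leaf 'a'
job.b = 4;                     % value entered for leaf 'b'
job.method.scale.factor = 2;   % only the selected alternative is present

% A computation function receives this single variable and reads its
% inputs by following the same tree structure.
total = job.a + job.b;
if isfield(job.method, 'scale')
    total = total * job.method.scale.factor;
end
disp(total)
```

Each level of the tree becomes one level of struct nesting, so a field name identifies exactly one node of the configuration.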

Application specific Functions

Application-specific functions have to be created to check input consistency, define virtual outputs or actually run a computation. Each of these functions takes a single input argument: the part of the job structure that is extracted from the configuration subtree with which the function is associated.
Check functions can be associated with any part of the configuration tree. This allows data integrity to be checked as early as possible.
Virtual output functions and computation functions are linked to the top-level node of each module. Virtual output functions are executed as soon as the structure of the module inputs is defined (i.e. the leaves of the configuration tree have to be present, but they do not need to contain values). Computation functions are executed only if all inputs of a module are available and meet the consistency check criteria.
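
The three kinds of functions can be sketched for a small module as follows. This is an illustration in the style of the Toy Example, not a definitive implementation: it assumes matlabbatch is on the MATLAB path, that the function handles are attached through the `check`, `vout` and `prog` fields of the module's top-level node, and that a check function signals valid inputs by returning an empty value. The function names, tags and the output field `sum` are hypothetical.

```matlab
function add1 = cfg_myapp_add1
% Sketch of a module with check, virtual output and computation
% functions. Assumes matlabbatch is on the MATLAB path. All function
% names, tags and the output field 'sum' are hypothetical.

a         = cfg_entry;
a.name    = 'Input a';
a.tag     = 'a';
a.strtype = 'e';          % any evaluated MATLAB expression
a.num     = [1 Inf];      % row vector of arbitrary length
a.help    = {'First row vector.'};

b         = cfg_entry;
b.name    = 'Input b';
b.tag     = 'b';
b.strtype = 'e';
b.num     = [1 Inf];
b.help    = {'Second row vector.'};

add1       = cfg_exbranch;   % top-level node of the module
add1.name  = 'Add vectors';
add1.tag   = 'myapp_add1';
add1.val   = {a b};
add1.check = @local_check;   % consistency check on the harvested inputs
add1.vout  = @local_vout;    % declares outputs before anything is computed
add1.prog  = @local_run;     % performs the computation
add1.help  = {'Add two row vectors of equal length.'};

function str = local_check(job)
% Return an empty string if the inputs are consistent, otherwise a
% message describing the problem. Assumes both inputs hold numeric
% values when the check runs.
str = '';
if numel(job.a) ~= numel(job.b)
    str = 'Inputs a and b must have the same number of elements.';
end

function dep = local_vout(job)
% Describe the output without computing it. Only the structure of the
% inputs is needed here, so the values in job are not accessed.
dep            = cfg_dep;
dep.sname      = 'Add vectors: a + b';       % name shown to the user
dep.src_output = substruct('.', 'sum');      % refers to out.sum below

function out = local_run(job)
% Called only when all inputs are available and have passed the check.
out.sum = job.a + job.b;
```

The subscript reference in `local_vout` and the field written by `local_run` have to agree, since other modules would use the former to pick up the latter.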
