Parallel Computing Help

ODToolbox can run some features in a parallel computing environment. This means that ODToolbox can take a large problem, break it up into sections that do not need to be run in sequence, and have these sections executed simultaneously by multiple processors or multiple computers. This can decrease the total time it takes for a large problem to finish. For instance, estseq is a sequential estimator that runs multiple Monte Carlo simulations. Instead of running 20 simulations one right after another, a quad-processor machine could have each processor run 5 simulations and, in theory, finish in about one quarter of the time.
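The division of work described above can be sketched with a parfor loop. This is only an illustration under stated assumptions: runCase is a hypothetical stand-in for one Monte Carlo case, not an ODToolbox function, and nothing here is taken from the estimators' source.

```matlab
% Hypothetical stand-in for one Monte Carlo case (NOT an ODToolbox function).
runCase = @(k) sum(sin((1:1000) * k));

nCases  = 20;                 % total Monte Carlo cases, as in the example above
results = zeros(1, nCases);   % one result slot per case

% Every iteration is independent of the others, so Matlab is free to hand
% iterations to whichever worker is idle. With 4 workers, each one ends up
% with roughly 20/4 = 5 cases. If no pool is open, the loop runs locally.
parfor k = 1:nCases
    % feval keeps Matlab from treating runCase(k) as indexing into a variable.
    results(k) = feval(runCase, k);
end
```

The one-quarter figure is an upper bound on the benefit; the time needed to start workers and transfer data reduces it in practice.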

To take advantage of parallel computing you need:

1. Multiple computers or a computer with multiple processors

2. Matlab's Parallel Computing Toolbox

3. An ODToolbox feature that supports parallel operation. Currently only the estbat.m, estseq.m, and estspf.m estimators do; each can run its Monte Carlo simulations in parallel.

If the user does not have Matlab's Parallel Computing Toolbox or does not currently have a pool of running Matlab instances to use, ODToolbox will simply run the programs sequentially.
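To see in advance which of the two modes a session will use, a check along the following lines may help. It is a sketch, not ODToolbox code: it assumes a Matlab release that still provides matlabpool (newer releases replace it with parpool and gcp, so the check would need adapting there).

```matlab
% Assumption: this only approximates the kind of check described above.
% 'Distrib_Computing_Toolbox' is the license feature name of the
% Parallel Computing Toolbox.
hasToolbox = license('test', 'Distrib_Computing_Toolbox') && ...
             exist('matlabpool', 'file') ~= 0;

if hasToolbox && matlabpool('size') > 0
    fprintf('A pool of %d workers is open; runs can be distributed.\n', ...
            matlabpool('size'));
else
    disp('No worker pool available; runs will execute sequentially.');
end
```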

Using the Parallel Computing Toolbox

If the user has a parallel computing capability, they can set up a pool of Matlab workers with the command "matlabpool open". Once a pool of workers is open, any estimator invoked will automatically farm Monte Carlo runs out to those workers. To close the parallel computing environment, type "matlabpool close".
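A typical session might look like the sketch below. The estimator call is left as a comment because its arguments depend on your problem; only the matlabpool commands are shown, and they assume a Matlab release that still provides matlabpool.

```matlab
% Start a pool of workers using the default configuration.
matlabpool('open');

% Optional: confirm how many workers are running.
nWorkers = matlabpool('size');
fprintf('%d workers available\n', nWorkers);

% ... call estbat, estseq, or estspf here exactly as in a sequential session ...

% Shut the workers down when the runs are finished.
matlabpool('close');
```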

The default behavior of "matlabpool open" is to open a Matlab worker for every processor on the local system. If you want more control over how workers use the processors, or if you want to create workers on another machine, you must configure the pool. You can do this with the "Parallel" menu option, which is present in the top-level menu if you have the Parallel Computing Toolbox.

When parallel workers are started, they must be able to find and load the same environment as your main Matlab session. They will load and run the startup.m file, but they will not be started with the same command line arguments that the main Matlab was started with, so environment setup should not be put into or initiated by command line arguments. There is, however, a mechanism to tell the workers explicitly which directories should be on their paths; it is available through the "Parallel" menu option.

Constraints on Developer When Using Parallel Computing

Running the estimators in parallel places constraints on the dynamics and data functions passed into them.

1. The Monte Carlo simulations will be run in different Matlab instances, so the dynamics and data functions must not rely on any data computed in other Monte Carlo runs.

2. If the dynamics and data functions use Java objects in their internal state, the Java objects must be serializable. When ODToolbox starts a Monte Carlo run in another Matlab instance, it sends that instance all variables that were initialized outside the run but are needed inside it. Any Java objects that have been set up are serialized to binary data, sent to the other Matlab, and deserialized back into objects. If the Java objects are not serializable, Matlab will throw an error. For a Java object to be serializable, every object it contains must be serializable as well.

3. All variables that are initialized before calling the estimator and then used inside the dynamics or data functions must be initialized in a way that the variable name is explicitly used in the code. For instance:

X = 5;

explicitly mentions the variable X. Good.

load('saved_workspace.mat');

does not explicitly have 'X' in the code but may be setting the value of X. MATLAB doesn't know the variable is being initialized before the parallel runs and will not send X to the other Matlab instances. Bad.
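One way to keep the convenience of a saved workspace while still naming the variable is to load into a struct and assign from it. The sketch below is an assumption about usage, not an ODToolbox requirement; it writes its own small example file to the temporary directory so that it is self-contained.

```matlab
% Create a small example file so this sketch can run on its own.
matFile = fullfile(tempdir, 'saved_workspace.mat');
X = 5;
save(matFile, 'X');
clear X

% Load into a struct, then assign by name. The variable X now appears
% explicitly in the code, so Matlab can tell that it is initialized here.
S = load(matFile);
X = S.X;
```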

4. Dynamics and data functions that require a large amount of initialization data may perform poorly in parallel. If a dynamics function uses a giant lookup table that occupies 5 megabytes in memory, then that 5 megabyte table has to be sent over the wire to the other Matlab instances before the Monte Carlo runs can be performed. This may take more time than running in parallel saves.
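Before committing to a parallel run, it can be worth checking how much data each worker would need to receive. The sketch below uses whos to measure a variable; lookupTable is a hypothetical example sized to match the 5 megabyte case above, not an ODToolbox variable.

```matlab
% Hypothetical lookup table: 655360 doubles * 8 bytes each = 5 * 2^20 bytes.
lookupTable = zeros(655360, 1);

% whos reports the memory a variable occupies in the workspace.
info   = whos('lookupTable');
sizeMB = info.bytes / 2^20;
fprintf('lookupTable occupies %.1f MB\n', sizeMB);
```

Every worker needs its own copy of this data, so comparing the size against the expected duration of a single Monte Carlo run gives a rough sense of whether the transfer cost is worth paying.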

