This version 2.0 release represents the future of DMTCP. The older DMTCP
version 1.2.x branch will continue to be maintained for bug fixes and
back-porting of simple enhancements to DMTCP, in order to provide backward
compatibility. However, most new features will not be back-ported to version 1.2.x.
DMTCP version 2.0 has been re-designed around the concept of DMTCP
plugins (similar in spirit to web browser plugins). Much of the internal
architecture of DMTCP has been moved into plugins, for greater modularity.
Further, the plugin capability has been exposed, to make it easy for end
users to write their own plugins. Among the capabilities of plugins are:
- the ability for user code to initiate or delay a checkpoint;
- the ability for user code to take special actions at the time
of checkpoint, resume, or restart (for example, disconnect from
a database at checkpoint time, and re-connect at restart time);
- the ability for user code to virtualize ids and other interfaces
(for example, virtualize global ids to data objects, in case they
change between the time of checkpoint and restart).
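The last two capabilities can be illustrated with a small conceptual sketch. It is written in plain Python and does not use the DMTCP plugin API: the class names (VirtualIdTable, FakeDatabase, DbPlugin), the on_event method, and the event strings are all hypothetical, chosen only to show the idea. User code holds a stable virtual id, while the plugin disconnects at checkpoint time and rebinds the virtual id to a fresh real handle at resume or restart time.

```python
class VirtualIdTable:
    """Maps stable virtual ids (seen by user code) to real ids that may change."""

    def __init__(self):
        self._real = {}
        self._next_virtual = 1

    def register(self, real_id):
        virtual_id = self._next_virtual
        self._next_virtual += 1
        self._real[virtual_id] = real_id
        return virtual_id

    def real(self, virtual_id):
        return self._real[virtual_id]

    def rebind(self, virtual_id, new_real_id):
        self._real[virtual_id] = new_real_id


class FakeDatabase:
    """Hypothetical stand-in for a resource whose handle differs on every connect."""

    def __init__(self):
        self._last_handle = 100

    def connect(self):
        self._last_handle += 1
        return self._last_handle


class DbPlugin:
    """Hypothetical plugin: reacts to checkpoint, resume, and restart events."""

    def __init__(self, db):
        self.db = db
        self.ids = VirtualIdTable()
        # User code only ever sees this virtual id, never the real handle.
        self.conn = self.ids.register(db.connect())
        self.log = []

    def on_event(self, event):
        if event == "checkpoint":
            # Drop the real connection before the process image is saved.
            self.log.append("disconnect")
        elif event in ("resume", "restart"):
            # Obtain a new real handle and hide the change behind the virtual id.
            self.ids.rebind(self.conn, self.db.connect())
            self.log.append("reconnect")
```

In this sketch the value of plugin.conn stays the same across a checkpoint followed by a restart, while plugin.ids.real(plugin.conn) returns a different handle afterward. That is the property virtualization is meant to provide; a real DMTCP plugin would express the same logic through the plugin interface described in the DMTCP documentation.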
For details on how to use plugins, see:
Other changes in this new DMTCP 2.0 branch include:
- The command dmtcp_checkpoint has been renamed to dmtcp_launch.
The older dmtcp_checkpoint is still supported for backward compatibility.
- Checkpointing of ssh connections is now more general and much more robust.
This may improve the robustness of DMTCP in checkpointing certain
dialects of MPI with unusual cluster configurations. (The newer support
for ssh is based on an internal DMTCP plugin.)
- There is now a contrib directory with support for several extensions to DMTCP.
Note that many of these extensions represent new code that has not yet
been thoroughly tested. Feedback is welcome.
  - checkpointing of KVM virtual machines from the outside, without
    the need for KVM-specific snapshots (contrib/kvm)
  - checkpointing of a network of KVM virtual machines (contrib/tun with
    contrib/kvm)
  - plugin support for both Torque and SLURM batch queues (resource managers)
    (contrib/torque and contrib/rm)
  - integrated support for calling DMTCP from inside Python (contrib/python)
  - support for checkpointing over InfiniBand (contrib/infiniband)
    [ Note that this code is very new and is probably less than robust.
    Feedback is very welcome, as we work on a future, improved version. ]
  - support for checkpointing within Condor (contrib/condor)
    [ This support has been available for some years, but is now collected
    into contrib. ]
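Wrapper scripts that must run against both the 1.2.x and 2.0 branches may need to cope with the rename of dmtcp_checkpoint to dmtcp_launch noted above. The following is a minimal sketch of one way to do that, assuming both commands accept the target program and its arguments in the same position. The helper names pick_launcher and launch_command are hypothetical and not part of DMTCP.

```python
import shutil


def pick_launcher(which=shutil.which):
    """Return the DMTCP launch command found on PATH, preferring the new name."""
    for name in ("dmtcp_launch", "dmtcp_checkpoint"):
        if which(name) is not None:
            return name
    raise FileNotFoundError(
        "neither dmtcp_launch nor dmtcp_checkpoint was found on PATH"
    )


def launch_command(program_args, which=shutil.which):
    """Build the argument list that runs program_args under DMTCP."""
    return [pick_launcher(which)] + list(program_args)
```

The resulting list can be passed to subprocess.run. The which parameter exists only so that the lookup can be replaced in tests; ordinary callers can omit it.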