DMTCP 2.0 released

Version 2.0 represents the future of DMTCP. The older DMTCP version 1.2.x
branch will continue to be maintained, with bug fixes and back-ports of
simple enhancements, in order to provide backward compatibility. However,
version 1.2.x will not receive most new features.

DMTCP version 2.0 has been re-designed around the concept of DMTCP
plugins (similar in spirit to web browser plugins). Much of the internal
architecture of DMTCP has been moved into plugins, for greater modularity.
Further, the plugin capability has been exposed, to make it easy for end
users to write their own plugins. Among the capabilities of plugins are:

  • the ability for user code to initiate or delay a checkpoint;
  • the ability for user code to take special actions at the time
    of checkpoint, resume, or restart (for example, disconnect from
    a database at checkpoint time, and re-connect at restart time);
  • the ability for user code to virtualize ids and other interfaces
    (for example, virtualize global ids to data objects, in case they
    change between the time of checkpoint and restart).
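
The second and third capabilities can be pictured with a small, language-neutral
sketch. It is written in Python only for brevity and does not use the DMTCP
plugin API: the Event names, FakeDatabase, IdTable, and make_hook are all
hypothetical names invented for this illustration. Reconnecting on resume as
well as on restart is also an assumption of the sketch.

```python
from enum import Enum, auto


class Event(Enum):
    """Hypothetical event names; the real plugin API defines its own."""
    CHECKPOINT = auto()
    RESUME = auto()
    RESTART = auto()


class FakeDatabase:
    """Stand-in for an external resource that cannot be checkpointed."""

    def __init__(self):
        self.connected = False
        self.log = []

    def connect(self):
        self.connected = True
        self.log.append("connect")

    def disconnect(self):
        self.connected = False
        self.log.append("disconnect")


class IdTable:
    """Maps stable virtual ids (seen by the application) to real ids."""

    def __init__(self):
        self._real = {}
        self._next = 1

    def register(self, real_id):
        # Hand the application a virtual id that never changes.
        virt = self._next
        self._next += 1
        self._real[virt] = real_id
        return virt

    def rebind(self, virt, new_real_id):
        # Called after restart, when the underlying real id has changed.
        self._real[virt] = new_real_id

    def real(self, virt):
        return self._real[virt]


def make_hook(db):
    """Build an event hook that drops the connection before a checkpoint
    and re-establishes it afterwards (assumed: on resume and on restart)."""

    def hook(event):
        if event is Event.CHECKPOINT:
            db.disconnect()
        elif event in (Event.RESUME, Event.RESTART):
            db.connect()

    return hook
```

In this sketch the application only ever holds virtual ids, so a restart that
changes the real ids needs nothing more than a call to rebind for each entry.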

For details on how to use plugins, see:

Other changes in this new DMTCP 2.0 branch include:

  • The command dmtcp_checkpoint has been renamed to dmtcp_launch.
    The older name, dmtcp_checkpoint, is still supported for backward
    compatibility, but is deprecated.
  • Checkpointing of ssh connections is now more general and much more robust.
    This may improve the robustness of DMTCP in checkpointing certain
    dialects of MPI with unusual cluster configurations. (The newer support
    for ssh is based on an internal DMTCP plugin.)
  • There is now a contrib directory with support for several extensions to DMTCP.
    Note that many of these extensions represent new code that has not yet
    been thoroughly tested. Feedback is welcome.
    • checkpointing of KVM virtual machines from the outside, without
      the need for KVM-specific snapshots (contrib/kvm)
    • checkpointing of a network of KVM virtual machines (contrib/tun with
      contrib/kvm)
    • plugin support for both the Torque and SLURM batch queues (resource
      managers) (contrib/torque and contrib/rm)
    • integrated support for calling DMTCP from inside Python (contrib/python)
    • support for checkpointing over InfiniBand (contrib/infiniband)
      [ Note that this code is very new and is probably less than robust.
      Feedback is very welcome, as we work on a future, improved version. ]
    • support for checkpointing within Condor (contrib/condor)
      [ This support has been available for some years, but is now collected
      into contrib. ]
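
For scripts that must run against both the 2.0 and 1.2.x branches, the rename
of dmtcp_checkpoint to dmtcp_launch noted above can be handled with a small
wrapper. The following is a minimal sketch, not part of DMTCP: pick_launcher
and launch_argv are hypothetical helper names, and the sketch assumes the
launcher takes the target program and its arguments as trailing arguments.

```python
import shutil


def pick_launcher(which=shutil.which):
    """Return the preferred launcher name found on PATH, or None.

    dmtcp_launch (the 2.0 name) is tried first; the deprecated
    dmtcp_checkpoint is used only as a fallback.
    """
    for name in ("dmtcp_launch", "dmtcp_checkpoint"):
        if which(name):
            return name
    return None


def launch_argv(program_argv, which=shutil.which):
    """Build an argument vector that runs program_argv under DMTCP."""
    launcher = pick_launcher(which)
    if launcher is None:
        raise FileNotFoundError(
            "neither dmtcp_launch nor dmtcp_checkpoint found on PATH"
        )
    return [launcher, *program_argv]
```

The resulting list can be passed to subprocess.run; the which parameter exists
only so the lookup can be replaced when testing.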

Posted by Kapil Arya 2013-10-03
