DMTCP 2.1 released

DMTCP version 2.1 has now been released.

As before, it runs on most Linux distros and supports x86 and x86_64
(32- and 64-bit Intel/AMD) as well as 32-bit ARM (ARMv7). In addition, the
older DMTCP version 1.2.x (currently 1.2.8) continues to be maintained, but on
a bug-fix basis only.

  • Change needed for all plugins:
    • If you have plugins that include "dmtcpplugin.h", they will now have to be
      changed to include "dmtcp.h". This is to reflect that "dmtcp.h" has more
      uses than just for plugins.
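For a plugin tree with many source files, the header rename can be applied mechanically. The following is a minimal sketch and not part of DMTCP: the function name migrate_dmtcp_include and the choice of file extensions are assumptions made for illustration; only the two header names come from the note above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DMTCP): replace the old plugin header
# with the new one in every C/C++ source file under a directory.
# Only a quoted include of "dmtcpplugin.h" is rewritten; other lines are kept.
migrate_dmtcp_include() {
    dir="${1:-.}"
    find "$dir" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) |
    while IFS= read -r f; do
        # Skip files that do not include the old header.
        if grep -q '#[[:space:]]*include[[:space:]]*"dmtcpplugin\.h"' "$f"; then
            # Write to a temporary file first so a failed sed leaves the
            # original untouched.
            sed 's/\(#[[:space:]]*include[[:space:]]*\)"dmtcpplugin\.h"/\1"dmtcp.h"/' \
                "$f" > "$f.tmp" && mv "$f.tmp" "$f"
            echo "updated: $f"
        fi
    done
}
```

Calling `migrate_dmtcp_include path/to/plugin` prints one line per modified file; reviewing the result with a diff before committing is advisable.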
  • This new release includes:
    • some newly stable plugins - batch-queue, modify-env, ptrace (see below)
    • full support for 32-/64-bit multilib architecture (see below)
    • other enhancements to the core feature set (see below)
    • adapting DMTCP to application requirements: removal of the old dmtcpaware
      interface in favor of the newer interface: test/plugin/applic-*ckpt/
      (see below)
    • attempt to restore current working directory on restart (may be impossible
      if restart host has different filesystem)
    • 'dmtcp_coordinator --port-file <FILE>' causes the coordinator to write the
      port number on which it listens into FILE. This is useful in
      conjunction with 'dmtcp_coordinator --port 0', which starts a coordinator
      at a random unused port.
    • 'dmtcp_restart --ckptdir <DIR>' and 'dmtcp_restart_script.sh --ckptdir <DIR>'
      will change to a new directory to hold checkpoint images on restart.
    • 'dmtcp_restart --no-strict-uid-checking'
      or 'dmtcp_coordinator --no-strict-uid-checking'
      [ allows a user with a different uid to restart a checkpoint image;
      process uid will be changed to that of the new user ]
    • './configure --enable-run-as-root' [ self-explanatory; normally running
      as root is bad practice ]
    • a new internal plugin to handle 'ssh' uniformly. Some corner cases
      in checkpointing MPI could have been affected by this.
    • some bug fixes related to the new plugin software architecture initiated
      with DMTCP 2.0.
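The combination of '--port 0' and '--port-file' described above can be scripted roughly as follows. This is a sketch under assumptions: the function name, the polling loop, and the ten-second timeout are illustrative, and it assumes the coordinator can run as a background job; only the two coordinator options are taken from these notes.

```shell
#!/bin/sh
# Hypothetical wrapper: start a coordinator on a random unused port and print
# the port it chose. Returns non-zero if no port file appears in time.
start_coordinator_on_free_port() {
    port_file="$1"
    rm -f "$port_file"
    # '--port 0' asks for a random unused port; '--port-file' records it.
    dmtcp_coordinator --port 0 --port-file "$port_file" >/dev/null 2>&1 &
    # Poll for up to about 10 seconds until the file exists and is non-empty.
    i=0
    while [ ! -s "$port_file" ] && [ "$i" -lt 10 ]; do
        sleep 1
        i=$((i + 1))
    done
    [ -s "$port_file" ] || return 1
    cat "$port_file"
}
```

A caller might capture the result with `port=$(start_coordinator_on_free_port /tmp/dmtcp.port)` and hand that value to later DMTCP commands. The coordinator keeps running in the background, so the caller remains responsible for shutting it down.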
  • Some newly stable plugins:
    This release continues to emphasize the use of DMTCP plugins.
    The plugins are now organized into two top-level subdirectories:
    • plugin - plugin is built by './configure; make', but must be invoked,
      typically through a command-line option of 'dmtcp_launch'
    • contrib - plugin not built; user must cd to the subdirectory of the plugin,
      build it, and invoke it with 'dmtcp_launch --with-plugin ...'
    • Plugins in the top-level plugin directory:
      • ptrace : 'dmtcp_launch --ptrace'
        a plugin to support checkpointing ptrace-based applications,
        notably including GDB.
      • batch-queue : 'dmtcp_launch --batch-queue'
        a resource manager plugin that supports the Torque/PBS and SLURM
        batch queue systems. (This plugin is now mature, and was renamed
        from 'rm' in DMTCP-2.0 to 'batch-queue' to better reflect its use.)
        [ improved in DMTCP 2.1 ]
      • modify-env : 'dmtcp_launch --modify-env'
        Normally, on dmtcp_restart, a process can see only the original
        environment variables in effect during dmtcp_launch or set by the
        process itself. It is common to wish to update these environment
        variables based on the environment on the restart host
        (e.g., DISPLAY=$DISPLAY). This can be set in a file, dmtcp_env.txt.
        [ new in DMTCP 2.1 ]
    • The contrib plugins include:
      • condor : support for HTCondor, a framework for high throughput computing
      • kvm : checkpointing of a KVM virtual machine
      • tun : support for tun networking (as in Tun/Tap) between a virtual
        machine and the host machine
      • python : support for checkpoint/restart within a Python session
      • infiniband : checkpointing over InfiniBand networks; supports the OFED
        InfiniBand API.
        (Note: If you are using a newer release of OFED, you may wish to use
        the rewrite of this plugin, to be available from the svn in late
        January, 2014.)
        [ improved in DMTCP 2.1 ]
      • ib2tcp : support for checkpointing computation over InfiniBand and
        restarting over TCP.
        [ new in DMTCP 2.1 ]
      • ckptfile : example/template for a plugin to change the default directory
        to receive checkpoint images. This can be important when restarting on
        a new host.
        [ new in DMTCP 2.1 ]
  • Full support for 32-/64-bit multilib architecture:
    • The standard binary, dmtcp_launch, now supports both 32- and 64-bit programs.
      Further, a 64-bit program may invoke a 32-bit program and vice versa, as part
      of a single computation under DMTCP control.
  • Other enhancements to the core feature set:
    • For extremely malloc-intensive programs, run-time overhead ranging from a
      few percent up to 20% has been observed. This is due to DMTCP deadlock
      avoidance. (The glibc implementation of malloc uses a global lock,
      which can result in deadlock if a user invokes malloc inside a plugin
      during checkpoint or restart.) If a user program is not using malloc
      in a plugin during checkpoint, then the user can disable this
      DMTCP deadlock avoidance scheme with a flag:
      dmtcp_launch --disable-alloc-plugin
      A future modification to DMTCP may remove this issue entirely.
  • Adapting DMTCP to application requirements and to external environments:
    • The old 'dmtcpaware' API is being removed in favor of:
      test/plugin/applic-*ckpt/
      For details on this newer API, please read the section of the QUICK-START
      file with this same heading: ADAPTING DMTCP TO ...
Posted by Kapil Arya 2014-01-12 Labels: 2.1
