Distributed MultiThreaded Checkpointing / News: Recent posts

DMTCP-2.6.0 released

Newer flags for configure:
* Rename --enable-debug to --enable-logging
* Add --enable-debug: "-Wall -g3 -O0" (for debugging DMTCP)

Newer flags for dmtcp_restart:
* Add --debug-restart-pause flag to dmtcp_restart

Bug fixes and enhancements:
* Fixes for glibc versions greater than or equal to 2.24
* Fix deadlock in system() wrapper when the child crashes
* Fix deadlock when a process is forked in the resume phase (issue #691)
* jsocket: Warn user if peer closes socket while draining (issue #701)
* Fix epoll1 test (initialize addrlen for accept()) (#705)
* Fix to correctly calculate Coordinator/Host IP:
Affects some distributed applications
* Allow restored stack to grow if needed.
* Fix bug in POSIX timer: race condition manifested in test/timer.c/Ubuntu-18.04
* Modified InfiniBand plugin for more robust support
(primarily of interest for MPI)
* The floating point environment (fegetenv()) is now restored on restart.
(Formerly, only the rounding mode (fegetround()) was restored.)
* The current resource limits (rlim_cur) for RLIMIT_NOFILE and RLIMIT_STACK
are restored if possible.
* Mutex ownership and robust mutexes are now supported if DMTCP is configured
with --enable-mutex-wrappers. (However, this configuration can also add runtime overhead
if mutex operations are called very frequently.)
[Thanks to Johannes Stoelp, Laurent Buchard, Pankaj Mehta of Synopsys, Inc.]
* Fix bug if stack grows a lot after a restart.
* Improved support for pty's
* util/gdbinit-example added for those who wish to debug DMTCP internals.
* Many bug fixes

Posted by Gene Cooperman 2019-08-15 Labels: 2.6.0

DMTCP-2.5.2 released

  • All fixes in Release DMTCP-2.4.9 are incorporated in this release.
  • An incompatibility of DMTCP with Open MPI 1.10 when using orterun (mpirun)
    was discovered. This does not affect recent versions, such as Open MPI 2.x.
  • In some rare cases, open files were not properly restored due to
    a use-after-free bug. This is now fixed.
  • In some rare cases, one process had created a SysV shared memory object,
    and a different process was assigned to restore it on restart. This
    was not handled correctly, and is now fixed.
  • Correctly restore CPU affinities of threads
  • Virtualized SysV shared memory keys to avoid race condition on restart
  • Fixed logic for checking if relative path to file was a duplicate
    of another existing path
  • The NSCD area for the name service caching daemon was not handled correctly
    in CentOS 6.8 and later. Fixed now.
  • The Linux sched.h include file for scheduling of cores was added to
    satisfy some older Linux distros that needed it for compiling DMTCP.
  • Fixed a regression in which --enable-debug (for verbose debug logs)
    was not being properly written.
  • The DMTCP coordinator was displaying a spurious warning, "Failed to find
    coordinator IP address", because it did not check for a canonical hostname.
    A related issue prevented DMTCP from working properly on some
    SUSE/openSUSE distros.
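To make the CPU-affinity item above concrete, here is a minimal Python sketch (Linux only) of saving an affinity mask and reapplying it later. It only illustrates the idea and is not how DMTCP does it; save_affinity and restore_affinity are hypothetical names. On Linux the mask is kept per thread, so a real tool would repeat this for each thread ID.

```python
import os


def save_affinity():
    """Return the set of CPUs the caller is currently allowed to run on."""
    return os.sched_getaffinity(0)


def restore_affinity(saved):
    """Try to reapply a saved CPU set and report the mask now in effect."""
    try:
        os.sched_setaffinity(0, saved)
    except OSError:
        # For example, none of the saved CPUs exist on the restart host.
        pass
    return os.sched_getaffinity(0)
```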
Posted by Jiajun 2017-11-15 Labels: 2.5.2

DMTCP-2.4.9 released

  • Fixed a regression causing deleted NFS files to be handled incorrectly
  • Fixed handling of glibc for versions greater than glibc-2.24
  • Errors and warnings with gcc-7.x are fixed
  • A rare bug affecting pthread_cancel, etc., created incorrect pid on restart
  • man pages fixed: Description section was always describing dmtcp_command
Posted by Jiajun 2017-11-14 Labels: 2.4.9

DMTCP-2.5.1 released

This release mostly provides added robustness. Two notable items of
added functionality are:
i. DMTCP_RESTART_PAUSE and DMTCP_RESTART_PAUSE0 environment variables
for easier debugging upon initial restart
ii. The --debug-logs flag was added to dmtcp_launch/dmtcp_restart.
One can now turn on logging individually for separate plugins,
instead of only turning it on globally.

An incompatibility of DMTCP with Open MPI 1.10 when using orterun (mpirun)
was discovered. This may also affect some other versions of Open MPI 1.10.
This bug will be fixed in a future release.

Posted by Jiajun 2017-09-05 Labels: 2.5.1

DMTCP-2.5.0 released

This release includes a few new plugins and several bug fixes for robustness.
Some of the highlights include:

  • Support for InfiniBand UD (in addition to the more common InfiniBand RC).
  • Added support for CMA (Cross-Memory Attach):
    process_vm_readv and process_vm_writev
  • Improved multi-arch (mixed 32-/64- bit) support.
  • Re-added --enable-fast-restart.
  • Added a new commandline option --with-plugin-32 for dmtcp_launch to specify
    32-bit plugins in a 64-bit environment.
  • Added --enable-pthread-mutex-wrappers configure flag to enable
    pthread_mutex_{lock,unlock} wrappers needed for Open MPI.
  • Added ability to specify environment file used in the modify-env plugin.
  • Allow dmtcp_restart to be invoked by root.
  • The following new plugins were added:
    • pathvirt: to virtualize filesystem paths.
    • delayresume: for finer-grained control over resuming of user threads
      during resume/restart.
Posted by Jiajun 2017-02-13 Labels: 2.5.0

DMTCP-2.4.8 released

  • The newest kernels (approximately Linux version 4.0 and later) map
    VDSO into memory slightly differently for 32-bit processes running
    in a 64-bit Linux. DMTCP was failing to restart for such 32-bit processes.
    This is now fixed.
  • dmtcp_dlsym() extended to provide more robust wrapper functions.
  • Corner case fixed: signal handlers now restored before plugins restart.
  • Minor bug fixes.
Posted by Jiajun 2017-02-13 Labels: 2.4.8

DMTCP-2.5.0-rc2 released

Several important enhancements were added:

  • dmtcp_command -l will list the connected processes (workers)
  • --allow-file-overwrite flag added to dmtcp_launch
  • --with-plugin-32 added to dmtcp_launch for 32-bit plugins in multi-arch mode
  • Support for 'rsh' added (similar to existing 'ssh' support)
  • The pathvirt plugin did not propagate virtualization of paths after a call to ssh/rsh. This is needed when checkpointing on one filesystem/dir (e.g., /var/run-slot-5) and restarting on another filesystem/dir (e.g., /var/run-slot-7).
  • Bug fixed affecting processes opening 1024 file descriptors or more
  • Other minor bug fixes
Posted by Gene Cooperman 2016-10-25 Labels: 2.5.0-rc2

DMTCP-2.5.0-rc1 released

Several important enhancements were added:

  • Added more robust handling of InfiniBand with: dmtcp_launch --infiniband
  • Added missing support for __poll_chk
  • Added missing support for process_vm_readv/process_vm_writev
  • Fixed regression in dmtcp_checkpoint() API call for DMTCP-aware applications
  • Fixed regression with multi-arch (32-/64-bit) support:
  • Added: ./configure --enable-static-libstdcxx [experts only]
Posted by Gene Cooperman 2016-10-25 Labels: 2.5.0-rc1

DMTCP 2.4.0 released

Several important changes and enhancements were added:

  • dmtcp_launch/restart/command/coordinator now take the flags
    -h, -p, --coord-host/port and environment variables
    DMTCP_COORD_HOST/PORT. The older --host, --port, DMTCP_HOST/PORT
    are now deprecated.
  • Newer versions of MATLAB (matlab-2013 and later) were using additional
    Linux features. All recent versions of matlab are again supported.
  • Intensive testing done for integration of MPI/SLURM
    for the following MPI dialects: Intel MPI/MVAPICH-2/MPICH-2/Open MPI.
    See plugin/batch-queue/job_examples/ for SLURM/DMTCP submission scripts.
    Preliminary support for some other resource managers also provided,
    especially including ibrun.
    • Open MPI version 1.8 with InfiniBand is not yet supported.
      This is due to the OMPI use of UD (unreliable datagrams) for InfiniBand.
      Support is planned for the near future. Earlier OMPI versions continue
      to work with IB. We do not currently know of a config in OMPI-1.8 to
      avoid IB/UD (to use only IB/CM). Such a workaround would let DMTCP work.
  • Added support for newest Linux kernels: split of [vdso]
    into [vdso] and [vvar]; To see if this affects you, do:
    cat /proc/self/maps | grep '\[vvar]'
  • Support for glibc version 2.21 added. To see if this affects you, do:
    ls -l /lib*/libc.so.6 /lib/*/libc.so.6
  • The environment variable DMTCP_GDB_ATTACH_ON_RESTART was added. Setting
    this permanently is a security risk. But on a temporary basis,
    it can enable easier debugging of restarted processes:
    DMTCP_GDB_ATTACH_ON_RESTART=1 dmtcp_restart ckpt_a.out_*.dmtcp &
    gdb a.out `pgrep -n a.out`
  • Enhancements added for newer 32-bit ARM (armv7) CPUs
  • Experimental support is now provided for 64-bit ARM (armv8)
  • Bug fixes
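For readers who prefer a script to the grep one-liner above, the following Python sketch lists the bracketed pseudo-regions (such as [vdso] and [vvar]) in a /proc/<pid>/maps listing. The helper names special_regions and has_vvar are hypothetical and are not part of DMTCP.

```python
import re


def special_regions(maps_text):
    """Return bracketed region names (e.g. '[vdso]') in order of appearance."""
    names = []
    for line in maps_text.splitlines():
        # The optional pathname is the last field of each maps line.
        match = re.search(r"(\[[^\]]+\])\s*$", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def has_vvar(maps_path="/proc/self/maps"):
    """Report whether this kernel maps a separate [vvar] region."""
    with open(maps_path) as f:
        return "[vvar]" in special_regions(f.read())
```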
Posted by Kapil Arya 2015-08-17

DMTCP-2.4.0-rc1 released!

Several important enhancements were added to this 2.4 release candidate:

  • Newer versions of MATLAB (matlab-2013 and later) were using additional
    Linux features. All recent versions of matlab are again supported.
  • Intensive testing done for integration of MPI/SLURM
    for the following MPI dialects: Intel MPI/MVAPICH-2/MPICH-2/Open MPI
    See plugin/batch-queue/job_examples/ for SLURM/DMTCP submission scripts.
  • Added support for newest Linux kernels: split of [vdso] into [vdso] and [vvar]
    (to be included in rc2 release)
  • Enhancements added for newer 32-bit ARM (armv7) CPUs
  • Experimental support is now provided for 64-bit ARM (armv8)
  • Bug fixes
Posted by Gene Cooperman 2015-03-17

DMTCP 2.3 released

This is primarily a bug fix release. However, if you are using DMTCP
for the ARM v7 CPU, or if you are using DMTCP either with the InfiniBand
network or with the SLURM batch system, then it is strongly recommended
to upgrade.

The primary changes for this release are:

  • Bug fix affecting building for ARM on some recent armv7a CPUs.
  • Improvements in support for InfiniBand network and for SLURM
    batch system.
  • Other smaller bug fixes.
Posted by Kapil Arya 2014-07-03

DMTCP 2.2.1 released

This is a bug fix release. The previous release had a bug when configured
with --enable-unique-checkpoint-filenames configure flag. This has been fixed
now. Users relying on this flag are highly recommended to upgrade to 2.2.1.

Posted by Kapil Arya 2014-03-20

DMTCP 2.2 released

DMTCP version 2.2 has now been released.

In this release, the lowest layers have been re-organized and partially
re-written for greater clarity of code and greater maintainability.
These changes should be transparent to end users.

Users relying on the use of DMTCP with MPI, InfiniBand or the Torque or
SLURM batch queues are strongly advised to upgrade.

Other changes are:

  • A --exit-after-ckpt flag was added for dmtcp_coordinator.
  • Scalability improvements were added. DMTCP has now been tested
    on MPI jobs using 2048 MPI ranks over 2048 CPU cores.
  • Anybody using DMTCP with InfiniBand is strongly recommended to upgrade
    to inherit important bug fixes. The InfiniBand plugin is still
    formally part of the 'contrib' directory during this release. It was
    tested primarily against Open MPI. Further testing is still needed
    before the InfiniBand plugin can be promoted from the 'contrib'
    directory to the 'plugin' directory.
  • The --infiniband flag of dmtcp_launch was not fully functional in
    version 2.1. This is now fixed.
  • The 'dmtcp_launch --no-coordinator' option was broken in version 2.1.
    This is now fixed.
  • The --disable-dl-plugin flag was added to dmtcp_launch. Most users will
    not need this option. But software relying on DT_RPATH, DT_RUNPATH,
    or certain other uncommon cases in loading dynamic libraries may need
    to invoke this for stability. It is hoped to remove the need for this
    flag in a future release.
  • A similar comment holds for the --disable-alloc-plugin flag in dmtcp_launch.
    If there appear to be issues with a memory allocator, consider invoking
    this flag.
  • Numerous minor bug fixes and enhancements were added.
Posted by Kapil Arya 2014-03-14

DMTCP 2.1 released

DMTCP version 2.1 has now been released.

As before, it runs on most Linux distros, and supports both x86 and x86_64
(Intel/AMD for 32- and 64-bits), and 32-bit ARM (ARMv7). In addition, the
older DMTCP version 1.2.x (currently 1.2.8) continues to be maintained, but on
a bug-fix basis only.

  • Change needed for all plugins:
    • If you have plugins that include "dmtcpplugin.h", they will now have to be
      changed to include "dmtcp.h". This is to reflect that "dmtcp.h" has more
      uses than just for plugins.
  • This new release includes:
    • some newly stable plugins - batch-queue, modify-env, ptrace (see below)
    • full support for 32-/64-bit multilib architecture. (see below)
    • other enhancements to the core feature set (see below)
    • adapting DMTCP to application requirements: removal of the old dmtcpaware
      interface in favor of the newer interface: test/plugin/applic-*ckpt/
      (see below)
    • attempt to restore current working directory on restart (may be impossible
      if restart host has different filesystem)
    • 'dmtcp_coordinator --port-file <FILE>' causes coordinator to write the port
      number on which it listens into FILE. This is useful in
      conjunction with 'dmtcp_coordinator --port 0', which starts a coordinator
      at a random unused port.
    • 'dmtcp_restart --ckptdir <DIR>' and 'dmtcp_restart_script.sh --ckptdir <DIR>'
      will change to a new directory to hold checkpoint images on restart.
    • 'dmtcp_restart --no-strict-uid-checking'
      or 'dmtcp_coordinator --no-strict-uid-checking'
      [ allows a user with a different uid to restart a checkpoint image;
      process uid will be changed to that of the new user ]
    • './configure --enable-run-as-root' [ self explanatory; normally running
      as root is bad practice ]
    • a new internal plugin to handle 'ssh' uniformly; Some corner cases
      in checkpointing MPI could have been affected by this.
    • some bug fixes related to the new plugin software architecture initiated
      with DMTCP 2.0.
  • Some newly stable plugins:
    This release continues to emphasize the use of DMTCP plugins.
    The plugins are now organized into two top-level subdirectories:
    • plugin - plugin is built by './configure; make', but must be invoked,
      typically through command-line option of 'dmtcp_launch'
    • contrib - plugin not built; user must cd to the subdirectory of the plugin,
      build it, and invoke it with 'dmtcp_launch --with-plugin ...'
    • Plugins in the top-level plugin directory:
      • ptrace : 'dmtcp_launch --ptrace'
        a plugin to support checkpointing ptrace-based applications,
        notably including GDB.
      • batch-queue : 'dmtcp_launch --batch-queue'
        a resource manager plugin that supports the Torque/PBS and SLURM
        batch queue systems. (This plugin is now mature, and was renamed
        from 'rm' in DMTCP-2.0 to 'batch-queue' to better reflect its use.)
        [ improved in DMTCP 2.1 ]
      • modify-env : 'dmtcp_launch --modify-env'
        Normally, on dmtcp_restart, a process can see only the original
        environment variables in effect during dmtcp_launch or set by the
        process itself. It is common to wish to update these environment
        variables based on the environment on the restart host
        (e.g., DISPLAY=$DISPLAY). This can be set in a file dmtcp_env.txt .
        [ new in DMTCP 2.1 ]
    • The contrib plugins include:
      • condor : support for HTCondor, a framework for high throughput computing
      • kvm : checkpointing of a KVM virtual machine
      • tun : support for tun networking (as in Tun/Tap) between a virtual
        machine and the host machine
      • python : support for checkpoint/restart within a Python session
      • infiniband : checkpointing over InfiniBand networks supports OFED
        InfiniBand API.
        (Note: If you are using a newer release of OFED, you may wish to use
        the rewrite of this plugin, to be available from the svn in late
        January, 2014.)
        [ improved in DMTCP 2.1 ]
      • ib2tcp : support for checkpointing computation over InfiniBand and
        restarting over TCP.
        [ new in DMTCP 2.1 ]
      • ckptfile : example/template for a plugin to change the default directory
        to receive checkpoint images. This can be important when restarting on
        a new host.
        [ new in DMTCP 2.1 ]
  • Full support for 32-/64-bit multilib architecture:
    • The standard binary, dmtcp_launch, now supports both 32- and 64-bit programs.
      Further, a 64-bit program may invoke a 32-bit program and vice versa, as part
      of a single computation under DMTCP control.
  • Other enhancements to the core feature set:
    • For extremely malloc-intensive programs, run-time overhead from several
      per cent to 20% has been observed. This is due to DMTCP deadlock
      avoidance. (The glibc implementation of malloc uses a global lock,
      that can result in deadlock if a user invokes malloc inside a plugin
      during checkpoint or restart.) If a user program is not using malloc
      in a plugin during checkpoint, then the user can disable this
      DMTCP deadlock avoidance scheme with a flag:
      dmtcp_launch --disable-alloc-plugin
      A future modification to DMTCP may remove this issue entirely.
  • Adapting DMTCP to application requirements and to external environments:
    • The old 'dmtcpaware' API is being removed in favor of:
      test/plugin/applic-*ckpt/
      For details on this newer API, please read the QUICK-START file with this
      same heading: ADAPTING DMTCP TO ...
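The combination of 'dmtcp_coordinator --port 0' and '--port-file' described in this release relies on a general socket idiom: bind to port 0, ask the kernel which port it chose, and publish that number in a file. The Python sketch below shows the idiom only. It is not the coordinator's code, the helper name listen_on_unused_port is hypothetical, and the file format (a decimal number followed by a newline) is an assumption.

```python
import socket


def listen_on_unused_port(port_file, host="127.0.0.1"):
    """Bind to port 0, then record the port the kernel actually assigned."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0 asks the kernel for any free port
    sock.listen()
    port = sock.getsockname()[1]  # the real port is only known after bind()
    with open(port_file, "w") as f:
        f.write(f"{port}\n")      # assumed format: decimal port plus newline
    return sock, port
```

A client script could then read the file to learn where to connect, which avoids hard-coding a port that may already be in use on a shared machine.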
Posted by Kapil Arya 2014-01-12 Labels: 2.1

DMTCP 2.0 released

This version 2.0 release represents the future of DMTCP. The older DMTCP
version 1.2.x branch will continue to be maintained for bug fixes and
back-porting of simple enhancements to DMTCP, in order to provide backward
compatibility. But DMTCP version 1.2.x will not see most new features.

DMTCP version 2.0 has been re-designed around the concept of DMTCP
plugins (similar in spirit to web browser plugins). Much of the internal
architecture of DMTCP has been moved into plugins, for greater modularity.
Further, the plugin capability has been exposed, to make it easy for end
users to write their own plugins. Among the capabilities of plugins are: ...

Posted by Kapil Arya 2013-10-03

DMTCP 1.2.8 released.

DMTCP version 1.2.8 is primarily a bug fix release. It is particularly
recommended to upgrade if you are using DMTCP with the ARM CPU,
or if you will compile DMTCP with a C++11 compiler (e.g. GNU flag -std=c++11).

Important changes include:

  • Bug fixes for newer ARM CPUs --- especially addressing cache coherency
    issues of multi-core ARM, and the more aggressive out-of-order
    execution for newer ARM CPUs.
  • On restart, gzip zombie processes associated with compressed checkpoint
    images were not always reaped properly. This is now handled correctly.
  • Preliminary support for using C++11 compilers to compile DMTCP (but
    not yet intensively tested).
  • Minor bug fixes.
Posted by Kapil Arya 2013-08-02

DMTCP 1.2.7 released

DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.

Release Notes:

  • Proper restore of sockets calling bind with port '0'.
  • Allow plugins to call system() etc. during pre-ckpt phase.
  • Several other bug fixes and performance improvements.
Posted by Kapil Arya 2013-03-13

DMTCP 1.2.6 released

Release Notes:

- Previous release (1.2.5) introduced compilation errors for older kernels.
This release fixes them.
- Several minor bug fixes related to gcc 4.7.

Posted by Kapil Arya 2012-07-31

DMTCP 1.2.5 released

Release Notes:
- epoll, eventfd, and signalfd are now supported
- The ARM architecture for Linux is now supported.
(Linux currently supports 32-bit ARM EABI.)
- The name "DMTCP module" is changed to "DMTCP plugin" (more common terminology).
User plugins can greatly customize the behavior of DMTCP.
- The dmtcp_checkpoint cmd was resetting the checkpoint interval even
if the user did not specify the -i/--interval flag. This is now fixed.
- Improved support for a planned Fedora package for DMTCP
- On resume from ckpt, zero pages were sometimes expanded (increasing the
memory footprint). This affected Java. This is now fixed.
- Some bug fixes were provided for programs that intensively create
and destroy threads (e.g. OpenMP, Java)
- After restart, the floating point rounding mode (fesetround) was not being
properly restored. This is now fixed.
- There have been requests for support of DMTCP for PBS/TORQUE. Some partial
support has now been added to the svn only (_not_ to this release).
Please write to us if you need this support from DMTCP.
- The FAQ at the DMTCP web site was expanded.
- 15% slowdown observed in an unusual case:
A user reports a slowdown of about 15% for a program that is both:
a. heavily multi-threaded; and
b. calling malloc/free intensively.
This has been diagnosed. It was seen too close to this 1.2.5 release,
and so the fix will be provided in the next release (and in the public svn).

Posted by Kapil Arya 2012-05-27

DMTCP 1.2.4 released

Release Notes:
- There is now much more robust treatment of processes that rapidly create and destroy threads. This was the case for the Java JVM (both for OpenJDK and Oracle (Sun) Java). This was also the case for Cilk. Cilk++ was not tested. We believe this new DMTCP to now be highly robust -- and we would appreciate receiving a notification if you find a Java or Cilk program that is not compatible with DMTCP.
- Zero-mapped pages are no longer expanded and saved to the DMTCP checkpoint image. For Java programs (and other programs using zero-mapped pages for their allocation arena or garbage collector), the checkpoint image will now be much smaller. Checkpoint and restart times will also be faster.
- DMTCP_ROOT/dmtcp/doc directory added with documentation of some DMTCP internals. architecture-of-dmtcp.pdf is a good place to start reading for those who are curious.
- The directory of example modules was moved to DMTCP_ROOT/test/module. This continues to support third-party wrappers around system calls, and registering functions to be called by DMTCP at interesting times (like pre-checkpoint, post-resume, post-restart, new thread created, etc.).
- This version of MTCP (inside this package) should be compatible with the checkpoint-restart service of Open MPI. The usage will be documented soon through the Open MPI web site. As before, an alternative is to simply start Open MPI inside DMTCP, and let DMTCP treat all of Open MPI as a "black box" that happens to be a distributed computation.
- A new --prefix command line flag has been added to dmtcp_checkpoint. It operates similarly to the flag of the same name in Open MPI. For distributed computations, remote processes will use the prefix as part of the path to find the remote dmtcp_checkpoint command. This is useful when a gateway machine has a different directory structure from the remote nodes.
- configure --enable-ptrace-support now uses the ptrace module (more modular code). The ptrace module should also be more robust. It now fixes some additional cases that were missing earlier.
- ./configure --enable-unique-checkpoint-filenames was not respecting bin/dmtcp_checkpoint --checkpoint-open-files . This is now fixed.
- If the coordinator received a kill request in the middle of a checkpoint, the coordinator could freeze or die. This has now been fixed, with the expected behavior: Kill the old computation that is in the middle of a checkpoint, and then allow any new computations to begin.
- dmtcp_inspector utility was broken in last release; now fixed
- configure --enable-forked-checkpoint was broken in the last release. It is fixed again.
- Many smaller bug fixes.
- The debian packages and rpm packages for OpenSUSE will be submitted to the distros over the next few days.
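The saving from skipping zero pages can be seen in a small Python sketch. It drops every page that contains only zero bytes when "saving" a memory region and recreates those pages as zeros on "restore". This detects zero pages by content, which may differ from how DMTCP identifies zero-mapped pages. The 4096-byte page size and the names pack_pages and unpack_pages are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def pack_pages(memory, page_size=PAGE_SIZE):
    """Keep only pages that contain a non-zero byte, keyed by page index."""
    zero_page = bytes(page_size)
    kept = {}
    for offset in range(0, len(memory), page_size):
        page = memory[offset:offset + page_size]
        if page != zero_page[:len(page)]:
            kept[offset // page_size] = page
    return len(memory), kept


def unpack_pages(total_len, kept, page_size=PAGE_SIZE):
    """Rebuild the region; pages missing from `kept` come back as zeros."""
    memory = bytearray(total_len)
    for index, page in kept.items():
        start = index * page_size
        memory[start:start + len(page)] = page
    return bytes(memory)
```

For a region that is mostly untouched, such as a large allocation arena, only the few pages actually written need to be stored, which is why both the image size and the checkpoint/restart time shrink.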

Posted by Kapil Arya 2012-01-23

DMTCP 1.2.4 to be released soon

We are currently working on a new DMTCP release which would feature stability fixes for multi-threaded processes and much more. We expect to put out the release on Thursday, Jan 19, 2012.

Posted by Kapil Arya 2012-01-15

DMTCP 1.2.3 Released

This release is primarily a bug-fix release. Here are the Release Notes:

- Several bug fixes.
- Modifications added for compatibility with the checkpoint-restart service of OpenMPI (will be integrated with upcoming OpenMPI-1.6)
- Tests for emacs, vim and strace added to 'make check'
- When running emacs23 under GNU 'screen', it's not restored correctly. Currently we warn user to use emacs22. Emacs23 with 'screen' will be supported in future (and 'emacs23' continues to work fine standalone).
- Fixes a regression in which checkpointing 'gdb' with the required './configure --enable-ptrace-support' was failing. Works now.
- /proc/*/cmdline was not being restored correctly when: argc > 1 (Fixed.)
- debugging logic (primarily for DMTCP developers) was simplified so that changing CFLAGS in mtcp/Makefile to add '-DDEBUG' suffices to include MTCP debugging information. If --enable-debug is also configured, then a copy of MTCP debug information also goes into /tmp/dmtcp-USER@HOST/jassertlog.* .

Posted by Kapil Arya 2011-07-22

DMTCP 1.2.2 released

Release Notes
- A new module system, allowing users to write their own extensions to DMTCP, including wrappers around library calls. See the module subdirectory for examples.
- ./configure --enable-m32 was not working in DMTCP 1.2.1. It works again now.
- more bug fixes and robustness testing. Tested on kernels ranging from Linux 2.6.5 to the latest kernel. Tested especially on the Linux distributions: Red Hat/Fedora, Debian/Ubuntu, SuSe/OpenSUSE; although we don't know of any Linux distributions where it fails to run.
- 'screen' did not checkpoint properly on machines using LDAP authentication. This could also affect processes using 'bash'. This has been fixed.
- Furthermore, recent versions of 'screen' began calling 'utempter' when present. Support for 'utempter' and some other setuid processes has been added.
- Removed the requirement for libc.a in building DMTCP, since Red Hat does not include libc.a in its standard repository.
- ./configure --enable-ptrace now more robust. Still labelled "experimental" for this release. You will need to enable this if you want to checkpoint gdb sessions, programs running under strace, and certain other applications.
- ./configure --enable-fast-ckpt-restart can make ckpt/restart faster by using 'mmap'. You will need to set the environment variable DMTCP_GZIP to "0" if you use this. This feature is still experimental, and there are many other tricks for speeding up ckpt/restart. Please talk to the developers if this is important for your application.
- Experimental support added for HBICT ( hbict.sf.net ). This provides support for incremental and differential checkpointing. However, this is still ongoing work.
- Work has begun on improved support for process migration between different Linux kernels and distributions. Simple applications should migrate. Please talk to us if this feature is important to you.
- We do not yet support the 'epoll' and 'inotify' Linux system calls. Recently, there has been some demand for this, and we intend to raise the priority. Please talk to us if this feature is important to you.

Posted by Kapil Arya 2011-06-23

DMTCP 1.2.1 Released

Release Notes
DMTCP 1.2.1 provides:
* Support for calling dmtcpaware API (dmtcpCheckpoint(), etc.) directly from inside a python session.
* The option for applications to use the dmtcpaware interface to link with a shared library (libdmtcpaware.so) instead of libdmtcpaware.a.
* Support for MPICH2 1.3.x (transparently checkpointing MPICH under DMTCP), as well as continuing the existing support for checkpointing OpenMPI.
* Support for running and checkpointing of binaries in non-privileged mode when the setuid/setgid bits of the binaries are set.
* Several bug fixes related to GNU screen.
* Experimental support for ptrace to allow checkpointing of gdb sessions, strace, and other ptrace-based applications.
* On restart, restore original process name for 'ps' and /proc/self/cmdline.
* Additional bug fixes and enhancements.

Posted by Kapil Arya 2011-03-13

DMTCP 1.2.0 Released

DMTCP (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an arbitrary group of programs spread across many machines and connected by sockets. It does not modify the user's program or the operating system.

Release Notes:
* This is a semi-major release. DMTCP now supports GNU screen.
* It also fixes some instabilities in checkpointing Matlab under certain environments.
* Numerous bug fixes were implemented as a part of review of DMTCP sub-systems.

Posted by Kapil Arya 2010-11-04