Hi,
I'm having problems with DMTCP 2.1 installed in VMs on two clouds, Amazon WS and Bonfire.
In both cases I'm testing by running the bt benchmark (NAS), class A, with 4 processes, 1 per node,
using Open MPI 1.6.5.

a) Amazon WS:
I'm using m1.small instances.
I get a segmentation fault when I try to checkpoint using the dmtcp_coordinator console.
This is the output in the app console.
 NAS Parallel Benchmarks 3.3 -- BT Benchmark 
 No input file inputbt.data. Using compiled defaults
 Size:  102x 102x 102
 Iterations:  200    dt:   0.0003000
 Number of active processes:     4

 Time step    1
 Time step   20
[56000] WARNING at jsocket.cpp:291 in readAll; REASON='JWARNING(cnt>=0) failed'
     sockfd() = 0
     cnt = -1
     len = 112
     (strerror((*__errno_location ()))) = Connection reset by peer
Message: JSocket read failure
[56000] ERROR at connectionidentifier.h:96 in assertValid; REASON='JASSERT(strcmp(sign, HANDSHAKE_SIGNATURE_MSG) == 0) failed'
     sign = 
Message: read invalid message, signature mismatch. (External socket?)
bt.B.4 (56000): Terminating...
Segmentation fault (core dumped)

This is the output in the dmtcp_coordinator console:
l
Client List:
#, PROG[virtPID:realPID]@HOST, DMTCP-UNIQUEPID, STATE
1, orterun[40000:3242]@master, 18af1fad8d756-40000-537450cc, RUNNING
18, orted_(forked)[52000:2060]@node003, 20385667ca0e709-52000-537450cd, RUNNING
19, orted_(forked)[53000:2071]@node001, 20385667ca0e707-53000-537450ce, RUNNING
22, orted_(forked)[55000:2059]@node002, 20385667ca0e708-55000-537450d0, RUNNING
26, bt.B.4[56000:3262]@master, 18af1fad8d756-56000-537450d0, RUNNING
27, bt.B.4[57000:2075]@node001, 20385667ca0e707-57000-537450d0, RUNNING
29, bt.B.4[58000:2063]@node002, 20385667ca0e708-58000-537450d0, RUNNING
30, bt.B.4[59000:2065]@node003, 20385667ca0e709-59000-537450d1, RUNNING
c
[3241] NOTE at dmtcp_coordinator.cpp:1256 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
     s.numPeers = 8
[3241] NOTE at dmtcp_coordinator.cpp:1258 in startCheckpoint; REASON='Incremented Generation'
     UniquePid::ComputationId().generation() = 1
[3241] NOTE at dmtcp_coordinator.cpp:613 in updateMinimumState; REASON='locking all nodes'
[3241] NOTE at dmtcp_coordinator.cpp:621 in updateMinimumState; REASON='draining all nodes'
[3241] NOTE at dmtcp_coordinator.cpp:627 in updateMinimumState; REASON='checkpointing all nodes'
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 18af1fad8d756-40000-537450cc
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e709-52000-537450cd
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 18af1fad8d756-56000-537450d0
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e708-55000-537450d0
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e707-53000-537450ce
[3241] WARNING at dmtcp_coordinator.cpp:1492 in writeRestartScript; REASON='JWARNING(symlinkat(uniqueFilename.c_str(), dirfd, filename.c_str()) == 0) failed'
[3241] NOTE at dmtcp_coordinator.cpp:641 in updateMinimumState; REASON='building name service database'
[3241] NOTE at dmtcp_coordinator.cpp:657 in updateMinimumState; REASON='entertaining queries now'
[3241] NOTE at dmtcp_coordinator.cpp:662 in updateMinimumState; REASON='refilling all nodes'
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e709-59000-537450d1
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e707-57000-537450d0
[3241] NOTE at dmtcp_coordinator.cpp:881 in onDisconnect; REASON='client disconnected'
     client->identity() = 20385667ca0e708-58000-537450d0

The output of "make check" in this environment is:
Making all in plugin
== Tests ==
dmtcp1          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [3876] msg: restart error, 1 expected, 0 found, running=0
dmtcp2          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [3893] msg: restart error, 1 expected, 0 found, running=0
dmtcp3          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [3918] msg: restart error, 1 expected, 0 found, running=0
dmtcp4          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [3934] msg: restart error, 1 expected, 0 found, running=0
dmtcp5          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [3958] msg: restart error, 2 expected, 0 found, running=0


b) Bonfire cloud:
mpirun is not able to finish when it is run under dmtcp_launch. It hangs.
The VMs run Debian, kernel 2.6.32-5-xen-amd64.
model name : Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
The results of "make check" are similar:
dmtcp1          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [1565] msg: restart error, 1 expected, 0 found, running=0
dmtcp2          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [1579] msg: restart error, 1 expected, 0 found, running=0
dmtcp3          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [1602] msg: restart error, 1 expected, 0 found, running=0
dmtcp4          ckpt: PASSED  rstr: FAILED  (first process rec'd signal 11)  retry: FAILED
                root-pids: [1616] msg: restart error, 1 expected, 0 found, running=0


Any clues on what to look for are appreciated.
Thank you very much in advance.
Marcela