Hi Marina,

Sorry for the delay in replying. Can you attach to the running process using gdb and provide a backtrace of all the threads? You can use the "thread apply all bt" command at the gdb prompt, and it will print a stack trace for every thread.
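
For example, something along these lines should work. This is only a sketch: it assumes gdb is installed on the host where the leftover process runs and that you run it as the same user (or root). The dump_threads helper name is just for illustration.

```shell
# Hypothetical helper: attach gdb to a running process by PID, print a
# backtrace of every thread, then exit (-batch makes gdb non-interactive).
dump_threads() {
    gdb -p "$1" -batch -ex "thread apply all bt"
}
```

With the orterun PID from your client list, running `dump_threads 4427 > orterun-backtrace.txt 2>&1` would save the output to a file that you can attach to your reply.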

Kapil


On Sun, May 18, 2014 at 1:45 PM, Marina Moran <esperandoelmilagro@gmail.com> wrote:
Hi all:

I am using DMTCP on Debian Linux with the NAS benchmark suite. I have
2 PCs running Debian GNU/Linux wheezy/sid (and OpenMPI); each PC has an
Intel Core(TM) i5 750 CPU at 2.67GHz with 4 cores.

When I run the benchmarks across the 2 computers, one process keeps
running on the coordinator after the program finishes, and the program
doesn't return to the shell prompt.
That is, I start the program on 8 cores, DMTCP sees the 8 processes and
takes the checkpoints, but when the program finishes one process keeps
running, and DMTCP actually keeps checkpointing that single process at
the specified interval.

What I do at this point is press "k" at the coordinator to kill
the remaining process. The remaining process is orterun, as we can see
from the paste below.

Any idea why I have this issue? I would really like to get rid of this behavior.


[4423] NOTE at dmtcp_coordinator.cpp:643 in onData; REASON='locking all nodes'
[4423] NOTE at dmtcp_coordinator.cpp:678 in onData; REASON='draining all nodes'
[4423] NOTE at dmtcp_coordinator.cpp:684 in onData; REASON='checkpointing all nodes'
[4423] NOTE at dmtcp_coordinator.cpp:694 in onData; REASON='building name service database'
[4423] NOTE at dmtcp_coordinator.cpp:713 in onData; REASON='entertaining queries now'
[4423] NOTE at dmtcp_coordinator.cpp:718 in onData; REASON='refilling all nodes'
[4423] NOTE at dmtcp_coordinator.cpp:747 in onData; REASON='restarting all nodes'
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-4434-5378c252
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-4439-5378c252
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-4436-5378c252
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-3230-5378c255
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-3234-5378c255
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-4442-5378c252
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-3228-5378c254
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-3236-5378c255
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-3232-5378c255
l
Client List:
#, PROG[PID]@HOST, DMTCP-UNIQUEPID, STATE
1, orterun[4427]@debian-testing-marina, 111ee9e3-4427-5378c252, RUNNING
k
[4423] NOTE at dmtcp_coordinator.cpp:571 in handleUserCommand; REASON='Killing all connected Peers...'
[4423] NOTE at dmtcp_coordinator.cpp:919 in onDisconnect; REASON='client disconnected'
     client.identity() = 111ee9e3-4427-5378c252


Thanks in advance!!
Regards
Marina

_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum