I'm cc'ing dmtcp-forum@..., since this doesn't
appear to be private, and other DMTCP users might also benefit from
the answer.
This can definitely be done. The extra concept that you're looking for
is that each DMTCP coordinator is associated with a DMTCP computation.
You can launch several DMTCP coordinators on a single host. You just
need to ensure that each DMTCP coordinator listens on a separate port.
When you want to checkpoint a job, you do something like:
dmtcp_checkpoint --port PORT1 ./a.out arg1 arg2
dmtcp_checkpoint --port PORT2 ./b.out arg1 arg2
where the final PORT for dmtcp_checkpoint refers to the coordinator
that you want to connect to.
It also works to start the coordinators in advance:
dmtcp_coordinator --port PORT1
dmtcp_coordinator --port PORT2
dmtcp_coordinator --port PORT3
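Putting the pieces together, a session with two independent jobs on one node might look like this. The port numbers (7771, 7772) are just example values, and this assumes a DMTCP release that still names the launcher dmtcp_checkpoint (newer releases call it dmtcp_launch):

```shell
# Start two coordinators on the same host, each listening on its own port.
dmtcp_coordinator --port 7771 &
dmtcp_coordinator --port 7772 &

# Launch each job under its own coordinator.  Each job is now an
# independent DMTCP computation that checkpoints on its own schedule.
dmtcp_checkpoint --port 7771 ./a.out arg1 arg2 &
dmtcp_checkpoint --port 7772 ./b.out arg1 arg2 &
```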
Then more than one process can connect to a single coordinator and
become united in a single DMTCP computation for that coordinator.
A checkpoint request to that coordinator checkpoints all processes
that connected to it via 'dmtcp_checkpoint --port PORT'.
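For example, using the dmtcp_command utility shipped with DMTCP, you can request a checkpoint of everything attached to one coordinator without disturbing the jobs attached to the other (port 7771 is the example value from above):

```shell
# Checkpoint only the computation managed by the coordinator on port 7771.
dmtcp_command --port 7771 --checkpoint
```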
'dmtcp_checkpoint --host HOST --port PORT' is available for distributed
computations with a single coordinator.
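A minimal distributed sketch, where node1/node2 and port 7779 are example values: all processes on all nodes point at the one coordinator, so a single checkpoint request covers the whole computation.

```shell
# On node1: start the single coordinator for the whole computation.
dmtcp_coordinator --port 7779 &

# On node1, node2, ...: launch each process, all pointing at node1.
dmtcp_checkpoint --host node1 --port 7779 ./a.out arg1 &
```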
Hope this helps. Let us know if anything isn't clear.
On Tue, Jun 04, 2013 at 03:33:14PM -0400, Robert William Leach wrote:
> I have a question. Is it possible to have multiple unrelated independent jobs running on the same node, each checkpointing at its own time with separate checkpoint directories? If so, I'm curious how the checkpoint command knows which process is which. If not, is there a way to do it? I frequently manage multiple 1-core jobs and I want them to be able to get on any node organically.
> Robert W. Leach
> Computational Biologist
> Center for Computational Research
> Center of Excellence in Bioinformatics
> University at Buffalo
> Work: 881-7516