From: Robert W. L. <rw...@bu...> - 2013-06-12 22:05:28
Well, I did the lsof and there was no output. Apparently the coordinator is dying... I'm going to try to run some more tests to see if it's something I'm doing script-wise when I start the job (my pbs script is in perl). I was starting it with backticks, then I tried fork/exec. In both cases, the coordinator is dying before I issue the dmtcp_checkpoint command.

Rob

On Jun 12, 2013, at 5:31 PM, Gene Cooperman wrote:

> Hi Robert,
> Thanks for writing. It's not obvious to me what's happening.
> But here's a quick question, for diagnosing it.
> After starting the coordinator, could you run:
>     lsof | grep dmtcp_coo
> Alternatively, could you try:
>     lsof | grep <PORT_NUM>
> where PORT_NUM is the supposed port number of the coordinator?
>
> Let's verify that the coordinator is truly listening on the port
> that it says it is.
>
> Kapil,
> Could you please check in your code with the --port-file option?
> Then we can make sure that we're all testing a common source, and there
> is no issue about different versions.
> Also, I presume you've already tested something similar to what
> Robert is doing below. Is that correct?
>
> Thanks,
> - Gene
>
> On Wed, Jun 12, 2013 at 05:17:26PM -0400, Robert William Leach wrote:
>> Hi,
>>
>> For the life of me, I cannot figure out why, when I run dmtcp_checkpoint, I get an error about not being able to connect to the coordinator. Here are snippets from my script - it's all in 1 script - and the output I get from each of these commands. Help?
>>
>> dmtcp_coordinator --port 0 --background --exit-on-last --port-file /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.port --ckptdir /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1 --tmpdir /panasas/scratch/rwleach/tmp
>>
>> dmtcp_coordinator starting...
>> Port: 34511
>> Checkpoint Interval: disabled (checkpoint manually instead)
>> Exit on last client: 1
>> The port number was written to file (/panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.port)
>> Backgrounding...
>>
>> dmtcp_checkpoint --no-gzip --join --port 34511 --tmpdir /panasas/scratch/rwleach/tmp --ckptdir /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1 --quiet /util/meme/4.6.0/bin/meme.bin LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme -dna -mod zoops -minw 6 -maxw 25 -revcomp -nostatus -p 8 -o LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memepeak150-8cores -maxsize 30000000 1> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores 2> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.err &
>>
>> [15030] ERROR at dmtcpcoordinatorapi.cpp:81 in createNewConnectionToCoordinator; REASON='JASSERT(fd.isValid()) failed'
>>      coordinatorAddr = d06n40b.ccr.buffalo.edu
>>      coordinatorPort = 34511
>> Message: Failed to connect to DMTCP coordinator
>> meme.bin (15030): Terminating...
>>
>> env | grep DMTCP
>>
>> DMTCP_HOST=d06n40b.ccr.buffalo.edu
>> DMTCP=/util/dmtcp/1.2.7
>> DMTCP_CHECKPOINT_DIR=/panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1
>> DMTCP_GZIP=0
>> DMTCP_TMPDIR=/panasas/scratch/rwleach/tmp
>>
>>
>> http://SwingBuffalo.com/
>> - Phone Swing Buffalo or sign up for our email list via the contact page on our website!
>> http://RhythmShuffle.com/
>> http://LindyFix.com/
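
The check discussed in this thread (confirm the coordinator is still listening before running dmtcp_checkpoint) could be scripted roughly as follows. This is only a sketch: the helper names `read_port_file` and `port_is_listening` are hypothetical and not part of DMTCP, and it assumes the file written by --port-file contains just the port number. The `lsof` options are standard ones for restricting the listing to TCP listeners on a given port.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DMTCP) for checking on the coordinator.

# Wait up to $2 seconds (default 10) for port file $1 to hold a plain
# number, then print it. Returns 1 if no number shows up in time.
# Assumption: the port file contains only the port number.
read_port_file() {
    file=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if [ -s "$file" ]; then
            port=$(tr -d '[:space:]' < "$file")
            case $port in
                ''|*[!0-9]*) ;;                 # not a plain number yet
                *) echo "$port"; return 0 ;;
            esac
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Succeed only if some process is listening on TCP port $1.
port_is_listening() {
    lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Possible use in a job script, after starting the coordinator:
#   port=$(read_port_file "$PORT_FILE") || { echo "no port file" >&2; exit 1; }
#   port_is_listening "$port" || { echo "coordinator not listening" >&2; exit 1; }
#   dmtcp_checkpoint --join --port "$port" ...
```

If `port_is_listening` fails right after the coordinator reports "Backgrounding...", that would point at the coordinator exiting early (as suspected above) rather than at a wrong host or port on the dmtcp_checkpoint side.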