Re: [Scalablecr-discuss] Running SCR with MVAPICH2 and BLCR
From: Arjun J R. <rec...@gm...> - 2014-01-28 04:56:29
I have write permissions on the location of the control directory. I have
put the control directory at /home/username/Control and I own the directory.
The same directory has been set in the scr.conf file as CNTLDIR, as well as
in the SCR_CNTL_BASE variable.

I have a /tmp too, and have write permissions on that directory (in fact, I
have made my username own that directory).

My system runs Scientific Linux 6.4.

On Tue, Jan 14, 2014 at 6:23 AM, Kathryn Mohror <ka...@ll...> wrote:

> Hi Arjun,
>
> Sorry for the late reply -- I took some vacation.
>
> Just as a first guess, did you make sure that you have write permissions
> on the location of the control directory? I believe (and the error message
> supports this) that SCR_CNTL_BASE can't be set in the environment, but has
> to be set in your SCR configuration file. It defaults to /tmp. Do you have
> a /tmp?
>
> Kathryn
>
> On Dec 25, 2013, at 8:59 PM, Arjun J Rao <rec...@gm...> wrote:
>
> > I understand that SCR was built to be used with custom application codes
> > that write their own checkpoints from within the application. However,
> > MVAPICH2 claims to have integrated SCR such that the checkpoints written
> > by BLCR can be written to the parallel file system in a scalable manner
> > later. However, I am not currently writing out the checkpoints to a
> > central location, but to a local disk for testing.
> >
> > I first installed BLCR and SLURM. Then I installed MVAPICH2 with the
> > following options:
> >
> > ./configure --enable-ckpt --with-scr --with-pm=no --with-pmi=slurm
> >
> > However, taking a simple MPI program, compiling it with mpicc, and then
> > running it with srun yields the following errors:
> >
> > srun -N2 -n12 MPIExecutable
> > SCR v1.1-8 ABORT : rank 1 on machine2: Failed to create store descriptor for control directory @ src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
> > In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 1)
> > SCR v1.1-8 ABORT : rank 0 on machine1: Failed to create store descriptor for control directory @ src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
> > In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 0)
> > .
> > .
> > .
> > SCR v1.1-8 ERROR: rank 3 on machine1: SCR_CNTL_BASE cannot be set in the environment or user configuration file, ignoring setting
> > SCR v1.1-8 ERROR: rank 1 on machine1: SCR_CNTL_BASE cannot be set in the environment or user configuration file, ignoring setting
> > .
> > .
> > .
> > slurmd[machine1]: ***STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL 9***
> > srun: Job step aborted: Waiting upto 2 seconds for job step to finish
> > slurmd[machine1]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL 9****
> > slurmd[machine2]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL 9****
> > .
> > .
> > .
> >
> > There was actually a lot of output, but I've printed only one version of
> > each of the message types in the output. I have set SCR_CNTL_BASE,
> > SCR_RUNS, SCR_CACHE_BASE, SCR_PREFIX and SCR_FLUSH. What could be wrong
> > with the configuration or the environment of SCR to yield such errors?
>
> _________________________________________________________________
> Kathryn Mohror, ka...@ll..., http://scalability.llnl.gov/
> Scalability Team @ Lawrence Livermore National Laboratory, Livermore, CA,
> USA
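For anyone following the thread: the write-permission question above applies to every node the job runs on, not only the node where the directory was created. The probe below is a minimal sketch for checking that, not anything taken from SCR or MVAPICH2. The script name check_dir.py and the helper names can_write and report are hypothetical, and it assumes Python 3 is available on the compute nodes. It only tests whether a file can be created in a directory; it says nothing about how SCR itself opens the control directory.

```python
import os
import socket
import sys
import tempfile


def can_write(directory):
    """Return True if a temporary file can be created (and removed) in directory."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers a missing directory, a non-directory path, and permission errors.
        return False


def report(directory):
    """Build a one-line status string for this host and directory."""
    status = "writable" if can_write(directory) else "NOT writable"
    return f"{socket.gethostname()}: {directory} is {status}"


if __name__ == "__main__":
    # Default targets are the two locations named in this thread (assumed paths).
    targets = sys.argv[1:] or ["/tmp", os.path.expanduser("~/Control")]
    for target in targets:
        print(report(target))
```

Launched through the same launcher as the failing job, for example `srun -N2 python check_dir.py /home/username/Control`, it prints one line per task, which shows whether machine1 and machine2 both see a writable directory at that path.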