Re: [Scalablecr-discuss] Running SCR with MVAPICH2 and BLCR
From: Arjun J R. <rec...@gm...> - 2014-01-28 04:56:29
I have write permissions on the location of the control directory. I have
set the control directory to /home/username/Control, and I own that
directory. The same directory is set in the scr.conf file as CNTLDIR and
also in the SCR_CNTL_BASE variable. I have a /tmp too, and I have write
permissions on it (in fact, I have made my username the owner of that
directory).
My system runs Scientific Linux 6.4.
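Since the abort is reported from both machine1 and machine2, it may be worth confirming the permissions on every node rather than only on the one I log in to. Below is a minimal sketch of such a check. The helper name `check_cntl_dir` is hypothetical and is not part of SCR or MVAPICH2; it only tries to create and remove a scratch file in the given directory.

```shell
#!/bin/sh
# check_cntl_dir DIR
# Hypothetical helper (not part of SCR or MVAPICH2): reports whether DIR
# exists and whether the current user can actually create a file in it.
# Returns 0 if a file could be created, 1 otherwise.
check_cntl_dir() {
    dir=$1
    node=$(uname -n)
    if [ ! -d "$dir" ]; then
        echo "$node: $dir is missing or not a directory"
        return 1
    fi
    # Create a real file instead of trusting mode bits, because a
    # read-only mount or an ACL can still block writes.
    probe=$(mktemp "$dir/scr_probe.XXXXXX" 2>/dev/null) || {
        echo "$node: cannot create files in $dir"
        return 1
    }
    rm -f "$probe"
    echo "$node: $dir is writable"
    return 0
}
```

To run it once per node, the function could be saved in a script (say check_cntl_dir.sh, again a made-up name) that ends with `check_cntl_dir "$1"`, and launched with something like `srun -N2 --ntasks-per-node=1 sh ./check_cntl_dir.sh /home/username/Control`, and then again with /tmp.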
On Tue, Jan 14, 2014 at 6:23 AM, Kathryn Mohror <ka...@ll...> wrote:
> Hi Arjun,
>
> Sorry for the late reply -- I took some vacation.
>
> Just as a first guess, did you make sure that you have write permissions
> on the location of the control directory? I believe (and the error message
> supports this) that SCR_CNTL_BASE can't be set in the environment, but has
> to be set in your SCR configuration file. It defaults to /tmp. Do you have
> a /tmp?
>
> Kathryn
>
> On Dec 25, 2013, at 8:59 PM, Arjun J Rao <rec...@gm...> wrote:
>
> I understand that SCR was built to be used with custom application codes that
> write their own checkpoints from within the application. However, MVAPICH2
> claims to have integrated SCR such that the checkpoints written by BLCR can
> be written to the parallel file system in a scalable manner later. However, I
> am not currently writing out the checkpoints to a central location, but to a
> local disk for testing.
>
>
> I first installed BLCR and SLURM. Then I installed MVAPICH2 with the
> following options:
> ./configure --enable-ckpt --with-scr --with-pm=no --with-pmi=slurm
> However, compiling a simple MPI program with mpicc and then running it
> with srun yields the following errors:
>
> srun -N2 -n12 MPIExecutable
> SCR v1.1-8 ABORT : rank 1 on machine2: Failed to create store descriptor
> for control directory @
> src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
> In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 1)
> SCR v1.1-8 ABORT : rank 0 on machine1: Failed to create store descriptor
> for control directory @
> src/mpid/ch3/channels/common/src/scr/scr_storedesc.c:299
> In: PMI_Abort(0,application called MPI_Abort(MPI_COMM_WORLD,0) - process 0)
> .
> .
> .
> SCR v1.1-8 ERROR: rank 3 on machine1: SCR_CNTL_BASE cannot be set in the
> environment or user configuration file, ignoring setting
> SCR v1.1-8 ERROR: rank 1 on machine1: SCR_CNTL_BASE cannot be set in the
> environment or user configuration file, ignoring setting
> .
> .
> .
> slurmd[machine1]: ***STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
> 9***
> srun: Job step aborted: Waiting upto 2 seconds for job step to finish
>
> slurmd[machine1]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
> 9****
> slurmd[machine2]: *** STEP 195.1 KILLED AT 2013-12-24T10:11:23 WITH SIGNAL
> 9****
> .
> .
> .
>
> There was actually a lot of output, but I have included only one instance
> of each message type. I have set SCR_CNTL_BASE, SCR_RUNS,
> SCR_CACHE_BASE, SCR_PREFIX and SCR_FLUSH. What could be wrong with the
> configuration or the environment of SCR to yield such errors?
>
>
> _______________________________________________
> Scalablecr-discuss mailing list
> Sca...@li...
> https://lists.sourceforge.net/lists/listinfo/scalablecr-discuss
>
>
> _________________________________________________________________
> Kathryn Mohror, ka...@ll..., http://scalability.llnl.gov/
> Scalability Team @ Lawrence Livermore National Laboratory, Livermore, CA,
> USA