Re: [Scalablecr-discuss] Getting SCR running
From: Kathryn M. <ka...@ll...> - 2013-09-30 16:27:14
Hi,

> I am running SLURM and all the other stuff needed to run SCR on a single computer, as a way to evaluate SCR. I have a simple C counting program that I wish to run and take periodic checkpoints of. My executable name is "count" How can I get SCR to take periodic checkpoints ? I have specified two separate directories for the two kinds of checkpoint images
>
> When I installed SCR, I don't have any command such as scr_srun on my system.
> When I type in scr_srun, I get the error "Command not found"

Possibly you did these steps already, but just as a sanity check: Did you do a 'make install'? Did you set your path to point to the installation directory for SCR?

    setenv PATH /usr/local/tools/scr-1.1/bin:${PATH}

>
> The executables are stored in /usr/local/tools/scr-1.1/bin/ and when I execute just scr_srun using ./scr_srun I get
> scr_srun: Started: Mon Sep 30 12:58:22 IST 2013
> scr_srun: ERROR: Could not identify node set

scr_srun assumes you are executing in an allocation given to you by SLURM and that the environment variable SLURM_NODELIST is set. You can get an interactive partition with salloc, e.g.

    salloc -N 1 -p <queuename>

If I recall correctly, you may have problems running on a single node. SCR will want to find locations for storing redundant checkpoints that are not on the same node. That way if a node goes down, the checkpoints are protected on the other node. I think that it will exit with an error message if you run on a single node.

Hope that helps!
Kathryn

>
> Nandaka Jojha
> ______________________________________________________________

Kathryn Mohror, ka...@ll..., http://people.llnl.gov/mohror1
CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA
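
For anyone following along, here is a minimal sketch of the two sanity checks discussed in this message, written as a POSIX shell function. It is not part of SCR: the function name scr_preflight and the SCR_BIN variable are made up for illustration, and the install path is simply the one quoted in this thread. It only tests the two conditions mentioned here (scr_srun reachable through PATH, SLURM_NODELIST set) and does not detect the single-node limitation.

```shell
#!/bin/sh
# Hypothetical preflight check; not shipped with SCR or SLURM.
# SCR_BIN is an assumed variable name, defaulting to the path from this thread.
SCR_BIN="${SCR_BIN:-/usr/local/tools/scr-1.1/bin}"

scr_preflight() {
    status=0

    # Check 1: is scr_srun reachable through PATH?
    if command -v scr_srun >/dev/null 2>&1; then
        echo "ok: scr_srun found on PATH"
    else
        echo "missing: scr_srun not on PATH; try: export PATH=$SCR_BIN:\$PATH"
        status=1
    fi

    # Check 2: are we inside a SLURM allocation?
    if [ -n "${SLURM_NODELIST:-}" ]; then
        echo "ok: SLURM_NODELIST=$SLURM_NODELIST"
    else
        echo "missing: SLURM_NODELIST is unset; request an allocation first, e.g. salloc -N 1 -p <queuename>"
        status=1
    fi

    # Non-zero status means at least one check failed.
    return $status
}
```

After sourcing the file, calling scr_preflight from the shell where you intend to launch scr_srun reports which of the two conditions still needs attention. The export line it suggests is the sh-family equivalent of the csh setenv command shown above.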