Re: [Scalablecr-discuss] Error: Invalid control directory INVALID
From: Arjun J R. <rec...@gm...> - 2013-11-28 09:45:31
Correction: it is not SCR_NODELIST, it is SLURM_NODELIST. I also ran it inside an allocation provided by SLURM:

salloc -N2 bash

and then, inside that shell:

scr_srun -N2 -n4 abc

I was also confused because the way to run MVAPICH2 jobs under SLURM is simply to run srun, which takes care of the allocation on its own. With scr_srun, I guess we have to create our own allocation first.

On Thu, Nov 28, 2013 at 1:23 PM, Arjun J Rao <rec...@gm...> wrote:
> I have a 2-node setup on which I intend to test out scalable
> checkpointing.
> I keep getting the error "Could not identify nodeset" when I issue the
> command scr_srun -N2 -n4 abc
>
> I found that the SCR_NODELIST environment variable needs to be set. What
> format must the nodeset be provided in? I could not find it in the PDF
> manual or in the READMEs. I tried setting SCR_NODELIST to abc[1-4] and to
> abc1,abc2,abc3,abc4, and both silence the "could not identify nodeset"
> warning. Which is the correct format for the SCR_NODELIST variable?
>
> Also, after the "could not identify nodeset" warning was suppressed, I ran
> scr_srun -N2 -n4 abc
> and got this error:
> "ERROR: Invalid control directory INVALID"
>
> I checked the /etc/scr.conf file and it was set to /tmp. I tried changing
> it to /home/username/somedir but got the same error. I also tried setting
> the SCR_CNTL_BASE environment variable to point to /tmp and to
> /home/username/somedir but got the same error message.
>
> How can this issue be resolved?
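On the nodeset format question: SLURM writes SLURM_NODELIST in compressed hostlist form, so abc[1-4] and abc1,abc2,abc3,abc4 name the same four hosts. The Python sketch below illustrates that expansion for simple patterns (one bracket group per entry, optional zero padding). expand_hostlist is a hypothetical helper written for this message; it is not part of SCR or SLURM, and it makes no claim about how scr_srun itself parses the variable.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple SLURM-style hostlist, e.g. 'abc[1-4]' or 'abc[1-2,5],xyz'.

    Hypothetical helper for illustration only. Supports one bracket group per
    comma-separated entry; patterns such as 'rack[1-2]-node[1-4]' are not handled.
    """
    hosts = []
    # Split on commas that are NOT inside square brackets.
    for entry in re.split(r",(?![^\[]*\])", expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", entry)
        if not m:
            # Plain host name with no bracket group.
            if entry:
                hosts.append(entry)
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo          # a single number like '5' means 5-5
            width = len(lo)        # keep zero padding, e.g. node[01-03]
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hostlist("abc[1-4]"))
    print(expand_hostlist("abc1,abc2,abc3,abc4"))
```

On a real cluster, `scontrol show hostnames "$SLURM_NODELIST"` performs the same expansion with SLURM's own parser and handles the full hostlist syntax, so it is the safer way to check what an allocation actually contains.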