Re: [Scalablecr-discuss] Error: Invalid control directory INVALID
From: Moody, A. T. <mo...@ll...> - 2013-12-02 20:43:34
Hi Arjun,

Right, you need to run scr_srun within a SLURM allocation, so an salloc command is required first to get the allocation. We'll work to clarify that detail in future documentation.

As for the second error, SCR should by default set the control directory to /tmp. You can override this during the configure step. If you did that, can you please cut-and-paste the configure line you used to configure SCR?

Also, it would help to increase the debug verbosity, which you can do by setting the SCR_DEBUG environment variable to 1 or perhaps 2. Can you do that and send along the output?

-Adam

________________________________
From: Arjun J Rao [rec...@gm...]
Sent: Thursday, November 28, 2013 1:45 AM
To: sca...@li...
Subject: Re: [Scalablecr-discuss] Error: Invalid control directory INVALID

Correction: it is not SCR_NODELIST, it is SLURM_NODELIST. I also run it inside an allocation provided by SLURM:

    salloc -N2 bash

and then run

    scr_srun -N2 -n4 abc

I also got confused because the way to run MVAPICH2 jobs in SLURM is to just run the srun command, which takes care of the allocation on its own. But here, with scr_srun, I guess we have to do our own allocations.

On Thu, Nov 28, 2013 at 1:23 PM, Arjun J Rao <rec...@gm...> wrote:

I have a 2-node setup on which I intend to test out scalable checkpointing. I keep getting the error "Could not identify nodeset" when I issue the command

    scr_srun -N2 -n4 abc

I found that the SCR_NODELIST environment variable needs to be set. What format must the nodeset be provided in? I could not find it in the PDF manual or in the READMEs. Setting SCR_NODELIST to abc[1-4] or to abc1,abc2,abc3,abc4 both silence the "Could not identify nodeset" warning. Which is the correct format for the SCR_NODELIST variable?

Also, after the "Could not identify nodeset" warning was suppressed, I ran

    scr_srun -N2 -n4 abc

and got this error:

    ERROR: Invalid control directory INVALID

I checked the /etc/scr.conf file, and the control directory was set to /tmp. I tried changing it to /home/username/somedir but got the same error. I also tried setting the SCR_CNTL_BASE environment variable to /tmp and to /home/username/somedir but got the same message. How can this issue be resolved?
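
For readers following the thread, here is a minimal shell sketch of the sequence Adam describes in his reply: get a SLURM allocation first, then launch through scr_srun with debug output turned on. The function names in_slurm_allocation and run_with_scr are hypothetical helpers written for this illustration and are not part of SCR. The sketch assumes that SLURM_NODELIST is set inside the shell started by salloc, as the correction in the thread suggests, and that SCR_DEBUG=1 is the verbosity you want.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCR): succeeds only when SLURM_NODELIST is
# set and non-empty, which is assumed to be the case inside a SLURM allocation
# such as the shell started by `salloc -N2 bash`.
in_slurm_allocation() {
    [ -n "${SLURM_NODELIST:-}" ]
}

# Hypothetical wrapper (not part of SCR): refuses to launch outside an
# allocation; otherwise runs scr_srun with SCR_DEBUG set to 1 for that
# invocation and passes all arguments through unchanged.
run_with_scr() {
    if ! in_slurm_allocation; then
        echo "No SLURM allocation found; run 'salloc -N2 bash' first." >&2
        return 1
    fi
    SCR_DEBUG=1 scr_srun "$@"
}

# Intended use, mirroring the commands quoted in the thread:
#   salloc -N2 bash
#   run_with_scr -N2 -n4 abc
```

The check only guards against the "Could not identify nodeset" case; it does not address the control directory error, for which the configure line and the debug output are still needed.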