From: Grigory S. <sha...@gm...> - 2020-11-11 18:20:21
Hi Yangyang,

I've tried your config with Scipion2 and it seems to work fine. The only
problem I found was the use of curly quotes (“) instead of straight ones (")
in the queues dictionary.
Did you get the error message before the job was submitted, or after it was
submitted and had started to run?

Best regards,
Grigory

--------------------------------------------------------------------------------
Grigory Sharov, Ph.D.

MRC Laboratory of Molecular Biology,
Francis Crick Avenue,
Cambridge Biomedical Campus,
Cambridge CB2 0QH, UK.
tel. +44 (0) 1223 267228
e-mail: gs...@mr...

On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote:

> Dear Scipion users & devs,
>
> I am kindly asking for your advice.
>
> We are trying to set up Scipion-2.0 on a Slurm cluster. It runs on a
> single machine but fails to submit jobs to the queue. The Slurm cluster
> itself works well, and running Scipion on a single node works.
>
> Here are our settings for host.conf:
>
> host.conf:
> [localhost]
> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
> NAME = SLURM
> MANDATORY = 0
> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
> CANCEL_COMMAND = scancel %_(JOB_ID)s
> CHECK_COMMAND = squeue -j %_(JOB_ID)s
> SUBMIT_TEMPLATE = #!/bin/bash
> ####SBATCH --export=ALL
> #SBATCH -p %_(JOB_QUEUE)s
> #SBATCH -J %_(JOB_NAME)s
> #SBATCH -o %_(JOB_SCRIPT)s.out
> #SBATCH -e %_(JOB_SCRIPT)s.err
> #SBATCH --time=%_(JOB_TIME)s:00:00
> #SBATCH --nodes=1
> #SBATCH --ntasks=%_(JOB_NODES)d
> #SBATCH --cpus-per-task=%_(JOB_THREADS)d
> WORKDIR=$SLURM_JOB_SUBMIT_DIR
> export XMIPP_IN_QUEUE=1
> cd $WORKDIR
> # Make a copy of node file
> echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
> ### Display the job context
> echo Running on host `hostname`
> echo Time is `date`
> echo Working directory is `pwd`
> echo $SLURM_JOB_NODELIST
> echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
> #################################
> %_(JOB_COMMAND)s
> find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
> QUEUES = {
> “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
> ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
> “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
> ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
> “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
> ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
> ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
> }
> JOB_DONE_REGEX =
>
> And Scipion reports:
> typeerror: %d format: a number is required, not nonetype
>
> Any suggestions about how to solve the problem? Thanks!
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users
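
The two symptoms in this thread can be reproduced in plain Python. The sketch below is only an illustration, not Scipion's own loader. It assumes the QUEUES value is parsed as JSON-like text, which is an assumption about how the config is read. It uses ordinary %(NAME)d placeholders rather than the %_(NAME)d form used in the config. The helper names (find_curly_quotes, straighten_quotes, render) are hypothetical.

```python
import json

# Typographic quotes that mail clients and word processors often substitute
# for straight ASCII quotes, mapped to their straight equivalents.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def find_curly_quotes(text):
    """Return (line, column, character) for every curly quote in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CURLY_QUOTES:
                hits.append((line_no, col, ch))
    return hits


def straighten_quotes(text):
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(str.maketrans(CURLY_QUOTES))


def render(template, values):
    """Plain Python %-formatting of a template with a dict of values."""
    return template % values


# A shortened QUEUES value with the same defect as the quoted config:
# the key "a" opens with a curly quote (U+201C) and closes with a straight one.
queues_text = '''{
  \u201ca": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"]]
}'''

print("curly quotes found at:", find_curly_quotes(queues_text))

try:
    json.loads(queues_text)
except json.JSONDecodeError as err:
    print("QUEUES does not parse as JSON:", err)

# After straightening the quotes, the same text parses.
queues = json.loads(straighten_quotes(queues_text))
print("queue names:", sorted(queues))

# A %d placeholder raises TypeError when the value bound to it is None,
# which is the kind of error reported above.
try:
    render("mpirun -np %(JOB_NODES)d", {"JOB_NODES": None})
except TypeError as err:
    print("formatting failed:", err)
```

Running find_curly_quotes over the real config file before starting Scipion is a cheap way to catch quotes that were altered by an editor or mail client. Which placeholder received None in the reported case cannot be determined from the thread alone.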