From: Yangyang Yi <yy...@si...> - 2020-11-11 09:19:56
Dear Scipion users & devs,

I am kindly asking for your advice.

We are trying to set up Scipion-2.0 on a Slurm cluster. Scipion runs on a single machine, but it fails to submit jobs to the queue. The Slurm cluster itself works well, and running Scipion on a single node works.

Here are our settings in host.conf:

[localhost]
PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
NAME = SLURM
MANDATORY = 0
SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
CANCEL_COMMAND = scancel %_(JOB_ID)s
CHECK_COMMAND = squeue -j %_(JOB_ID)s
SUBMIT_TEMPLATE = #!/bin/bash
    ####SBATCH --export=ALL
    #SBATCH -p %_(JOB_QUEUE)s
    #SBATCH -J %_(JOB_NAME)s
    #SBATCH -o %_(JOB_SCRIPT)s.out
    #SBATCH -e %_(JOB_SCRIPT)s.err
    #SBATCH --time=%_(JOB_TIME)s:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks=%_(JOB_NODES)d
    #SBATCH --cpus-per-task=%_(JOB_THREADS)d
    WORKDIR=$SLURM_JOB_SUBMIT_DIR
    export XMIPP_IN_QUEUE=1
    cd $WORKDIR
    # Make a copy of node file
    echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
    ### Display the job context
    echo Running on host `hostname`
    echo Time is `date`
    echo Working directory is `pwd`
    echo $SLURM_JOB_NODELIST
    echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
    #################################
    %_(JOB_COMMAND)s
    find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
QUEUES = {
    "a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
          ["NODES", "1", "Nodes", "How many nodes required for all the nodes"],
          ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
    "b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
          ["NODES", "1", "Nodes", "How many nodes required for all the nodes"],
          ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
    "c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
          ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
          ["NODES", "1", "Nodes", "How many nodes required for all the nodes"],
          ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
    }
JOB_DONE_REGEX =

When we try to submit a job to the queue, Scipion reports:

typeerror: %d format: a number is required, not nonetype

Any suggestions on how to solve the problem?

Thanks!
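
For reference, the reported error can be reproduced outside Scipion with plain Python string formatting. The sketch below is only an illustration under assumptions: it assumes the %_(NAME) placeholders are turned into ordinary %(NAME) placeholders before formatting, the helper names (non_numeric_d_fields, render) are hypothetical and not part of Scipion, and the values are made up. It shows that any %d field whose value is None raises this TypeError, and lists which fields are affected.

```python
import re

# Matches placeholders such as %_(JOB_NODES)d or %_(JOB_QUEUE)s
PLACEHOLDER = re.compile(r"%_\((\w+)\)([ds])")


def non_numeric_d_fields(template, values):
    """Hypothetical helper: names used with %d whose value is not a number."""
    bad = []
    for name, kind in PLACEHOLDER.findall(template):
        value = values.get(name)
        if kind == "d" and not isinstance(value, (int, float)) and name not in bad:
            bad.append(name)
    return bad


def render(template, values):
    # Assumption: %_( is rewritten to %( and then standard % formatting is applied
    return template.replace("%_(", "%(") % values


# Three lines taken from the SUBMIT_TEMPLATE above
template = (
    "#SBATCH -p %_(JOB_QUEUE)s\n"
    "#SBATCH --ntasks=%_(JOB_NODES)d\n"
    "#SBATCH --cpus-per-task=%_(JOB_THREADS)d"
)

# Illustrative values only: JOB_NODES is left unset (None) on purpose
values = {"JOB_QUEUE": "a", "JOB_NODES": None, "JOB_THREADS": 4}

bad_fields = non_numeric_d_fields(template, values)
print("Fields that would break %d formatting:", bad_fields)

try:
    render(template, values)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With a number in place, the same template renders without error
fixed = render(template, dict(values, JOB_NODES=1))
```

If the same mechanism applies here, the candidates are the three %d fields in the configuration: JOB_NODES in PARALLEL_COMMAND and in --ntasks, and JOB_THREADS in --cpus-per-task.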