From: Yangyang Yi <yy...@si...> - 2020-11-23 08:33:42
Sorry for the late reply. Here’s the log from the start of the job to its end.

run.stdout:
RUNNING PROTOCOL -----------------
HostName: headnode.cm.cluster
PID: 209177
Scipion: v2.0 (2019-04-23) Diocletian
currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark
workingDir: Runs/000347_ProtRelionClassify3D
runMode: Continue
MPI: 5
threads: 2
len(steps) 3 len(prevSteps) 0
Starting at step: 1
Running steps
STARTED: convertInputStep, step 1
2020-11-12 13:46:13.708639
Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star'
convertBinaryFiles: creating soft links.
Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra
FINISHED: convertInputStep, step 1
2020-11-12 13:46:48.416238
STARTED: runRelionStep, step 2
2020-11-12 13:46:48.438416
** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job'
launched job with id 2552
FINISHED: runRelionStep, step 2
2020-11-12 13:46:48.524619
STARTED: createOutputStep, step 3
2020-11-12 13:46:48.973668
Traceback (most recent call last):
  File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run
    self.step._run() # not self.step.run() , to avoid race conditions
  File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run
    resultFiles = self._runFunc()
  File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc
    return self._func(*self._args)
  File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep
    self._fillClassesFromIter(classes3D, self._lastIter())
  File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter
    self._loadClassesInfo(iteration)
  File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo
    self._getFileName('model', iter=iteration))
  File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName
    return self.__filenamesDict[key] % kwargs
TypeError: %d format: a number is required, not NoneType
Protocol failed: %d format: a number is required, not NoneType
FAILED: createOutputStep, step 3
2020-11-12 13:46:48.991279
*** Last status is failed
------------------- PROTOCOL FAILED (DONE 3/3)

run.log:
2020-11-12 13:46:48.438416
2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2
2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619
2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3
2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668
2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType
2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3
2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279
2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)

> On Nov 12, 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote:
>
> Hi Yangyang,
>
> I've tried your config with Scipion2 and it seems to work fine. The only problem I found was using curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run or before?
>
> Best regards,
> Grigory
>
> --------------------------------------------------------------------------------
> Grigory Sharov, Ph.D.
>
> MRC Laboratory of Molecular Biology,
> Francis Crick Avenue,
> Cambridge Biomedical Campus,
> Cambridge CB2 0QH, UK.
> tel. +44 (0) 1223 267228
> e-mail: gs...@mr...
>
>
> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote:
> Dear Scipion users & devs,
>
> I am kindly asking for your advice.
>
> We are trying to set up Scipion-2.0 on a Slurm cluster. It runs on a single machine but fails to submit jobs to the queue. The Slurm cluster itself works well, and running Scipion on a single node works.
>
> Here’s our settings for host.conf:
>
> host.conf:
> [localhost]
> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
> NAME = SLURM
> MANDATORY = 0
> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
> CANCEL_COMMAND = scancel %_(JOB_ID)s
> CHECK_COMMAND = squeue -j %_(JOB_ID)s
> SUBMIT_TEMPLATE = #!/bin/bash
>     ####SBATCH --export=ALL
>     #SBATCH -p %_(JOB_QUEUE)s
>     #SBATCH -J %_(JOB_NAME)s
>     #SBATCH -o %_(JOB_SCRIPT)s.out
>     #SBATCH -e %_(JOB_SCRIPT)s.err
>     #SBATCH --time=%_(JOB_TIME)s:00:00
>     #SBATCH --nodes=1
>     #SBATCH --ntasks=%_(JOB_NODES)d
>     #SBATCH --cpus-per-task=%_(JOB_THREADS)d
>     WORKDIR=$SLURM_JOB_SUBMIT_DIR
>     export XMIPP_IN_QUEUE=1
>     cd $WORKDIR
>     # Make a copy of node file
>     echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
>     ### Display the job context
>     echo Running on host `hostname`
>     echo Time is `date`
>     echo Working directory is `pwd`
>     echo $SLURM_JOB_NODELIST
>     echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
>     #################################
>     %_(JOB_COMMAND)s
>     find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
> QUEUES = {
>     “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>         ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>         ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>     “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>         ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>         ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>     “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
>         ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>         ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>         ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
>     }
> JOB_DONE_REGEX =
>
> And Scipion reports:
> typeerror: %d format: a number is required, not nonetype
>
> Any suggestions about how to solve the problem? Thanks!
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users
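
For anyone hitting the same two symptoms, here is a minimal Python sketch, not Scipion code. It assumes that the failing call in the traceback above boils down to a `%`-format of a filename template with `iter=None` (the traceback suggests `_lastIter()` returned None in createOutputStep), and that the curly quotes in the quoted QUEUES dictionary can be found by a plain character scan. The template string and the helper names `find_curly_quotes` and `format_model_name` are hypothetical and only for illustration.

```python
# -*- coding: utf-8 -*-
"""Illustrative checks only; the names and the template are hypothetical."""

# Typographic quotes that often sneak into config files pasted from
# mail clients or word processors (left/right double, left/right single).
CURLY_QUOTES = u"\u201c\u201d\u2018\u2019"


def find_curly_quotes(text):
    """Return a list of (line, column, char), 1-based, for each curly quote."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ch in CURLY_QUOTES:
                hits.append((line_no, col, ch))
    return hits


def format_model_name(template, iteration):
    """Fill a '%(iter)d'-style template, with a clearer error for None.

    Formatting None with %d raises the TypeError seen in the log, so the
    None case is checked first and reported explicitly.
    """
    if iteration is None:
        raise ValueError("no iteration available yet; "
                         "has the queued job written any output?")
    return template % {"iter": iteration}


if __name__ == "__main__":
    # A fragment shaped like the quoted QUEUES entry (opening quote is curly).
    sample = u'QUEUES = {\n \u201ca": [["JOB_TIME", "48"]]\n}'
    print(find_curly_quotes(sample))

    # Hypothetical template; the real one lives inside the protocol.
    template = "relion_it%(iter)03d_model.star"
    try:
        template % {"iter": None}
    except TypeError as err:
        print("reproduced: %s" % err)

    print(format_model_name(template, 25))
```

Running `find_curly_quotes` over the text of host.conf before starting Scipion would flag each offending quote by line and column. The second helper only shows why a None iteration produces this particular TypeError; it does not explain why no iteration was found.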