From: Pablo C. <pc...@cn...> - 2020-11-23 08:49:44
I think the job went in! To me this looks more like an issue when loading the star file. Maybe iteration is None?

https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166

Grigory, Jose Miguel?

On 23/11/20 9:32, Yangyang Yi wrote:
> Sorry for the late reply. Here’s the log from the beginning of the job to the end.
>
> run.stdout:
>
> 00001: RUNNING PROTOCOL -----------------
> 00002: HostName: headnode.cm.cluster
> 00003: PID: 209177
> 00004: Scipion: v2.0 (2019-04-23) Diocletian
> 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark
> 00006: workingDir: Runs/000347_ProtRelionClassify3D
> 00007: runMode: Continue
> 00008: MPI: 5
> 00009: threads: 2
> 00010: len(steps) 3 len(prevSteps) 0
> 00011: Starting at step: 1
> 00012: Running steps
> 00013: STARTED: convertInputStep, step 1
> 00014: 2020-11-12 13:46:13.708639
> 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star'
> 00016: convertBinaryFiles: creating soft links.
> 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra
> 00018: FINISHED: convertInputStep, step 1
> 00019: 2020-11-12 13:46:48.416238
> 00020: STARTED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48.438416
> 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job'
> 00023: launched job with id 2552
> 00024: FINISHED: runRelionStep, step 2
> 00025: 2020-11-12 13:46:48.524619
> 00026: STARTED: createOutputStep, step 3
> 00027: 2020-11-12 13:46:48.973668
> 00028: Traceback (most recent call last):
> 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run
> 00030: self.step._run() # not self.step.run() , to avoid race conditions
> 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run
> 00032: resultFiles = self._runFunc()
> 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc
> 00034: return self._func(*self._args)
> 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep
> 00036: self._fillClassesFromIter(classes3D, self._lastIter())
> 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter
> 00038: self._loadClassesInfo(iteration)
> 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo
> 00040: self._getFileName('model', iter=iteration))
> 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName
> 00042: return self.__filenamesDict[key] % kwargs
> 00043: TypeError: %d format: a number is required, not NoneType
> 00044: Protocol failed: %d format: a number is required, not NoneType
> 00045: FAILED: createOutputStep, step 3
> 00046: 2020-11-12 13:46:48.991279
> 00047: *** Last status is failed
> 00048: ------------------- PROTOCOL FAILED (DONE 3/3)
>
> run.log:
>
> 2020-11-12 13:46:48.438416
> 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619
> 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3
> 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668
> 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType
> 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3
> 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279
> 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)
>
>> On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote:
>>
>> Hi Yangyang,
>>
>> I've tried your config with Scipion2 and it seems to work fine. The
>> only problem I found was the use of curly quotes (“) instead of straight
>> ones (") in the queues dictionary. Did you get the error message
>> after the job was submitted and started to run, or before?
>>
>> Best regards,
>> Grigory
>>
>> --------------------------------------------------------------------------------
>> Grigory Sharov, Ph.D.
>>
>> MRC Laboratory of Molecular Biology,
>> Francis Crick Avenue,
>> Cambridge Biomedical Campus,
>> Cambridge CB2 0QH, UK.
>> tel. +44 (0) 1223 267228
>> e-mail: gs...@mr...
>>
>> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote:
>>
>>     Dear Scipion users & devs,
>>
>>     I am kindly asking for your advice.
>>
>>     We are trying to set up Scipion-2.0 on a Slurm cluster. It runs
>>     on a single machine but fails to submit jobs to the queue.
>>     The Slurm cluster itself works well, and running Scipion on a single node works.
>>
>>     Here are our settings for host.conf:
>>
>>     host.conf:
>>     [localhost]
>>     PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
>>     NAME = SLURM
>>     MANDATORY = 0
>>     SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
>>     CANCEL_COMMAND = scancel %_(JOB_ID)s
>>     CHECK_COMMAND = squeue -j %_(JOB_ID)s
>>     SUBMIT_TEMPLATE = #!/bin/bash
>>     ####SBATCH --export=ALL
>>     #SBATCH -p %_(JOB_QUEUE)s
>>     #SBATCH -J %_(JOB_NAME)s
>>     #SBATCH -o %_(JOB_SCRIPT)s.out
>>     #SBATCH -e %_(JOB_SCRIPT)s.err
>>     #SBATCH --time=%_(JOB_TIME)s:00:00
>>     #SBATCH --nodes=1
>>     #SBATCH --ntasks=%_(JOB_NODES)d
>>     #SBATCH --cpus-per-task=%_(JOB_THREADS)d
>>     WORKDIR=$SLURM_JOB_SUBMIT_DIR
>>     export XMIPP_IN_QUEUE=1
>>     cd $WORKDIR
>>     # Make a copy of node file
>>     echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
>>     ### Display the job context
>>     echo Running on host `hostname`
>>     echo Time is `date`
>>     echo Working directory is `pwd`
>>     echo $SLURM_JOB_NODELIST
>>     echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
>>     #################################
>>     %_(JOB_COMMAND)s
>>     find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
>>     QUEUES = {
>>     “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
>>     ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
>>     }
>>     JOB_DONE_REGEX =
>>
>>     And Scipion reports:
>>     typeerror: %d format: a number is required, not nonetype
>>
>>     Any suggestions about how to solve the problem? Thanks!
>>     _______________________________________________
>>     scipion-users mailing list
>>     sci...@li...
>>     https://lists.sourceforge.net/lists/listinfo/scipion-users

-- 
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
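
For anyone reading this thread later, here is a minimal sketch of the failure mode suspected above. It is not Scipion's or the Relion plugin's actual code: the template string, the regular expression and the function names `last_iter`, `get_file_name` and `get_model_file` are all hypothetical, chosen only to mirror the `self.__filenamesDict[key] % kwargs` line in the traceback. It assumes that the "last iteration" lookup returns None when no iteration output exists yet, which is the hypothesis in this message, not a confirmed diagnosis.

```python
import re

# Hypothetical filename template, modelled on the traceback's
# `self.__filenamesDict[key] % kwargs`; the real plugin's template may differ.
FILENAME_TEMPLATES = {"model": "extra/relion_it%(iter)03d_model.star"}

# Hypothetical pattern for recognising per-iteration model files.
ITER_PATTERN = re.compile(r"relion_it(\d{3})_model\.star$")


def last_iter(file_names):
    """Return the highest iteration number found in file_names, or None."""
    matches = (ITER_PATTERN.search(name) for name in file_names)
    iterations = [int(m.group(1)) for m in matches if m]
    return max(iterations) if iterations else None


def get_file_name(key, **kwargs):
    """Unchecked formatting: raises TypeError if iter is None."""
    return FILENAME_TEMPLATES[key] % kwargs


def get_model_file(file_names):
    """Guarded variant: fail with a descriptive error instead of a bare TypeError."""
    iteration = last_iter(file_names)
    if iteration is None:
        raise RuntimeError(
            "No iteration output found; has the queued job produced any files yet?"
        )
    return get_file_name("model", iter=iteration)
```

Calling `get_file_name("model", iter=None)` raises a TypeError from the `%d` conversion, while `get_model_file([])` raises the more descriptive RuntimeError. Whether a guard like this belongs in the protocol or the step should instead wait for the queued job is a design question for the developers.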