From: Grigory S. <sha...@gm...> - 2020-11-26 11:13:23
It could also be that job scripts starting with a number are not allowed by Slurm: Runs/000429_ProtRelionClassify2D/logs/429. In this case SUBMIT_PREFIX is required: https://scipion-em.github.io/docs/docs/scipion-modes/host-configuration.html

Best regards,
Grigory
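For reference, a minimal host.conf fragment using SUBMIT_PREFIX might look like the sketch below; the prefix value "scipion_" is illustrative, the surrounding keys are taken from the configuration quoted later in this thread, and the host-configuration page linked above is the authoritative reference for the exact key list:

    [localhost]
    SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
    # Prepend a non-numeric prefix to generated job script names, so Slurm
    # never sees a script whose name starts with a digit (e.g. logs/429):
    SUBMIT_PREFIX = scipion_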
From: Pablo C. <pc...@cn...> - 2020-11-26 10:58:07
Just an idea!! Maybe the "working directory" is not being passed to the node, and the job is thus working on home and failing?

-- Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
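Pablo's hypothesis can be tested from the submit template itself. Note that the template quoted later in this thread sets WORKDIR=$SLURM_JOB_SUBMIT_DIR while a later line uses $SLURM_SUBMIT_DIR; if the installed Slurm exports only SLURM_SUBMIT_DIR (the commonly documented name), the cd receives an empty string and silently lands in $HOME. A hedged check, assuming the variable names shown in that template:

    # In the submit template: fail loudly instead of silently cd-ing to $HOME
    WORKDIR=${SLURM_SUBMIT_DIR:?not set - job did not inherit the submit directory}
    cd "$WORKDIR" || exit 1
    echo "Working directory is $(pwd)"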
From: Pablo C. <pc...@cn...> - 2020-11-26 10:54:35
Ok, let me understand. Dmitry, you have a folder "/home/user/Data/ScipionUserData/projects/scratch/" that does not contain project.sqlite (thus the warning). Now, why is that folder there? What does it contain? Is that an actual project?

-- Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
From: Grigory S. <sha...@gm...> - 2020-11-26 10:47:11
Honestly not sure if there's a solution besides editing sqlite files. Maybe Pablo or Jose Miguel can reply to the thread.

Best regards,
Grigory
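If the project really was deleted on purpose (as Dmitry confirms below), a simpler route than editing sqlite files may be to remove the leftover project directory itself, so the project list stops trying to load it. A sketch, using the path from Dmitry's error message; inspect the folder first in case it still holds something valuable:

    ls -la /home/user/Data/ScipionUserData/projects/scratch   # confirm nothing important remains
    rm -r /home/user/Data/ScipionUserData/projects/scratch    # the "ERROR loading project" warning should disappear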
From: Grigory S. <sha...@gm...> - 2020-11-26 10:02:35
Hi,

the place where you start scipion is not relevant; does the SCIPION_USER_DATA variable in the config file point to a shared writable location?

Best regards,
Grigory
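For context, SCIPION_USER_DATA is set in the Scipion config file; a sketch of the relevant line, with an illustrative shared path patterned on the layout Yangyang describes below:

    # scipion.conf: projects (and the relative Runs/ trees inside them) are
    # created under this path; on a cluster it must be writable from both the
    # head node and the compute nodes
    SCIPION_USER_DATA = /data/users/xxxlab/xxx/ScipionUserData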
From: Yangyang Yi <yy...@si...> - 2020-11-26 08:57:36
I have checked the job more carefully and also tried other jobs.

For the relion Class3D job (I used the relion_benchmark dataset) it reported "output directory does not exist" in Runs/000429_ProtRelionClassify2D/logs/429:

ERROR:
ERROR: output directory does not exist!
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaab935e555]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi() [0x436a8f]
==================
ERROR:
ERROR: output directory does not exist!
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x447f91]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN11MlOptimiser17initialiseGeneralEi+0x248f) [0x5ada9f]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x998) [0x4689f8]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(main+0xb2f) [0x4336ff]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaab935e555]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi() [0x436a8f]

I have run some other jobs and they reported similar errors, such as "Cannot open tif file" (in MotionCor2) or "forrtl: severe (24): end-of-file during read, unit 5, file /proc/226944/fd/0" (in unblur). For cisTEM-unblur, it reports:
IOError: [Errno 2] No such file or directory: 'Runs/000250_CistemProtUnblur/extra/May08_03.05.02_shifts.txt'
but I found these files in my home directory where I opened scipion.

I suspect the raw data's location matters. My testing data is located in our cluster shared storage, owned by root; all users can read but not modify it (like /data/tutorial_data/). Those data have been used for software teaching or testing before, and I'm sure that all users can process them outside scipion. But I started scipion in my home directory, which is located in cluster shared storage (like /data/users/xxxlab/xxx). Is there anything I should take care of?

I will also try scipion-3.0 to see if it works.

> On 23 Nov 2020, at 16:32, Yangyang Yi <yy...@si...> wrote:
>
> Sorry for the late reply. Here's the log from the job begin to the job end.
>
> run.stdout:
>
> 00001: RUNNING PROTOCOL -----------------
> 00002: HostName: headnode.cm.cluster
> 00003: PID: 209177
> 00004: Scipion: v2.0 (2019-04-23) Diocletian
> 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark
> 00006: workingDir: Runs/000347_ProtRelionClassify3D
> 00007: runMode: Continue
> 00008: MPI: 5
> 00009: threads: 2
> 00010: len(steps) 3 len(prevSteps) 0
> 00011: Starting at step: 1
> 00012: Running steps
> 00013: STARTED: convertInputStep, step 1
> 00014: 2020-11-12 13:46:13.708639
> 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star'
> 00016: convertBinaryFiles: creating soft links.
> 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra
> 00018: FINISHED: convertInputStep, step 1
> 00019: 2020-11-12 13:46:48.416238
> 00020: STARTED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48.438416
> 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job'
> 00023: launched job with id 2552
> 00024: FINISHED: runRelionStep, step 2
> 00025: 2020-11-12 13:46:48.524619
> 00026: STARTED: createOutputStep, step 3
> 00027: 2020-11-12 13:46:48.973668
> 00028: Traceback (most recent call last):
> 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run
> 00030: self.step._run() # not self.step.run() , to avoid race conditions
> 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run
> 00032: resultFiles = self._runFunc()
> 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc
> 00034: return self._func(*self._args)
> 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep
> 00036: self._fillClassesFromIter(classes3D, self._lastIter())
> 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter
> 00038: self._loadClassesInfo(iteration)
> 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo
> 00040: self._getFileName('model', iter=iteration))
> 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName
> 00042: return self.__filenamesDict[key] % kwargs
> 00043: TypeError: %d format: a number is required, not NoneType
> 00044: Protocol failed: %d format: a number is required, not NoneType
> 00045: FAILED: createOutputStep, step 3
> 00046: 2020-11-12 13:46:48.991279
> 00047: *** Last status is failed
> 00048: ------------------- PROTOCOL FAILED (DONE 3/3)
>
> run.log:
> 2020-11-12 13:46:48.438416
> 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619
> 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3
> 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668
> 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType
> 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3
> 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279
> 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)
>
>> On 12 Nov 2020, at 02:19, Grigory Sharov <sha...@gm...> wrote:
>>
>> Hi Yangyang,
>>
>> I've tried your config with Scipion2 and it seems to work fine. The only problem I found was the use of curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run, or before?
>>
>> Best regards,
>> Grigory
>>
>> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote:
>>
>> Dear Scipion users & devs,
>>
>> I am kindly asking for your advice.
>>
>> We are trying to set up Scipion-2.0 on a slurm cluster. It can run on a single machine but fails to submit jobs to the queue. The Slurm cluster itself works well, and running scipion on a single node works.
>>
>> Here are our settings for host.conf:
>>
>> [localhost]
>> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
>> NAME = SLURM
>> MANDATORY = 0
>> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
>> CANCEL_COMMAND = scancel %_(JOB_ID)s
>> CHECK_COMMAND = squeue -j %_(JOB_ID)s
>> SUBMIT_TEMPLATE = #!/bin/bash
>>     ####SBATCH --export=ALL
>>     #SBATCH -p %_(JOB_QUEUE)s
>>     #SBATCH -J %_(JOB_NAME)s
>>     #SBATCH -o %_(JOB_SCRIPT)s.out
>>     #SBATCH -e %_(JOB_SCRIPT)s.err
>>     #SBATCH --time=%_(JOB_TIME)s:00:00
>>     #SBATCH --nodes=1
>>     #SBATCH --ntasks=%_(JOB_NODES)d
>>     #SBATCH --cpus-per-task=%_(JOB_THREADS)d
>>     WORKDIR=$SLURM_JOB_SUBMIT_DIR
>>     export XMIPP_IN_QUEUE=1
>>     cd $WORKDIR
>>     # Make a copy of node file
>>     echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
>>     ### Display the job context
>>     echo Running on host `hostname`
>>     echo Time is `date`
>>     echo Working directory is `pwd`
>>     echo $SLURM_JOB_NODELIST
>>     echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
>>     #################################
>>     %_(JOB_COMMAND)s
>>     find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
>> QUEUES = {
>>     “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
>>     ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
>> }
>> JOB_DONE_REGEX =
>>
>> And Scipion reports:
>> typeerror: %d format: a number is required, not nonetype
>>
>> Any suggestions about how to solve the problem? Thanks!
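The TypeError at the end of the trace above is a plain Python %-formatting failure: the run.stdout shows the Relion job being handed to sbatch and createOutputStep starting within the same second, so no iteration output exists yet and self._lastIter() returns None. A reduced sketch (the template string and dictionary key are illustrative, patterned on the _getFileName call in the traceback):

    # _getFileName builds paths by %-substitution into a filenames dictionary;
    # substituting None where %d expects an integer reproduces the exact error
    filenames = {'model': 'extra/relion_it%(iter)03d_model.star'}  # illustrative template
    print(filenames['model'] % {'iter': 25})    # extra/relion_it025_model.star
    print(filenames['model'] % {'iter': None})  # TypeError: %d format: a number is required, not NoneType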
From: Dmitry A. S. <sem...@gm...> - 2020-11-25 15:55:24
Dear Grigory,

Yes, it probably was removed unintentionally.

Sincerely,
Dmitry
From: Grigory S. <sha...@gm...> - 2020-11-25 15:29:37
Hi,

as the Gctf error indicates, you need to check "CUDA Toolkit and Compatible Driver Versions" from here <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html>, and make sure that your CUDA toolkit version is compatible with your CUDA driver version:

- check the CUDA driver version:
  ~$ nvidia-smi
  or
  ~$ cat /proc/driver/nvidia/version

- check the local CUDA toolkit version:
  ~$ nvcc --version
  or
  ~$ cat /usr/local/cuda/version.txt

For ctffind4, could you try to run it locally (not on the cluster), with no threads and no MPI, on a single micrograph, and paste the run.stdout file here. Does the scipion test for ctffind4 work?

Best regards,
Grigory

On Wed, Nov 25, 2020 at 3:20 PM Ana Andreea ARTENI <ana...@i2...> wrote:
> Hi again,
> I do have CUDA 10.1.
> Jose Miguel also wrote me: I have Gctf_v1.18 which is a VPP version. We'll install gctf-1.06.
> What about the second thing: cistem is still in "launched" mode, without actually running - how can I sort that one out?
> My best regards,
> ana.
From: Grigory S. <sha...@gm...> - 2020-11-25 15:19:15
Hi,

did you remove the /home/user/Data/ScipionUserData/projects/scratch/ folder?

Best regards,
Grigory
From: Dmitry S. <Sem...@gm...> - 2020-11-25 09:50:03
Dear colleagues,

Small question: I have an issue with an error message in the terminal. How can I remove it?

Thank you!

Sincerely,
Dmitry

ERROR loading project: scratch
Project database not found at '/home/user/Data/ScipionUserData/projects/scratch/project.sqlite'
From: Grigory S. <sha...@gm...> - 2020-11-25 09:30:38
Hi Colin,

it looks like an error with reading the converted reference volume. Do you have the mrc volume in the Runs/006157_ProtRelionRefine3D/tmp folder?

Best regards,
Grigory

On Wed, Nov 25, 2020 at 5:30 AM <pc...@cn...> wrote:
> Hi Collin,
>
> Could you send us Runs/004006_ProtUserSubSet/particles.sqlite
>
> Meanwhile, try to update the relion plugin too.
>
> On 25 Nov 2020 at 1:12, Colin Deniston <ckd...@gm...> wrote:
>
>> I'm getting this error only when I try to run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix it; I currently have 3.0.5. Thanks!
>>
>> STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676
>> 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star'
>> 00020: convertBinaryFiles: creating soft links.
>> 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra
>> 00022: Traceback (most recent call last):
>> 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run
>> 00024: self._run()
>> 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run
>> 00026: resultFiles = self._runFunc()
>> 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc
>> 00028: return self._func(*self._args)
>> 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep
>> 00030: self._convertRef()
>> 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef
>> 00032: self._convertVol(ih, refVols[0])
>> 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol
>> 00034: img = ih.read(inputVol)
>> 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read
>> 00036: return self._imgClass(location)
>> 00037: TypeError: __init__() takes 1 positional argument but 2 were given
>> 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given
>> 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580
>> 00040: *** Last status is failed
>> 00041: ------------------- PROTOCOL FAILED (DONE 1/3)
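The TypeError on line 00037 is the generic Python signature mismatch: image_handler.py calls self._imgClass(location), but the class bound to _imgClass on that workstation evidently accepts no constructor argument. A reduced sketch of the failing pattern (the class and call are illustrative, not Scipion's actual binding):

    # If the class resolved for reading images cannot take a location argument,
    # passing one reproduces the exact message from Colin's log:
    class BrokenImage:
        def __init__(self):  # no 'location' parameter
            pass

    BrokenImage('volume.mrc')  # TypeError: __init__() takes 1 positional argument but 2 were given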
From: Grigory S. <sha...@gm...> - 2020-11-25 09:28:54
|
Hi Ana, the gctf error is : Error CUDA driver version is insufficient for CUDA runtime version at line 3059 in file src/ctf.cu The binary you are using (default) requires CUDA 10.1 for ctffind4 it looks like you did not actually run the job, as there's no output logs. I'll fix the methods message. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 <+44%201223%20267228> e-mail: gs...@mr... On Wed, Nov 25, 2020 at 9:04 AM Ana Andreea ARTENI < ana...@i2...> wrote: > > Concerning the gctf I found the following in the run.stdout > STARTED: estimateCtfStep, step 3, time 2020-11-20 14:42:22.007666 > 00019: Estimating CTF of micrograph: 2 > 00020: Error CUDA driver version is insufficient for CUDA runtime > version at line 3059 in file src/ctf.cu > 00021: > *************************************************************************************************** > 00022: Version: > /data/sics/scipion/software/em/gctf-1.18/bin/Gctf_v1.18_sm30-75_cu10.1 > v1.18, updated on 2017-05-10 > 00023: Author: Kai Zhang@MRC Laboratory of Molecular Biology > 00024: Contact: kz...@mr... > 00025: Description: This is a simplified version of GCTF for CTF > determination. > 00026: > *************************************************************************************************** > 00027: > 00028: Opening > Runs/000850_ProtGctf/tmp/mic_0001/Falcon_2012_06_12-14_33_35_0_movie_aligned_mic.mrc > for initial test and preparation ....... > 00029: ERROR: Gctf has failed on Runs/000850_ProtGctf/tmp/mic_0001/*.mrc > 00030: > /data/sics/scipion/software/em/gctf-1.18/bin/Gctf_v1.18_sm30-75_cu10.1 > --apix 3.540000 --kV 300.000000 --cs 2.000000 --ac 0.100000 --dstep > 13.999992 --defL 5000.000000 --defH 90000.000000 --defS 500.000000 --astm > 1000.000000 --resL 50.000000 --resH 8.230000 --do_EPA 0 --boxsize 512 > --plot_res_ring 1 --gid 2 --bfac 150 --B_resH 7.080000 --overlap 0.500000 > --convsize 85 --do_Hres_ref 0 --smooth_resL 1000 --EPA_oversmp 4 --ctfstar > NONE --do_validation 0 Runs/000850_ProtGctf/tmp/mic_0002/*.mrc > 00031: Error CUDA driver version is insufficient for CUDA runtime > version at line 3059 in file src/ctf.cu > > Hope this helps better... > ana. > ------------------------------ > *From: *"Ana Andreea ARTENI" <ana...@i2...> > *To: *"sharov grigory" <sha...@gm...> > *Cc: *"Anne-Pascale JAUDIER" <ann...@i2...>, > "scipion-users" <sci...@li...> > *Sent: *Wednesday, 25 November, 2020 09:51:06 > *Subject: *error with protocols in Scipion3 CTF correction step > > > Dear Grigory, > > After installing Relion3, and start doing the mix and patch tutorial we > have the following errors: > > At the CTF step: > > via gctf: > > The test: scipion3 tests gctf.tests.test_protocols_gctf gives no errors, > but the output looks as in the image attached (that shows no psd and no > correct values for the parameters). > Any idea what is wrong and how can we fix this? 
> -------------------------------------------- > > via ctffind4 > METHODS: > > cistem - ctffind4 > ERROR generating methods info: 'CistemProtCTFFind' object has no attribute > 'outputCTF' > > File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist > > Need also help here... > My kind regards, > ana. > > |
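For this class of error the first thing to compare is the CUDA version the installed driver supports against the runtime the binary was built for (cu10.1 here). A minimal sketch, assuming nvidia-smi is on the PATH:

import subprocess

# nvidia-smi's banner line reports the driver version and the highest
# CUDA runtime version that driver supports.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next(line for line in out.splitlines() if "CUDA Version" in line))

If the reported CUDA version is below 10.1, either update the driver or point Scipion at a Gctf binary built for an older CUDA runtime.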
From: Jose M. de la R. T. <del...@gm...> - 2020-11-25 09:03:42
Dear Ana, Can you take a look at the error from the Gctf logs? It is not creating any output, but it would be good to see the error in the logs. Have you set up the gctf binary version according to your CUDA version (in the scipion.conf file)? Best, Jose Miguel On Wed, Nov 25, 2020 at 9:53 AM Ana Andreea ARTENI < ana...@i2...> wrote: > > Dear Grigory, > > After installing Relion3, and start doing the mix and patch tutorial we > have the following errors: > > At the CTF step: > > via gctf: > > The test: scipion3 tests gctf.tests.test_protocols_gctf gives no errors, > but the output looks as in the image attached (that shows no psd and no > correct values for the parameters). > Any idea what is wrong and how can we fix this? > -------------------------------------------- > > via ctffind4 > METHODS: > > cistem - ctffind4 > ERROR generating methods info: 'CistemProtCTFFind' object has no attribute > 'outputCTF' > > File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist > > Need also help here... > My kind regards, > ana. > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
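For reference, pointing Scipion at a particular Gctf binary usually comes down to a couple of entries in scipion.conf. The variable names below are an assumption (they differ between plugin versions), so check the gctf plugin documentation; the binary name is the one that appears in the log above:

# hypothetical scipion.conf excerpt - variable names may differ per plugin version
GCTF_HOME = software/em/gctf-1.18
GCTF = Gctf_v1.18_sm30-75_cu10.1

Pick the shipped binary whose cuXX suffix matches the CUDA runtime your driver actually supports.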
From: Ana A. A. <ana...@i2...> - 2020-11-25 08:51:23
Dear Grigory, After installing Relion3 and starting the mix and patch tutorial, we have the following errors: At the CTF step: via gctf: The test scipion3 tests gctf.tests.test_protocols_gctf gives no errors, but the output looks as in the attached image (which shows no PSD and no correct values for the parameters). Any idea what is wrong and how we can fix this? -------------------------------------------- via ctffind4 METHODS: cistem - ctffind4 ERROR generating methods info: 'CistemProtCTFFind' object has no attribute 'outputCTF' File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist Need help here as well... My kind regards, ana. |
From: <pc...@cn...> - 2020-11-25 05:30:14
Hi Collin, Could you send us Runs/004006_ProtUserSubSet/particles.sqlite Meanwhile, try to update the relion plugin too. On 25 Nov 2020 at 1:12, Colin Deniston <ckd...@gm...> wrote: > I'm getting this error only when I try and run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix, I currently have 3.0.5. Thanks! > > STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676 > 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star' > 00020: convertBinaryFiles: creating soft links. > 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra > 00022: Traceback (most recent call last): > 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run > 00024: self._run() > 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run > 00026: resultFiles = self._runFunc() > 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc > 00028: return self._func(*self._args) > 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep > 00030: self._convertRef() > 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef > 00032: self._convertVol(ih, refVols[0]) > 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol > 00034: img = ih.read(inputVol) > 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read > 00036: return self._imgClass(location) > 00037: TypeError: __init__() takes 1 positional argument but 2 were given > 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given > 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580 > 00040: *** Last status is failed > 00041: ------------------- PROTOCOL FAILED (DONE 1/3) |
From: Colin D. <ckd...@gm...> - 2020-11-25 00:12:43
I'm getting this error only when I try and run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix, I currently have 3.0.5. Thanks! STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star' 00020: convertBinaryFiles: creating soft links. 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra 00022: Traceback (most recent call last): 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run 00024: self._run() 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run 00026: resultFiles = self._runFunc() 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc 00028: return self._func(*self._args) 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep 00030: self._convertRef() 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef 00032: self._convertVol(ih, refVols[0]) 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol 00034: img = ih.read(inputVol) 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read 00036: return self._imgClass(location) 00037: TypeError: __init__() takes 1 positional argument but 2 were given 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580 00040: *** Last status is failed 00041: ------------------- PROTOCOL FAILED (DONE 1/3) |
From: Grigory S. <sha...@gm...> - 2020-11-24 14:04:28
Hi Dmitry, looking at the time - 1 s - this seems very quick! You need to check the output in the Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/ folder to see if it actually ran anything. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr... On Tue, Nov 24, 2020 at 12:00 PM Dmitry Semchonok <Sem...@gm...> wrote: > Dear colleagues, > > Could you please advice? > > Scipon2 — relion - motion correction — Error > > At the same time the motion cor2 runs fine. > > Thank you > > 0001: RUNNING PROTOCOL ----------------- > 00002: HostName: cryoem02 > 00003: PID: 64478 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: > /data/panos/ScipionUserData/projects/TutorialBetagel__students > 00006: workingDir: Runs/000980_ProtRelionMotioncor > 00007: runMode: Continue > 00008: MPI: 1 > 00009: threads: 1 > 00010: len(steps) 17 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: _convertInputStep, step 1 > 00014: 2020-11-24 12:56:04.911042 > 00015: Relion version: > 00016: relion_run_motioncorr --version > 00017: RELION version: 3.0.8 > 00018: Precision: BASE=double, CUDA-ACC=single > 00019: > 00020: FINISHED: _convertInputStep, step 1 > 00021: 2020-11-24 12:56:04.951805 > 00022: STARTED: processMovieStep, step 2 > 00023: 2020-11-24 12:56:04.992562 > 00024: Processing movie: > Runs/000980_ProtRelionMotioncor/tmp/movie_000001/Falcon_2012_06_12-14_33_35_0_movie.mrcs > 00025: relion_run_motioncorr --i > Falcon_2012_06_12-14_33_35_0_movie_input.star --o output/ --use_own > --first_frame_sum 1 --last_frame_sum 16 --bin_factor 1.000000 --bfactor 150 > --angpix 3.540000 --patch_x 1 --patch_y 1 --j 1 --voltage 300 > 00026: Using our own implementation based on MOTIONCOR2 algorithm > 00027: to correct beam-induced motion for the following micrographs: > 00028: * Falcon_2012_06_12-14_33_35_0_movie.mrcs > 00029: Correcting beam-induced motions using our own implementation ... > 00030: 1/ 1 sec > ............................................................~~(,_,"> > 00031: Generating logfile.pdf ... > 00032: 000/??? sec ~~(,_,"> > [oo]gs: /usr/local/scipion/v2.0.0/software/lib/libtiff.so.5: > no version information available (required by /lib64/libgs.so.9) > 00033: 1/ 1 sec > ............................................................~~(,_,"> > 00034: Done!
Written: output/logfile.pdf and > output/corrected_micrographs.star > 00035: Traceback (most recent call last): > 00036: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 186, in > run > 00037: self._run() > 00038: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 237, in > _run > 00039: resultFiles = self._runFunc() > 00040: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 233, in > _runFunc > 00041: return self._func(*self._args) > 00042: File > "/usr/local/scipion/v2.0.0/pyworkflow/em/protocol/protocol_movies.py", line > 375, in processMovieStep > 00043: self._processMovie(movie) > 00044: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 213, in _processMovie > 00045: self._computeExtra(movie) > 00046: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 355, in _computeExtra > 00047: self._saveAlignmentPlots(movie) > 00048: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 401, in _saveAlignmentPlots > 00049: shiftsX, shiftsY = self._getMovieShifts(movie, > self._getMovieOutFn(movie, '.star')) > 00050: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 271, in _getMovieShifts > 00051: table = md.Table(fileName=outStar, tableName='global_shift') > 00052: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", > line 58, in __init__ > 00053: self.read(fileName, tableName) > 00054: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", > line 109, in read > 00055: with open(fileName) as f: > 00056: IOError: [Errno 2] No such file or directory: > 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' > 00057: Protocol failed: [Errno 2] No such file or directory: > 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' > 00058: FAILED: processMovieStep, step 2 > 00059: 2020-11-24 12:56:07.086919 > 00060: *** Last status is failed > 00061: ------------------- PROTOCOL FAILED (DONE 2/17) > > > Sincerely > Dmitry > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
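A one-liner covering Grigory's suggestion, with the path taken from the log (an empty or missing folder means relion_run_motioncorr failed before writing the expected .star file):

import os

# Check what the motion-correction step actually produced.
out_dir = "Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output"
print(sorted(os.listdir(out_dir)) if os.path.isdir(out_dir) else "output folder missing")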
From: Dmitry S. <Sem...@gm...> - 2020-11-24 12:00:33
Dear colleagues, Could you please advise? Scipion2 — relion - motion correction — Error. At the same time MotionCor2 itself runs fine. Thank you 0001: RUNNING PROTOCOL ----------------- 00002: HostName: cryoem02 00003: PID: 64478 00004: Scipion: v2.0 (2019-04-23) Diocletian 00005: currentDir: /data/panos/ScipionUserData/projects/TutorialBetagel__students 00006: workingDir: Runs/000980_ProtRelionMotioncor 00007: runMode: Continue 00008: MPI: 1 00009: threads: 1 00010: len(steps) 17 len(prevSteps) 0 00011: Starting at step: 1 00012: Running steps 00013: STARTED: _convertInputStep, step 1 00014: 2020-11-24 12:56:04.911042 00015: Relion version: 00016: relion_run_motioncorr --version 00017: RELION version: 3.0.8 00018: Precision: BASE=double, CUDA-ACC=single 00019: 00020: FINISHED: _convertInputStep, step 1 00021: 2020-11-24 12:56:04.951805 00022: STARTED: processMovieStep, step 2 00023: 2020-11-24 12:56:04.992562 00024: Processing movie: Runs/000980_ProtRelionMotioncor/tmp/movie_000001/Falcon_2012_06_12-14_33_35_0_movie.mrcs 00025: relion_run_motioncorr --i Falcon_2012_06_12-14_33_35_0_movie_input.star --o output/ --use_own --first_frame_sum 1 --last_frame_sum 16 --bin_factor 1.000000 --bfactor 150 --angpix 3.540000 --patch_x 1 --patch_y 1 --j 1 --voltage 300 00026: Using our own implementation based on MOTIONCOR2 algorithm 00027: to correct beam-induced motion for the following micrographs: 00028: * Falcon_2012_06_12-14_33_35_0_movie.mrcs 00029: Correcting beam-induced motions using our own implementation ... 00030: 1/ 1 sec ............................................................~~(,_,"> 00031: Generating logfile.pdf ... 00032: 000/??? sec ~~(,_,"> [oo]gs: /usr/local/scipion/v2.0.0/software/lib/libtiff.so.5: no version information available (required by /lib64/libgs.so.9) 00033: 1/ 1 sec ............................................................~~(,_,"> 00034: Done!
Written: output/logfile.pdf and output/corrected_micrographs.star 00035: Traceback (most recent call last): 00036: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 186, in run 00037: self._run() 00038: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 237, in _run 00039: resultFiles = self._runFunc() 00040: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc 00041: return self._func(*self._args) 00042: File "/usr/local/scipion/v2.0.0/pyworkflow/em/protocol/protocol_movies.py", line 375, in processMovieStep 00043: self._processMovie(movie) 00044: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 213, in _processMovie 00045: self._computeExtra(movie) 00046: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 355, in _computeExtra 00047: self._saveAlignmentPlots(movie) 00048: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 401, in _saveAlignmentPlots 00049: shiftsX, shiftsY = self._getMovieShifts(movie, self._getMovieOutFn(movie, '.star')) 00050: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 271, in _getMovieShifts 00051: table = md.Table(fileName=outStar, tableName='global_shift') 00052: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", line 58, in __init__ 00053: self.read(fileName, tableName) 00054: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", line 109, in read 00055: with open(fileName) as f: 00056: IOError: [Errno 2] No such file or directory: 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' 00057: Protocol failed: [Errno 2] No such file or directory: 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' 00058: FAILED: processMovieStep, step 2 00059: 2020-11-24 12:56:07.086919 00060: *** Last status is failed 00061: ------------------- PROTOCOL FAILED (DONE 2/17) Sincerely Dmitry |
From: Grigory S. <sha...@gm...> - 2020-11-23 15:15:02
Dear Dmitry, you can open old Scipion projects. If any protocol has been deprecated in Scipion 3, then you'll see a corresponding message. You won't be able to open/edit such protocols' form, but the outputs should be available for use. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 <+44%201223%20267228> e-mail: gs...@mr... On Mon, Nov 23, 2020 at 2:27 PM Dmitry A. Semchonok <sem...@gm...> wrote: > Dear colleagues, > > Can I open the projects made by Scipion 2 with Scipion 3 and work with > them? > > Am I expecting any issues? > > sincerely, > Dmitry > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
From: David <dst...@cn...> - 2020-11-23 15:06:06
Hi Wolfgang, you can drop the '-m3dnow', as this technology was dropped about 10 years ago (AFAIK). If you are willing to (or can) generate non-portable code [which is in theory also the case when using fma and avx2 instructions, though the majority of CPUs do support them], and at the same time want all possible optimizations, have a look at the '-march=cpu-type' optimization flag here: https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html If you use the -march=your_cpu_family flag, all or most (of the compliant) optimization flags for that specific architecture should be enabled by default. To see which flags are on, you can use '-v' with any compilation command we use. See the example below (in bold are the important flags, and in italics the optimization flags that were automatically enabled). I recommend doing these optimizations if you can, as the performance gains outweigh the extra time spent on compilation (especially for the devel branch of xmipp, where we significantly improved the compilation time). If you need code which is portable between several different families, use '-v' to expand the flags for each architecture, and then set the intersection of those flags in the xmipp.conf file. Hope this helps. KR, David S. $ g++ *-v* -o libraries/reconstruction/aalign_significant.os -c *-march=native* -std=c++11 -O0 -I../ -I/usr/include -I/usr/include/hdf5/serial -fPIC -I/usr/include/python3.5m -I/usr/lib/python3/dist-packages/numpy/core/include -Iexternal -Ilibraries -I/home/david/git/xmipp_devel/src/xmippCore libraries/reconstruction/aalign_significant.cpp Using built-in specs. COLLECT_GCC=g++ Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.5.0-12ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 5.5.0 20171010 (Ubuntu 5.5.0-12ubuntu1) COLLECT_GCC_OPTIONS='-v' '-o' 'libraries/reconstruction/aalign_significant.os' '-c' '-march=native' '-std=c++11' '-O0' '-I' '../' '-I' '/usr/include' '-I' '/usr/include/hdf5/serial' '-fPIC' '-I' '/usr/include/python3.5m' '-I' '/usr/lib/python3/dist-packages/numpy/core/include' '-I' 'external' '-I' 'libraries' '-I' '/home/david/git/xmipp_devel/src/xmippCore' '-shared-libgcc' /usr/lib/gcc/x86_64-linux-gnu/5/cc1plus -quiet -v -I ../ -I /usr/include -I /usr/include/hdf5/serial -I /usr/include/python3.5m -I /usr/lib/python3/dist-packages/numpy/core/include -I external -I libraries -I /home/david/git/xmipp_devel/src/xmippCore -imultiarch x86_64-linux-gnu -D_GNU_SOURCE libraries/reconstruction/aalign_significant.cpp *-march=broadwell* /-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt
-mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=generic/ -quiet -dumpbase aalign_significant.cpp -auxbase-strip libraries/reconstruction/aalign_significant.os -O0 -std=c++11 -version .... (skipped) On 19/11/20 12:05, Lugmayr, Wolfgang wrote: > dear all, > > just for documentation of the scipion 3.0.6 official release. > > i changed in xmipp.conf the line: > CXXFLAGS= -mtune=native -march=native -std=c++11 -O3 > to > CXXFLAGS= -mfma -mavx2 -m3dnow -fomit-frame-pointer -std=c++11 -O3 > and rebuilt xmipp. > now the exes run fine on intel and amd epyc processors. > > the flags come from the recommendations for gcc: > https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_AMD.pdf > > cheers and thanks for the help, > wolfgang > > > ------------------------------------------------------------------------ > *From: *"Carlos Oscar Sorzano" <co...@cn...> > *To: *"Pablo Conesa" <pc...@cn...>, "David" <dst...@cn...>, dma...@cn..., "w lugmayr" <w.l...@uk...>, "Mailing list for Scipion users" <sci...@li...> > *Sent: *Thursday, 29 October, 2020 16:11:16 > *Subject: *Re: Fwd: [scipion-users] scipion-installer - how do i change compiler optimization flags? > > Dear Wolfgang, > > in the compilation of Xmipp, you can set the environment variable CXXFLAGS before calling the xmipp script. From the Scipion installation I am not sure how to pass it to the xmipp script. It may be that setting it in the shell before calling scipion suffices. But I have never tried. > > Kind regards, Carlos Oscar > > On 29/10/2020 at 11:13, Pablo Conesa wrote: > > -------- Forwarded Message -------- > Subject: [scipion-users] scipion-installer - how do i change compiler optimization flags? > Date: Wed, 28 Oct 2020 14:19:29 +0100 (CET) > From: Lugmayr, Wolfgang <w.l...@uk...> > Reply-To: Mailing list for Scipion users <sci...@li...> > To: Mailing list for Scipion users <sci...@li...> > > > > hi, > > i installed and compiled scipion 3.0.5-devel on a system with intel cpus. > > when i run now scipion on new nodes with "AMD EPYC 7402" processors the executable fails with: > $ scipion3 last Scipion v3.0.5 () devel > sh: line 1: 45875 Illegal instruction python -m scipion last > > so i do not want to have 2 versions of scipion3 but maybe compile it with intel and amd compatibility flags. > how do i set these flags? > > cheers, > wolfgang > > -- > Universitätsklinikum Hamburg-Eppendorf (UKE) > @ Centre for Structural Systems Biology (CSSB) > @ Deutsches Elektronen-Synchrotron (DESY) > Notkestrasse 85 Gebäude 15 > 22607 Hamburg, Germany > Tel.: +49 40 8998-87652 > Email: wol...@cs... > http://www.cssb-hamburg.de/ > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > |
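To find a safe common flag set for a mixed Intel/EPYC cluster, gcc can also print exactly which target flags a given -march value enables; a sketch (znver2, the EPYC 7402 family, needs a reasonably recent gcc; intel.txt and epyc.txt are just scratch files):

$ gcc -march=broadwell -Q --help=target > intel.txt
$ gcc -march=znver2 -Q --help=target > epyc.txt
$ diff intel.txt epyc.txt | grep enabled

The flags enabled in both listings are the intersection David describes, suitable for xmipp.conf on both node types.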
From: Dmitry A. S. <sem...@gm...> - 2020-11-23 14:27:02
Dear colleagues, Can I open projects made with Scipion 2 in Scipion 3 and work with them? Should I expect any issues? Sincerely, Dmitry |
From: Carlos O. S. <co...@cn...> - 2020-11-23 11:08:20
Dear Juha, did you install from binaries or from sources? Is only one program failing, or all of them? You may try: xmipp_image_statistics -i Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol Cheers, Carlos Oscar On 22/11/2020 at 23:34, Juha Huiskonen wrote: > Hi all, > > I have an issue running Xmipp protocol_align_volume_and_particles.py > from Scipion. > > RuntimeError: FATAL: module compiled as little endian, but detected > different endianness at runtime > > It seems there's an issue with the installation. Any tips how to fix > it? The full error is below > > Best wishes, > Juha > > > 00054: STARTED: alignVolumeStep, step 2, time 2020-11-23 00:26:33.446121 > 00055: xmipp_volume_align --i1 > Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol --i2 > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt > 00056: RuntimeError: FATAL: module compiled as little endian, but > detected different endianness at runtime > 00057: > /projappl/project_2000637/apps/scipion/3.0/software/em/xmipp/bin/xmipp_volume_align: > line 3: 39299 Segmentation fault > $XMIPP_HOME/bin/xmipp_volume_align_prog "$@" > 00058: Traceback (most recent call last): > 00059: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 188, in run > 00060: self._run() > 00061: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 239, in _run > 00062: resultFiles = self._runFunc() > 00063: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 235, in _runFunc > 00064: return self._func(*self._args) > 00065: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/xmipp3/protocols/protocol_align_volume_and_particles.py", > line 153, in alignVolumeStep > 00066: self.runJob("xmipp_volume_align", args) > 00067: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 1339, in runJob > 00068: self._stepsExecutor.runJob(self._log, program, arguments, > **kwargs) > 00069: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", > line 66, in runJob > 00070: process.runJob(log, programName, params, > 00071: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/utils/process.py", > line 53, in runJob > 00072: return runCommand(command, env, cwd) > 00073: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/utils/process.py", > line 68, in runCommand > 00074: check_call(command, shell=True, stdout=sys.stdout, > stderr=sys.stderr, > 00075: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/subprocess.py", > line 364, in check_call > 00076: raise CalledProcessError(retcode, cmd) > 00077: subprocess.CalledProcessError: Command 'xmipp_volume_align > --i1 Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol > --i2 Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply >
Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt' > returned non-zero exit status 139. > 00078: Protocol failed: Command 'xmipp_volume_align --i1 > Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol --i2 > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt' > returned non-zero exit status 139. > 00079: FAILED: alignVolumeStep, step 2, time 2020-11-23 00:26:34.625143 > 00080: ------------------- PROTOCOL FAILED (DONE 2/4) > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users |
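Since the message comes from a compiled module's endianness check, a quick sanity check of what the Python runtime and NumPy report can help confirm whether the installation itself is broken (a minimal sketch):

import sys
import numpy as np

# An x86_64 machine should report 'little'; a mismatch here points at a
# broken build/installation rather than at the protocol itself.
print("runtime byte order:", sys.byteorder)
print("little-endian float32 native for numpy:", np.dtype("<f4").isnative)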
From: Grigory S. <sha...@gm...> - 2020-11-23 09:44:17
From the last message ("FINISHED: runRelionStep, step 2") there is no problem with any queue system; the problem is whether relion actually produced the output data, since the detected last iteration passed to self._getFileName('model', iter=iteration) is None. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr... On Mon, Nov 23, 2020 at 9:33 AM Lugmayr, Wolfgang <w.l...@uk...> wrote: > > hi, > > i compared the template posted below with mine and i have the following differences in my template: > MANDATORY = False > #SBATCH --ntasks %_(JOB_NODES)s > > your template may need also to change: > #SBATCH --nodes %_(NODES)s > > on our cluster the amount of mpi ntasks defines how many nodes you get. so i do not use the --nodes parameter. > > cheers, > wolfgang > > > ________________________________ > From: "Pablo Conesa" <pc...@cn...> > To: "Mailing list for Scipion users" <sci...@li...> > Sent: Monday, 23 November, 2020 09:49:29 > Subject: Re: [scipion-users] Questions about host.conf for Scipion on slurp cluster > > I think job went in! > > I see here more an issue when loading the starfile. Maybe iteration is None? > > https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 > > Grigory, Jose Miguel? > > > On 23/11/20 9:32, Yangyang Yi wrote: > > Sorry for the late reply. Here’s the log from the job begin to the job end. > run.stdout: > > 00001: RUNNING PROTOCOL ----------------- > 00002: HostName: headnode.cm.cluster > 00003: PID: 209177 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark > 00006: workingDir: Runs/000347_ProtRelionClassify3D > 00007: runMode: Continue > 00008: MPI: 5 > 00009: threads: 2 > 00010: len(steps) 3 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: convertInputStep, step 1 > 00014: 2020-11-12 13:46:13.708639 > 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star' > 00016: convertBinaryFiles: creating soft links.
> 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra > 00018: FINISHED: convertInputStep, step 1 > 00019: 2020-11-12 13:46:48.416238 > 00020: STARTED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48.438416 > 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' > 00023: launched job with id 2552 > 00024: FINISHED: runRelionStep, step 2 > 00025: 2020-11-12 13:46:48.524619 > 00026: STARTED: createOutputStep, step 3 > 00027: 2020-11-12 13:46:48.973668 > 00028: Traceback (most recent call last): > 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run > 00030: self.step._run() # not self.step.run() , to avoid race conditions > 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run > 00032: resultFiles = self._runFunc() > 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc > 00034: return self._func(*self._args) > 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep > 00036: self._fillClassesFromIter(classes3D, self._lastIter()) > 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter > 00038: self._loadClassesInfo(iteration) > 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo > 00040: self._getFileName('model', iter=iteration)) > 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName > 00042: return self.__filenamesDict[key] % kwargs > 00043: TypeError: %d format: a number is required, not NoneType > 00044: Protocol failed: %d format: a number is required, not NoneType > 00045: FAILED: createOutputStep, step 3 > 00046: 2020-11-12 13:46:48.991279 > 00047: *** Last status is failed > 00048: ------------------- PROTOCOL FAILED (DONE 3/3) > > run.log: > 2020-11-12 13:46:48.438416 > 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 > 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 > 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668 > 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType > 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 > 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 > 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3) > > > > On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: > > Hi Yangyang, > > I've tried your config with Scipion2 and it seems to work fine. The only problem I found was using curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run or before? > > Best regards, > Grigory > > -------------------------------------------------------------------------------- > Grigory Sharov, Ph.D. > > MRC Laboratory of Molecular Biology, > Francis Crick Avenue, > Cambridge Biomedical Campus, > Cambridge CB2 0QH, UK. > tel. +44 (0) 1223 267228 > e-mail: gs...@mr...
> > > On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: >> >> Dear Scipion users & devs, >> >> I am kindly asking for your advice. >> >> Now we are trying to set Scipion-2.0 on a slurm cluster. It could run on single machine but failed to submit the jobs to queue. Slurm cluster works well and running scipion on single node works. >> >> Here’s our settings for host.conf: >> >> host.conf: >> [localhost] >> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s >> NAME = SLURM >> MANDATORY = 0 >> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s >> CANCEL_COMMAND = scancel %_(JOB_ID)s >> CHECK_COMMAND = squeue -j %_(JOB_ID)s >> SUBMIT_TEMPLATE = #!/bin/bash >> ####SBATCH --export=ALL >> #SBATCH -p %_(JOB_QUEUE)s >> #SBATCH -J %_(JOB_NAME)s >> #SBATCH -o %_(JOB_SCRIPT)s.out >> #SBATCH -e %_(JOB_SCRIPT)s.err >> #SBATCH --time=%_(JOB_TIME)s:00:00 >> #SBATCH --nodes=1 >> #SBATCH --ntasks=%_(JOB_NODES)d >> #SBATCH --cpus-per-task=%_(JOB_THREADS)d >> WORKDIR=$SLURM_JOB_SUBMIT_DIR >> export XMIPP_IN_QUEUE=1 >> cd $WORKDIR >> # Make a copy of node file >> echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s >> ### Display the job context >> echo Running on host `hostname` >> echo Time is `date` >> echo Working directory is `pwd` >> echo $SLURM_JOB_NODELIST >> echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES >> ################################# >> %_(JOB_COMMAND)s >> find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} + >> QUEUES = { >> “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], >> ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] >> } >> JOB_DONE_REGEX = >> >> And the Scipion reports: >> typeerror: %d format: a number is required, not nonetype >> >> And suggestions about how to solve the problem? Thanks! >> _______________________________________________ >> scipion-users mailing list >> sci...@li... >> https://lists.sourceforge.net/lists/listinfo/scipion-users > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > -- > Pablo Conesa - Madrid Scipion team > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users |
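The crash itself comes from formatting a filename with iteration=None. A standalone sketch of the underlying check (a hypothetical helper, not the plugin's actual code; the model star-file naming pattern is assumed from relion's conventions):

import glob
import re

def last_iter(run_dir="Runs/000347_ProtRelionClassify3D"):
    """Return the last completed relion iteration, or None if nothing ran."""
    files = glob.glob(run_dir + "/extra/relion_it*_model.star")  # assumed pattern
    iters = [int(re.search(r"it(\d+)", f).group(1)) for f in files]
    return max(iters) if iters else None

iteration = last_iter()
if iteration is None:
    raise RuntimeError("relion wrote no model files - check the queue job's .out/.err logs next to the submitted script")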
From: Lugmayr, W. <w.l...@uk...> - 2020-11-23 09:33:39
hi, i compared the template posted below with mine and i have the following differences in my template: MANDATORY = False #SBATCH --ntasks %_(JOB_NODES)s your template may need also to change: #SBATCH --nodes %_(NODES)s on our cluster the amount of mpi ntasks defines how many nodes you get. so i do not use the --nodes parameter. cheers, wolfgang
From: "Pablo Conesa" <pc...@cn...> To: "Mailing list for Scipion users" <sci...@li...> Sent: Monday, 23 November, 2020 09:49:29 Subject: Re: [scipion-users] Questions about host.conf for Scipion on slurp cluster
I think job went in! I see here more an issue when loading the starfile. Maybe iteration is None? https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 Grigory, Jose Miguel?
On 23/11/20 9:32, Yangyang Yi wrote: Sorry for the late reply. Here’s the log from the job begin to the job end. run.stdout: 00001: RUNNING PROTOCOL ----------------- 00002: HostName: headnode.cm.cluster 00003: PID: 209177 00004: Scipion: v2.0 (2019-04-23) Diocletian 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark 00006: workingDir: Runs/000347_ProtRelionClassify3D 00007: runMode: Continue 00008: MPI: 5 00009: threads: 2 00010: len(steps) 3 len(prevSteps) 0 00011: Starting at step: 1 00012: Running steps 00013: STARTED: convertInputStep, step 1 00014: 2020-11-12 13:46:13.708639 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star' 00016: convertBinaryFiles: creating soft links. 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra 00018: FINISHED: convertInputStep, step 1 00019: 2020-11-12 13:46:48.416238 00020: STARTED: runRelionStep, step 2 00021: 2020-11-12 13:46:48.438416 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' 00023: launched job with id 2552 00024: FINISHED: runRelionStep, step 2 00025: 2020-11-12 13:46:48.524619 00026: STARTED: createOutputStep, step 3 00027: 2020-11-12 13:46:48.973668 00028: Traceback (most recent call last): 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run 00030: self.step._run() # not self.step.run() , to avoid race conditions 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run 00032: resultFiles = self._runFunc() 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc 00034: return self._func(*self._args) 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep 00036: self._fillClassesFromIter(classes3D, self._lastIter()) 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter 00038: self._loadClassesInfo(iteration) 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo 00040: self._getFileName('model', iter=iteration)) 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName 00042: return self.__filenamesDict[key] % kwargs 00043: TypeError: %d format: a number is required, not NoneType 00044: Protocol failed: %d format: a number is required, not NoneType 00045: FAILED: createOutputStep, step 3 00046: 2020-11-12 13:46:48.991279 00047: *** Last status is failed 00048: ------------------- PROTOCOL FAILED (DONE 3/3) run.log: 2020-11-12 13:46:48.438416 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)
On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: Hi Yangyang, I've tried your config with Scipion2 and it seems to work fine. The only problem I found was using curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run or before? Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr...
On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: Dear Scipion users & devs, I am kindly asking for your advice. Now we are trying to set Scipion-2.0 on a slurm cluster. It could run on single machine but failed to submit the jobs to queue. Slurm cluster works well and running scipion on single node works. Here’s our settings for host.conf: host.conf: [localhost] PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s NAME = SLURM MANDATORY = 0 SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s CANCEL_COMMAND = scancel %_(JOB_ID)s CHECK_COMMAND = squeue -j %_(JOB_ID)s SUBMIT_TEMPLATE = #!/bin/bash ####SBATCH --export=ALL #SBATCH -p %_(JOB_QUEUE)s #SBATCH -J %_(JOB_NAME)s #SBATCH -o %_(JOB_SCRIPT)s.out #SBATCH -e %_(JOB_SCRIPT)s.err #SBATCH --time=%_(JOB_TIME)s:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=%_(JOB_NODES)d #SBATCH --cpus-per-task=%_(JOB_THREADS)d WORKDIR=$SLURM_JOB_SUBMIT_DIR export XMIPP_IN_QUEUE=1 cd $WORKDIR # Make a copy of node file echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s ### Display the job context echo Running on host `hostname` echo Time is `date` echo Working directory is `pwd` echo $SLURM_JOB_NODELIST echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES ################################# %_(JOB_COMMAND)s find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} + QUEUES = { “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] } JOB_DONE_REGEX = And the Scipion reports: typeerror: %d format: a number is required, not nonetype And suggestions about how to solve the problem? Thanks! _______________________________________________ scipion-users mailing list sci...@li... https://lists.sourceforge.net/lists/listinfo/scipion-users -- Pablo Conesa - Madrid Scipion team (http://scipion.i2pc.es) |
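Since smart quotes are easy to reintroduce when copy-pasting configs, the QUEUES value can be checked before Scipion tries to parse it. A minimal sketch (the snippet file name is hypothetical - paste the dictionary from host.conf into it):

import ast

text = open("queues_snippet.txt", encoding="utf-8").read()  # hypothetical file
for ch in "\u201c\u201d\u2018\u2019":  # curly double/single quotes
    if ch in text:
        print("smart quote found:", repr(ch))
ast.literal_eval(text)  # raises SyntaxError/ValueError if the literal is broken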
From: Pablo C. <pc...@cn...> - 2020-11-23 08:49:44
I think job went in! I see here more an issue when loading the starfile. Maybe iteration is None? https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 Grigory, Jose Miguel? On 23/11/20 9:32, Yangyang Yi wrote: > Sorry for the late reply. Here’s the log from the job begin to the job > end. > run.stdout: > > 00001: RUNNING PROTOCOL ----------------- > 00002: HostName: headnode.cm.cluster > 00003: PID: 209177 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: > /ddn/users/spadm/ScipionUserData/projects/relion_benchmark > 00006: workingDir: Runs/000347_ProtRelionClassify3D > 00007: runMode: Continue > 00008: MPI: 5 > 00009: threads: 2 > 00010: len(steps) 3 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: convertInputStep, step 1 > 00014: 2020-11-12 13:46:13.708639 > 00015: Converting set from > 'Runs/000002_ProtImportParticles/particles.sqlite' into > 'Runs/000347_ProtRelionClassify3D/input_particles.star' > 00016: convertBinaryFiles: creating soft links. > 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> > Runs/000002_ProtImportParticles/extra > 00018: FINISHED: convertInputStep, step 1 > 00019: 2020-11-12 13:46:48.416238 > 00020: STARTED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48.438416 > 00022: ** Submiting to queue: 'sbatch > /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' > 00023: launched job with id 2552 > 00024: FINISHED: runRelionStep, step 2 > 00025: 2020-11-12 13:46:48.524619 > 00026: STARTED: createOutputStep, step 3 > 00027: 2020-11-12 13:46:48.973668 > 00028: Traceback (most recent call last): > 00029: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line > 151, in run > 00030: self.step._run() # not self.step.run() , to avoid race conditions > 00031: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 237, in _run > 00032: resultFiles = self._runFunc() > 00033: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 233, in _runFunc > 00034: return self._func(*self._args) > 00035: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 77, in createOutputStep > 00036: self._fillClassesFromIter(classes3D, self._lastIter()) > 00037: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 176, in _fillClassesFromIter > 00038: self._loadClassesInfo(iteration) > 00039: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 166, in _loadClassesInfo > 00040: self._getFileName('model', iter=iteration)) > 00041: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 841, in _getFileName > 00042: return self.__filenamesDict[key] % kwargs > 00043: TypeError: %d format: a number is required, not NoneType > 00044: Protocol failed: %d format: a number is required, not NoneType > 00045: FAILED: createOutputStep, step 3 > 00046: 2020-11-12 13:46:48.991279 > 00047: *** Last status is failed > 00048: ------------------- PROTOCOL FAILED (DONE 3/3) > > run.log: > 2020-11-12 13:46:48.438416 > 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 > 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 > 00023: 2020-11-12 13:46:48,973 
INFO: 2020-11-12 13:46:48.973668 > 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a > number is required, not NoneType > 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 > 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 > 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL > FAILED (DONE 3/3) > > > >> On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: >> >> Hi Yangyang, >> >> I've tried your config with Scipion2 and it seems to work fine. The >> only problem I found was using curly quotes (“) instead of straight >> ones (") in the queues dictionary. Did you get the error message >> after the job was submitted and started to run or before? >> >> Best regards, >> Grigory >> >> -------------------------------------------------------------------------------- >> Grigory Sharov, Ph.D. >> >> MRC Laboratory of Molecular Biology, >> Francis Crick Avenue, >> Cambridge Biomedical Campus, >> Cambridge CB2 0QH, UK. >> tel. +44 (0) 1223 267228 >> e-mail: gs...@mr... >> >> >> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: >> >> Dear Scipion users & devs, >> >> I am kindly asking for your advice. >> >> Now we are trying to set Scipion-2.0 on a slurm cluster. It could >> run on single machine but failed to submit the jobs to queue. >> Slurm cluster works well and running scipion on single node works. >> >> Here’s our settings for host.conf: >> >> host.conf: >> [localhost] >> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s >> NAME = SLURM >> MANDATORY = 0 >> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s >> CANCEL_COMMAND = scancel %_(JOB_ID)s >> CHECK_COMMAND = squeue -j %_(JOB_ID)s >> SUBMIT_TEMPLATE = #!/bin/bash >> ####SBATCH --export=ALL >> #SBATCH -p %_(JOB_QUEUE)s >> #SBATCH -J %_(JOB_NAME)s >> #SBATCH -o %_(JOB_SCRIPT)s.out >> #SBATCH -e %_(JOB_SCRIPT)s.err >> #SBATCH --time=%_(JOB_TIME)s:00:00 >> #SBATCH --nodes=1 >> #SBATCH --ntasks=%_(JOB_NODES)d >> #SBATCH --cpus-per-task=%_(JOB_THREADS)d >> WORKDIR=$SLURM_JOB_SUBMIT_DIR >> export XMIPP_IN_QUEUE=1 >> cd $WORKDIR >> # Make a copy of node file >> echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s >> ### Display the job context >> echo Running on host `hostname` >> echo Time is `date` >> echo Working directory is `pwd` >> echo $SLURM_JOB_NODELIST >> echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES >> ################################# >> %_(JOB_COMMAND)s >> find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec >> chmod 664 {} + >> QUEUES = { >> “a": [["JOB_TIME", "48", "Time (hours)", "Select the time >> expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual >> jobs to queue"]], >> “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], >> ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] >> } >> JOB_DONE_REGEX = >> >> And the Scipion reports:
>> typeerror: %d format: a number is required, not nonetype >> >> And suggestions about how to solve the problem? Thanks! >> _______________________________________________ >> scipion-users mailing list >> sci...@li... >> https://lists.sourceforge.net/lists/listinfo/scipion-users > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users -- Pablo Conesa - Madrid Scipion team (http://scipion.i2pc.es)