From: Grigory S. <sha...@gm...> - 2020-11-26 11:13:23
It could also be that job scripts starting with a number are not allowed by Slurm: Runs/000429_ProtRelionClassify2D/logs/429. In this case SUBMIT_PREFIX is required: https://scipion-em.github.io/docs/docs/scipion-modes/host-configuration.html

Best regards,
Grigory
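For reference, a minimal host.conf fragment using SUBMIT_PREFIX might look like the sketch below; the prefix value "scipion_" is illustrative, the surrounding keys are taken from the configuration quoted later in this thread, and the host-configuration page linked above is the authoritative reference for the exact key list:

    [localhost]
    SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
    # Prepend a non-numeric prefix to generated job script names, so Slurm
    # never sees a script whose name starts with a digit (e.g. logs/429):
    SUBMIT_PREFIX = scipion_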
From: Pablo C. <pc...@cn...> - 2020-11-26 10:58:07
Just an idea!! Maybe the "working directory" is not being passed to the node, and the job is thus working on home and failing?

-- Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
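Pablo's hypothesis can be tested from the submit template itself. Note that the template quoted later in this thread sets WORKDIR=$SLURM_JOB_SUBMIT_DIR while a later line uses $SLURM_SUBMIT_DIR; if the installed Slurm exports only SLURM_SUBMIT_DIR (the commonly documented name), the cd receives an empty string and silently lands in $HOME. A hedged check, assuming the variable names shown in that template:

    # In the submit template: fail loudly instead of silently cd-ing to $HOME
    WORKDIR=${SLURM_SUBMIT_DIR:?not set - job did not inherit the submit directory}
    cd "$WORKDIR" || exit 1
    echo "Working directory is $(pwd)"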
From: Pablo C. <pc...@cn...> - 2020-11-26 10:54:35
Ok, let me understand. Dmitry, you have a folder "/home/user/Data/ScipionUserData/projects/scratch/" that does not contain project.sqlite (thus the warning). Now, why is that folder there? What does it contain? Is that an actual project?

-- Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
From: Grigory S. <sha...@gm...> - 2020-11-26 10:47:11
Honestly not sure if there's a solution besides editing sqlite files. Maybe Pablo or Jose Miguel can reply to the thread.

Best regards,
Grigory
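If the project really was deleted on purpose (as Dmitry confirms below), a simpler route than editing sqlite files may be to remove the leftover project directory itself, so the project list stops trying to load it. A sketch, using the path from Dmitry's error message; inspect the folder first in case it still holds something valuable:

    ls -la /home/user/Data/ScipionUserData/projects/scratch   # confirm nothing important remains
    rm -r /home/user/Data/ScipionUserData/projects/scratch    # the "ERROR loading project" warning should disappear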
From: Grigory S. <sha...@gm...> - 2020-11-26 10:02:35
Hi,

the place where you start scipion is not relevant; does the SCIPION_USER_DATA variable in the config file point to a shared writable location?

Best regards,
Grigory
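For context, SCIPION_USER_DATA is set in the Scipion config file; a sketch of the relevant line, with an illustrative shared path patterned on the layout Yangyang describes below:

    # scipion.conf: projects (and the relative Runs/ trees inside them) are
    # created under this path; on a cluster it must be writable from both the
    # head node and the compute nodes
    SCIPION_USER_DATA = /data/users/xxxlab/xxx/ScipionUserData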
From: Yangyang Yi <yy...@si...> - 2020-11-26 08:57:36
I have checked the job more carefully and also tried other jobs.

For the relion Class3D job (I used the relion_benchmark dataset) it reported "output directory does not exist" in Runs/000429_ProtRelionClassify2D/logs/429:

ERROR:
ERROR: output directory does not exist!
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaab935e555]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi() [0x436a8f]
==================
ERROR:
ERROR: output directory does not exist!
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x447f91]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN11MlOptimiser17initialiseGeneralEi+0x248f) [0x5ada9f]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(_ZN14MlOptimiserMpi10initialiseEv+0x998) [0x4689f8]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi(main+0xb2f) [0x4336ff]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaab935e555]
/cm/shared/apps/scipion/2.0/software/em/relion-3.0/bin/relion_refine_mpi() [0x436a8f]

I have run some other jobs and they reported similar errors, such as "Cannot open tif file" (in MotionCor2) or "forrtl: severe (24): end-of-file during read, unit 5, file /proc/226944/fd/0" (in unblur). For cisTEM-unblur, it reports:
IOError: [Errno 2] No such file or directory: 'Runs/000250_CistemProtUnblur/extra/May08_03.05.02_shifts.txt'
but I found these files in my home directory where I opened scipion.

I suspect the raw data's location matters. My testing data is located in our cluster shared storage, owned by root; all users can read but not modify it (like /data/tutorial_data/). Those data have been used for software teaching or testing before, and I'm sure that all users can process them outside scipion. But I started scipion in my home directory, which is located in cluster shared storage (like /data/users/xxxlab/xxx). Is there anything I should take care of?

I will also try scipion-3.0 to see if it works.

> On 23 Nov 2020, at 16:32, Yangyang Yi <yy...@si...> wrote:
>
> Sorry for the late reply. Here's the log from the job begin to the job end.
>
> run.stdout:
>
> 00001: RUNNING PROTOCOL -----------------
> 00002: HostName: headnode.cm.cluster
> 00003: PID: 209177
> 00004: Scipion: v2.0 (2019-04-23) Diocletian
> 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark
> 00006: workingDir: Runs/000347_ProtRelionClassify3D
> 00007: runMode: Continue
> 00008: MPI: 5
> 00009: threads: 2
> 00010: len(steps) 3 len(prevSteps) 0
> 00011: Starting at step: 1
> 00012: Running steps
> 00013: STARTED: convertInputStep, step 1
> 00014: 2020-11-12 13:46:13.708639
> 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star'
> 00016: convertBinaryFiles: creating soft links.
> 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra
> 00018: FINISHED: convertInputStep, step 1
> 00019: 2020-11-12 13:46:48.416238
> 00020: STARTED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48.438416
> 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job'
> 00023: launched job with id 2552
> 00024: FINISHED: runRelionStep, step 2
> 00025: 2020-11-12 13:46:48.524619
> 00026: STARTED: createOutputStep, step 3
> 00027: 2020-11-12 13:46:48.973668
> 00028: Traceback (most recent call last):
> 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run
> 00030: self.step._run() # not self.step.run() , to avoid race conditions
> 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run
> 00032: resultFiles = self._runFunc()
> 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc
> 00034: return self._func(*self._args)
> 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep
> 00036: self._fillClassesFromIter(classes3D, self._lastIter())
> 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter
> 00038: self._loadClassesInfo(iteration)
> 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo
> 00040: self._getFileName('model', iter=iteration))
> 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName
> 00042: return self.__filenamesDict[key] % kwargs
> 00043: TypeError: %d format: a number is required, not NoneType
> 00044: Protocol failed: %d format: a number is required, not NoneType
> 00045: FAILED: createOutputStep, step 3
> 00046: 2020-11-12 13:46:48.991279
> 00047: *** Last status is failed
> 00048: ------------------- PROTOCOL FAILED (DONE 3/3)
>
> run.log:
> 2020-11-12 13:46:48.438416
> 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2
> 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619
> 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3
> 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668
> 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType
> 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3
> 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279
> 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)
>
>> On 12 Nov 2020, at 02:19, Grigory Sharov <sha...@gm...> wrote:
>>
>> Hi Yangyang,
>>
>> I've tried your config with Scipion2 and it seems to work fine. The only problem I found was the use of curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run, or before?
>>
>> Best regards,
>> Grigory
>>
>> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote:
>>
>> Dear Scipion users & devs,
>>
>> I am kindly asking for your advice.
>>
>> We are trying to set up Scipion-2.0 on a slurm cluster. It can run on a single machine but fails to submit jobs to the queue. The Slurm cluster itself works well, and running scipion on a single node works.
>>
>> Here are our settings for host.conf:
>>
>> [localhost]
>> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s
>> NAME = SLURM
>> MANDATORY = 0
>> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
>> CANCEL_COMMAND = scancel %_(JOB_ID)s
>> CHECK_COMMAND = squeue -j %_(JOB_ID)s
>> SUBMIT_TEMPLATE = #!/bin/bash
>>     ####SBATCH --export=ALL
>>     #SBATCH -p %_(JOB_QUEUE)s
>>     #SBATCH -J %_(JOB_NAME)s
>>     #SBATCH -o %_(JOB_SCRIPT)s.out
>>     #SBATCH -e %_(JOB_SCRIPT)s.err
>>     #SBATCH --time=%_(JOB_TIME)s:00:00
>>     #SBATCH --nodes=1
>>     #SBATCH --ntasks=%_(JOB_NODES)d
>>     #SBATCH --cpus-per-task=%_(JOB_THREADS)d
>>     WORKDIR=$SLURM_JOB_SUBMIT_DIR
>>     export XMIPP_IN_QUEUE=1
>>     cd $WORKDIR
>>     # Make a copy of node file
>>     echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s
>>     ### Display the job context
>>     echo Running on host `hostname`
>>     echo Time is `date`
>>     echo Working directory is `pwd`
>>     echo $SLURM_JOB_NODELIST
>>     echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
>>     #################################
>>     %_(JOB_COMMAND)s
>>     find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
>> QUEUES = {
>>     “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]],
>>     “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
>>     ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>>     ["NODES","1", "Nodes", "How many nodes required for all the nodes"],
>>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]
>> }
>> JOB_DONE_REGEX =
>>
>> And Scipion reports:
>> typeerror: %d format: a number is required, not nonetype
>>
>> Any suggestions about how to solve the problem? Thanks!
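The TypeError at the end of the trace above is a plain Python %-formatting failure: the run.stdout shows the Relion job being handed to sbatch and createOutputStep starting within the same second, so no iteration output exists yet and self._lastIter() returns None. A reduced sketch (the template string and dictionary key are illustrative, patterned on the _getFileName call in the traceback):

    # _getFileName builds paths by %-substitution into a filenames dictionary;
    # substituting None where %d expects an integer reproduces the exact error
    filenames = {'model': 'extra/relion_it%(iter)03d_model.star'}  # illustrative template
    print(filenames['model'] % {'iter': 25})    # extra/relion_it025_model.star
    print(filenames['model'] % {'iter': None})  # TypeError: %d format: a number is required, not NoneType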
From: Dmitry A. S. <sem...@gm...> - 2020-11-25 15:55:24
Dear Grigory,

Yes, it probably was removed unintentionally.

Sincerely,
Dmitry
From: Grigory S. <sha...@gm...> - 2020-11-25 15:29:37
Hi,

as the Gctf error indicates, you need to check "CUDA Toolkit and Compatible Driver Versions" from here <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html>, and make sure that your CUDA toolkit version is compatible with your CUDA driver version:

- check the CUDA driver version:
  ~$ nvidia-smi
  or
  ~$ cat /proc/driver/nvidia/version

- check the local CUDA toolkit version:
  ~$ nvcc --version
  or
  ~$ cat /usr/local/cuda/version.txt

For ctffind4, could you try to run it locally (not on the cluster), with no threads and no MPI, on a single micrograph, and paste the run.stdout file here. Does the scipion test for ctffind4 work?

Best regards,
Grigory

On Wed, Nov 25, 2020 at 3:20 PM Ana Andreea ARTENI <ana...@i2...> wrote:
> Hi again,
> I do have CUDA 10.1.
> Jose Miguel also wrote me: I have Gctf_v1.18 which is a VPP version. We'll install gctf-1.06.
> What about the second thing: cistem is still in "launched" mode, without actually running - how can I sort that one out?
> My best regards,
> ana.
From: Grigory S. <sha...@gm...> - 2020-11-25 15:19:15
Hi,

did you remove the /home/user/Data/ScipionUserData/projects/scratch/ folder?

Best regards,
Grigory
From: Dmitry S. <Sem...@gm...> - 2020-11-25 09:50:03
Dear colleagues,

Small question: I have an issue with an error message in the terminal. How can I remove it?

Thank you!

Sincerely,
Dmitry

ERROR loading project: scratch
Project database not found at '/home/user/Data/ScipionUserData/projects/scratch/project.sqlite'
From: Grigory S. <sha...@gm...> - 2020-11-25 09:30:38
Hi Colin,

it looks like an error with reading the converted reference volume. Do you have the mrc volume in the Runs/006157_ProtRelionRefine3D/tmp folder?

Best regards,
Grigory

On Wed, Nov 25, 2020 at 5:30 AM <pc...@cn...> wrote:
> Hi Collin,
>
> Could you send us Runs/004006_ProtUserSubSet/particles.sqlite
>
> Meanwhile, try to update the relion plugin too.
>
> On 25 Nov 2020 at 1:12, Colin Deniston <ckd...@gm...> wrote:
>
>> I'm getting this error only when I try to run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix it; I currently have 3.0.5. Thanks!
>>
>> STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676
>> 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star'
>> 00020: convertBinaryFiles: creating soft links.
>> 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra
>> 00022: Traceback (most recent call last):
>> 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run
>> 00024: self._run()
>> 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run
>> 00026: resultFiles = self._runFunc()
>> 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc
>> 00028: return self._func(*self._args)
>> 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep
>> 00030: self._convertRef()
>> 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef
>> 00032: self._convertVol(ih, refVols[0])
>> 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol
>> 00034: img = ih.read(inputVol)
>> 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read
>> 00036: return self._imgClass(location)
>> 00037: TypeError: __init__() takes 1 positional argument but 2 were given
>> 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given
>> 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580
>> 00040: *** Last status is failed
>> 00041: ------------------- PROTOCOL FAILED (DONE 1/3)
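The TypeError on line 00037 is the generic Python signature mismatch: image_handler.py calls self._imgClass(location), but the class bound to _imgClass on that workstation evidently accepts no constructor argument. A reduced sketch of the failing pattern (the class and call are illustrative, not Scipion's actual binding):

    # If the class resolved for reading images cannot take a location argument,
    # passing one reproduces the exact message from Colin's log:
    class BrokenImage:
        def __init__(self):  # no 'location' parameter
            pass

    BrokenImage('volume.mrc')  # TypeError: __init__() takes 1 positional argument but 2 were given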
From: Grigory S. <sha...@gm...> - 2020-11-25 09:28:54
|
Hi Ana, the gctf error is : Error CUDA driver version is insufficient for CUDA runtime version at line 3059 in file src/ctf.cu The binary you are using (default) requires CUDA 10.1 for ctffind4 it looks like you did not actually run the job, as there's no output logs. I'll fix the methods message. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 <+44%201223%20267228> e-mail: gs...@mr... On Wed, Nov 25, 2020 at 9:04 AM Ana Andreea ARTENI < ana...@i2...> wrote: > > Concerning the gctf I found the following in the run.stdout > STARTED: estimateCtfStep, step 3, time 2020-11-20 14:42:22.007666 > 00019: Estimating CTF of micrograph: 2 > 00020: Error CUDA driver version is insufficient for CUDA runtime > version at line 3059 in file src/ctf.cu > 00021: > *************************************************************************************************** > 00022: Version: > /data/sics/scipion/software/em/gctf-1.18/bin/Gctf_v1.18_sm30-75_cu10.1 > v1.18, updated on 2017-05-10 > 00023: Author: Kai Zhang@MRC Laboratory of Molecular Biology > 00024: Contact: kz...@mr... > 00025: Description: This is a simplified version of GCTF for CTF > determination. > 00026: > *************************************************************************************************** > 00027: > 00028: Opening > Runs/000850_ProtGctf/tmp/mic_0001/Falcon_2012_06_12-14_33_35_0_movie_aligned_mic.mrc > for initial test and preparation ....... > 00029: ERROR: Gctf has failed on Runs/000850_ProtGctf/tmp/mic_0001/*.mrc > 00030: > /data/sics/scipion/software/em/gctf-1.18/bin/Gctf_v1.18_sm30-75_cu10.1 > --apix 3.540000 --kV 300.000000 --cs 2.000000 --ac 0.100000 --dstep > 13.999992 --defL 5000.000000 --defH 90000.000000 --defS 500.000000 --astm > 1000.000000 --resL 50.000000 --resH 8.230000 --do_EPA 0 --boxsize 512 > --plot_res_ring 1 --gid 2 --bfac 150 --B_resH 7.080000 --overlap 0.500000 > --convsize 85 --do_Hres_ref 0 --smooth_resL 1000 --EPA_oversmp 4 --ctfstar > NONE --do_validation 0 Runs/000850_ProtGctf/tmp/mic_0002/*.mrc > 00031: Error CUDA driver version is insufficient for CUDA runtime > version at line 3059 in file src/ctf.cu > > Hope this helps better... > ana. > ------------------------------ > *From: *"Ana Andreea ARTENI" <ana...@i2...> > *To: *"sharov grigory" <sha...@gm...> > *Cc: *"Anne-Pascale JAUDIER" <ann...@i2...>, > "scipion-users" <sci...@li...> > *Sent: *Wednesday, 25 November, 2020 09:51:06 > *Subject: *error with protocols in Scipion3 CTF correction step > > > Dear Grigory, > > After installing Relion3, and start doing the mix and patch tutorial we > have the following errors: > > At the CTF step: > > via gctf: > > The test: scipion3 tests gctf.tests.test_protocols_gctf gives no errors, > but the output looks as in the image attached (that shows no psd and no > correct values for the parameters). > Any idea what is wrong and how can we fix this? 
> -------------------------------------------- > > via ctffind4 > METHODS: > > cistem - ctffind4 > ERROR generating methods info: 'CistemProtCTFFind' object has no attribute > 'outputCTF' > > File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist > > Need also help here... > My kind regards, > ana. > > |
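For this class of error the first thing to compare is the CUDA version the installed driver supports against the runtime the binary was built for (cu10.1 here). A minimal sketch, assuming nvidia-smi is on the PATH:

import subprocess

# nvidia-smi's banner line reports the driver version and the highest
# CUDA runtime version that driver supports.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next(line for line in out.splitlines() if "CUDA Version" in line))

If the reported CUDA version is below 10.1, either update the driver or point Scipion at a Gctf binary built for an older CUDA runtime.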
From: Jose M. de la R. T. <del...@gm...> - 2020-11-25 09:03:42
Dear Ana, Can you take a look at the error from the Gctf logs? It is not creating any output, but it would be good to see the error in the logs. Have you set up the gctf binary version according to your CUDA version (in the scipion.conf file)? Best, Jose Miguel On Wed, Nov 25, 2020 at 9:53 AM Ana Andreea ARTENI < ana...@i2...> wrote: > > Dear Grigory, > > After installing Relion3, and start doing the mix and patch tutorial we > have the following errors: > > At the CTF step: > > via gctf: > > The test: scipion3 tests gctf.tests.test_protocols_gctf gives no errors, > but the output looks as in the image attached (that shows no psd and no > correct values for the parameters). > Any idea what is wrong and how can we fix this? > -------------------------------------------- > > via ctffind4 > METHODS: > > cistem - ctffind4 > ERROR generating methods info: 'CistemProtCTFFind' object has no attribute > 'outputCTF' > > File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile > 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist > > Need also help here... > My kind regards, > ana. > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
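For reference, pointing Scipion at a particular Gctf binary usually comes down to a couple of entries in scipion.conf. The variable names below are an assumption (they differ between plugin versions), so check the gctf plugin documentation; the binary name is the one that appears in the log above:

# hypothetical scipion.conf excerpt - variable names may differ per plugin version
GCTF_HOME = software/em/gctf-1.18
GCTF = Gctf_v1.18_sm30-75_cu10.1

Pick the shipped binary whose cuXX suffix matches the CUDA runtime your driver actually supports.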
From: Ana A. A. <ana...@i2...> - 2020-11-25 08:51:23
Dear Grigory, After installing Relion3 and starting the mix and patch tutorial, we have the following errors: At the CTF step: via gctf: The test scipion3 tests gctf.tests.test_protocols_gctf gives no errors, but the output looks as in the attached image (which shows no PSD and no correct values for the parameters). Any idea what is wrong and how we can fix this? -------------------------------------------- via ctffind4 METHODS: cistem - ctffind4 ERROR generating methods info: 'CistemProtCTFFind' object has no attribute 'outputCTF' File 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't existFile 'Runs/000979_CistemProtCTFFind/logs/run.stdout' doesn't exist Need help here as well... My kind regards, ana. |
From: <pc...@cn...> - 2020-11-25 05:30:14
Hi Collin, Could you send us Runs/004006_ProtUserSubSet/particles.sqlite Meanwhile, try to update the relion plugin too. On 25 Nov 2020 at 1:12, Colin Deniston <ckd...@gm...> wrote: > I'm getting this error only when I try and run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix, I currently have 3.0.5. Thanks! > > STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676 > 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star' > 00020: convertBinaryFiles: creating soft links. > 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra > 00022: Traceback (most recent call last): > 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run > 00024: self._run() > 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run > 00026: resultFiles = self._runFunc() > 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc > 00028: return self._func(*self._args) > 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep > 00030: self._convertRef() > 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef > 00032: self._convertVol(ih, refVols[0]) > 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol > 00034: img = ih.read(inputVol) > 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read > 00036: return self._imgClass(location) > 00037: TypeError: __init__() takes 1 positional argument but 2 were given > 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given > 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580 > 00040: *** Last status is failed > 00041: ------------------- PROTOCOL FAILED (DONE 1/3) |
From: Colin D. <ckd...@gm...> - 2020-11-25 00:12:43
I'm getting this error only when I try and run either a relion 3D refine or 3D classification job in scipion on a specific workstation. Any idea where the issue may be? I'm updating to the most recent version of scipion now to see if it will fix, I currently have 3.0.5. Thanks! STARTED: convertInputStep, step 1, time 2020-11-24 16:07:15.402676 00019: Converting set from 'Runs/004006_ProtUserSubSet/particles.sqlite' into 'Runs/006157_ProtRelionRefine3D/input_particles.star' 00020: convertBinaryFiles: creating soft links. 00021: Root: Runs/006157_ProtRelionRefine3D/extra/input -> Runs/000961_ProtRelionExtractParticles/extra 00022: Traceback (most recent call last): 00023: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 188, in run 00024: self._run() 00025: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 239, in _run 00026: resultFiles = self._runFunc() 00027: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 235, in _runFunc 00028: return self._func(*self._args) 00029: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 803, in convertInputStep 00030: self._convertRef() 00031: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1226, in _convertRef 00032: self._convertVol(ih, refVols[0]) 00033: File "/home/cdeniston/.local/lib/python3.8/site-packages/relion/protocols/protocol_base.py", line 1210, in _convertVol 00034: img = ih.read(inputVol) 00035: File "/apps/GNU/scipion/3.0/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 275, in read 00036: return self._imgClass(location) 00037: TypeError: __init__() takes 1 positional argument but 2 were given 00038: Protocol failed: __init__() takes 1 positional argument but 2 were given 00039: FAILED: convertInputStep, step 1, time 2020-11-24 16:07:29.782580 00040: *** Last status is failed 00041: ------------------- PROTOCOL FAILED (DONE 1/3) |
From: Grigory S. <sha...@gm...> - 2020-11-24 14:04:28
Hi Dmitry, looking at the time - 1 s - this seems very quick! You need to check the output in the Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/ folder to see if it actually ran anything. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr... On Tue, Nov 24, 2020 at 12:00 PM Dmitry Semchonok <Sem...@gm...> wrote: > Dear colleagues, > > Could you please advice? > > Scipon2 — relion - motion correction — Error > > At the same time the motion cor2 runs fine. > > Thank you > > 0001: RUNNING PROTOCOL ----------------- > 00002: HostName: cryoem02 > 00003: PID: 64478 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: > /data/panos/ScipionUserData/projects/TutorialBetagel__students > 00006: workingDir: Runs/000980_ProtRelionMotioncor > 00007: runMode: Continue > 00008: MPI: 1 > 00009: threads: 1 > 00010: len(steps) 17 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: _convertInputStep, step 1 > 00014: 2020-11-24 12:56:04.911042 > 00015: Relion version: > 00016: relion_run_motioncorr --version > 00017: RELION version: 3.0.8 > 00018: Precision: BASE=double, CUDA-ACC=single > 00019: > 00020: FINISHED: _convertInputStep, step 1 > 00021: 2020-11-24 12:56:04.951805 > 00022: STARTED: processMovieStep, step 2 > 00023: 2020-11-24 12:56:04.992562 > 00024: Processing movie: > Runs/000980_ProtRelionMotioncor/tmp/movie_000001/Falcon_2012_06_12-14_33_35_0_movie.mrcs > 00025: relion_run_motioncorr --i > Falcon_2012_06_12-14_33_35_0_movie_input.star --o output/ --use_own > --first_frame_sum 1 --last_frame_sum 16 --bin_factor 1.000000 --bfactor 150 > --angpix 3.540000 --patch_x 1 --patch_y 1 --j 1 --voltage 300 > 00026: Using our own implementation based on MOTIONCOR2 algorithm > 00027: to correct beam-induced motion for the following micrographs: > 00028: * Falcon_2012_06_12-14_33_35_0_movie.mrcs > 00029: Correcting beam-induced motions using our own implementation ... > 00030: 1/ 1 sec > ............................................................~~(,_,"> > 00031: Generating logfile.pdf ... > 00032: 000/??? sec ~~(,_,"> > [oo]gs: /usr/local/scipion/v2.0.0/software/lib/libtiff.so.5: > no version information available (required by /lib64/libgs.so.9) > 00033: 1/ 1 sec > ............................................................~~(,_,"> > 00034: Done!
Written: output/logfile.pdf and > output/corrected_micrographs.star > 00035: Traceback (most recent call last): > 00036: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 186, in > run > 00037: self._run() > 00038: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 237, in > _run > 00039: resultFiles = self._runFunc() > 00040: File > "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 233, in > _runFunc > 00041: return self._func(*self._args) > 00042: File > "/usr/local/scipion/v2.0.0/pyworkflow/em/protocol/protocol_movies.py", line > 375, in processMovieStep > 00043: self._processMovie(movie) > 00044: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 213, in _processMovie > 00045: self._computeExtra(movie) > 00046: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 355, in _computeExtra > 00047: self._saveAlignmentPlots(movie) > 00048: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 401, in _saveAlignmentPlots > 00049: shiftsX, shiftsY = self._getMovieShifts(movie, > self._getMovieOutFn(movie, '.star')) > 00050: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", > line 271, in _getMovieShifts > 00051: table = md.Table(fileName=outStar, tableName='global_shift') > 00052: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", > line 58, in __init__ > 00053: self.read(fileName, tableName) > 00054: File > "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", > line 109, in read > 00055: with open(fileName) as f: > 00056: IOError: [Errno 2] No such file or directory: > 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' > 00057: Protocol failed: [Errno 2] No such file or directory: > 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' > 00058: FAILED: processMovieStep, step 2 > 00059: 2020-11-24 12:56:07.086919 > 00060: *** Last status is failed > 00061: ------------------- PROTOCOL FAILED (DONE 2/17) > > > Sincerely > Dmitry > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
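A one-liner covering Grigory's suggestion, with the path taken from the log (an empty or missing folder means relion_run_motioncorr failed before writing the expected .star file):

import os

# Check what the motion-correction step actually produced.
out_dir = "Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output"
print(sorted(os.listdir(out_dir)) if os.path.isdir(out_dir) else "output folder missing")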
From: Dmitry S. <Sem...@gm...> - 2020-11-24 12:00:33
Dear colleagues, Could you please advise? Scipion2 — relion - motion correction — Error. At the same time MotionCor2 itself runs fine. Thank you 0001: RUNNING PROTOCOL ----------------- 00002: HostName: cryoem02 00003: PID: 64478 00004: Scipion: v2.0 (2019-04-23) Diocletian 00005: currentDir: /data/panos/ScipionUserData/projects/TutorialBetagel__students 00006: workingDir: Runs/000980_ProtRelionMotioncor 00007: runMode: Continue 00008: MPI: 1 00009: threads: 1 00010: len(steps) 17 len(prevSteps) 0 00011: Starting at step: 1 00012: Running steps 00013: STARTED: _convertInputStep, step 1 00014: 2020-11-24 12:56:04.911042 00015: Relion version: 00016: relion_run_motioncorr --version 00017: RELION version: 3.0.8 00018: Precision: BASE=double, CUDA-ACC=single 00019: 00020: FINISHED: _convertInputStep, step 1 00021: 2020-11-24 12:56:04.951805 00022: STARTED: processMovieStep, step 2 00023: 2020-11-24 12:56:04.992562 00024: Processing movie: Runs/000980_ProtRelionMotioncor/tmp/movie_000001/Falcon_2012_06_12-14_33_35_0_movie.mrcs 00025: relion_run_motioncorr --i Falcon_2012_06_12-14_33_35_0_movie_input.star --o output/ --use_own --first_frame_sum 1 --last_frame_sum 16 --bin_factor 1.000000 --bfactor 150 --angpix 3.540000 --patch_x 1 --patch_y 1 --j 1 --voltage 300 00026: Using our own implementation based on MOTIONCOR2 algorithm 00027: to correct beam-induced motion for the following micrographs: 00028: * Falcon_2012_06_12-14_33_35_0_movie.mrcs 00029: Correcting beam-induced motions using our own implementation ... 00030: 1/ 1 sec ............................................................~~(,_,"> 00031: Generating logfile.pdf ... 00032: 000/??? sec ~~(,_,"> [oo]gs: /usr/local/scipion/v2.0.0/software/lib/libtiff.so.5: no version information available (required by /lib64/libgs.so.9) 00033: 1/ 1 sec ............................................................~~(,_,"> 00034: Done!
Written: output/logfile.pdf and output/corrected_micrographs.star 00035: Traceback (most recent call last): 00036: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 186, in run 00037: self._run() 00038: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 237, in _run 00039: resultFiles = self._runFunc() 00040: File "/usr/local/scipion/v2.0.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc 00041: return self._func(*self._args) 00042: File "/usr/local/scipion/v2.0.0/pyworkflow/em/protocol/protocol_movies.py", line 375, in processMovieStep 00043: self._processMovie(movie) 00044: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 213, in _processMovie 00045: self._computeExtra(movie) 00046: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 355, in _computeExtra 00047: self._saveAlignmentPlots(movie) 00048: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 401, in _saveAlignmentPlots 00049: shiftsX, shiftsY = self._getMovieShifts(movie, self._getMovieOutFn(movie, '.star')) 00050: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/protocols/protocol_motioncor.py", line 271, in _getMovieShifts 00051: table = md.Table(fileName=outStar, tableName='global_shift') 00052: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", line 58, in __init__ 00053: self.read(fileName, tableName) 00054: File "/usr/local/scipion/v2.0.0/software/lib/python2.7/site-packages/relion/convert/metadata.py", line 109, in read 00055: with open(fileName) as f: 00056: IOError: [Errno 2] No such file or directory: 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' 00057: Protocol failed: [Errno 2] No such file or directory: 'Runs/000980_ProtRelionMotioncor/tmp/movie_000001/output/Falcon_2012_06_12-14_33_35_0_movie.star' 00058: FAILED: processMovieStep, step 2 00059: 2020-11-24 12:56:07.086919 00060: *** Last status is failed 00061: ------------------- PROTOCOL FAILED (DONE 2/17) Sincerely Dmitry |
From: Grigory S. <sha...@gm...> - 2020-11-23 15:15:02
Dear Dmitry, you can open old Scipion projects. If any protocol has been deprecated in Scipion 3, then you'll see a corresponding message. You won't be able to open/edit such protocols' form, but the outputs should be available for use. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 <+44%201223%20267228> e-mail: gs...@mr... On Mon, Nov 23, 2020 at 2:27 PM Dmitry A. Semchonok <sem...@gm...> wrote: > Dear colleagues, > > Can I open the projects made by Scipion 2 with Scipion 3 and work with > them? > > Am I expecting any issues? > > sincerely, > Dmitry > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > |
From: David <dst...@cn...> - 2020-11-23 15:06:06
Hi Wolfgang, you can drop the '-m3dnow', as this technology was dropped about 10 years ago (AFAIK). If you are willing to (or can) generate non-portable code [which is in theory also the case when using fma and avx2 instructions, though the majority of CPUs do support them], and at the same time want all possible optimizations, have a look at the '-march=cpu-type' optimization flag here: https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html If you use the -march=your_cpu_family flag, all or most (of the compliant) optimization flags for that specific architecture should be enabled by default. To see which flags are on, you can use '-v' with any compilation command we use. See the example below (in bold are the important flags, and in italics the optimization flags that were automatically enabled). I recommend doing these optimizations if you can, as the performance gains outweigh the extra time spent on compilation (especially for the devel branch of xmipp, where we significantly improved the compilation time). If you need code which is portable between several different families, use '-v' to expand the flags for each architecture, and then set the intersection of those flags in the xmipp.conf file. Hope this helps. KR, David S. $ g++ *-v* -o libraries/reconstruction/aalign_significant.os -c *-march=native* -std=c++11 -O0 -I../ -I/usr/include -I/usr/include/hdf5/serial -fPIC -I/usr/include/python3.5m -I/usr/lib/python3/dist-packages/numpy/core/include -Iexternal -Ilibraries -I/home/david/git/xmipp_devel/src/xmippCore libraries/reconstruction/aalign_significant.cpp Using built-in specs. COLLECT_GCC=g++ Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.5.0-12ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 5.5.0 20171010 (Ubuntu 5.5.0-12ubuntu1) COLLECT_GCC_OPTIONS='-v' '-o' 'libraries/reconstruction/aalign_significant.os' '-c' '-march=native' '-std=c++11' '-O0' '-I' '../' '-I' '/usr/include' '-I' '/usr/include/hdf5/serial' '-fPIC' '-I' '/usr/include/python3.5m' '-I' '/usr/lib/python3/dist-packages/numpy/core/include' '-I' 'external' '-I' 'libraries' '-I' '/home/david/git/xmipp_devel/src/xmippCore' '-shared-libgcc' /usr/lib/gcc/x86_64-linux-gnu/5/cc1plus -quiet -v -I ../ -I /usr/include -I /usr/include/hdf5/serial -I /usr/include/python3.5m -I /usr/lib/python3/dist-packages/numpy/core/include -I external -I libraries -I /home/david/git/xmipp_devel/src/xmippCore -imultiarch x86_64-linux-gnu -D_GNU_SOURCE libraries/reconstruction/aalign_significant.cpp *-march=broadwell* /-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt
-mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=generic/ -quiet -dumpbase aalign_significant.cpp -auxbase-strip libraries/reconstruction/aalign_significant.os -O0 -std=c++11 -version .... (skipped) On 19/11/20 12:05, Lugmayr, Wolfgang wrote: > dear all, > > just for documentation of the scipion 3.0.6 official release. > > i changed in xmipp.conf the line: > CXXFLAGS= -mtune=native -march=native -std=c++11 -O3 > to > CXXFLAGS= -mfma -mavx2 -m3dnow -fomit-frame-pointer -std=c++11 -O3 > and rebuilt xmipp. > now the exes run fine on intel and amd epyc processors. > > the flags come from the recommendations for gcc: > https://prace-ri.eu/wp-content/uploads/Best-Practice-Guide_AMD.pdf > > cheers and thanks for the help, > wolfgang > > > ------------------------------------------------------------------------ > *From: *"Carlos Oscar Sorzano" <co...@cn...> > *To: *"Pablo Conesa" <pc...@cn...>, "David" <dst...@cn...>, dma...@cn..., "w lugmayr" <w.l...@uk...>, "Mailing list for Scipion users" <sci...@li...> > *Sent: *Thursday, 29 October, 2020 16:11:16 > *Subject: *Re: Fwd: [scipion-users] scipion-installer - how do i change compiler optimization flags? > > Dear Wolfgang, > > in the compilation of Xmipp, you can set the environment variable CXXFLAGS before calling the xmipp script. From the Scipion installation I am not sure how to pass it to the xmipp script. It may be that setting it in the shell before calling scipion suffices. But I have never tried. > > Kind regards, Carlos Oscar > > On 29/10/2020 at 11:13, Pablo Conesa wrote: > > -------- Forwarded Message -------- > Subject: [scipion-users] scipion-installer - how do i change compiler optimization flags? > Date: Wed, 28 Oct 2020 14:19:29 +0100 (CET) > From: Lugmayr, Wolfgang <w.l...@uk...> > Reply-To: Mailing list for Scipion users <sci...@li...> > To: Mailing list for Scipion users <sci...@li...> > > > > hi, > > i installed and compiled scipion 3.0.5-devel on a system with intel cpus. > > when i run now scipion on new nodes with "AMD EPYC 7402" processors the executable fails with: > $ scipion3 last Scipion v3.0.5 () devel > sh: line 1: 45875 Illegal instruction python -m scipion last > > so i do not want to have 2 versions of scipion3 but maybe compile it with intel and amd compatibility flags. > how do i set these flags? > > cheers, > wolfgang > > -- > Universitätsklinikum Hamburg-Eppendorf (UKE) > @ Centre for Structural Systems Biology (CSSB) > @ Deutsches Elektronen-Synchrotron (DESY) > Notkestrasse 85 Gebäude 15 > 22607 Hamburg, Germany > Tel.: +49 40 8998-87652 > Email: wol...@cs... > http://www.cssb-hamburg.de/ > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > |
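To find a safe common flag set for a mixed Intel/EPYC cluster, gcc can also print exactly which target flags a given -march value enables; a sketch (znver2, the EPYC 7402 family, needs a reasonably recent gcc; intel.txt and epyc.txt are just scratch files):

$ gcc -march=broadwell -Q --help=target > intel.txt
$ gcc -march=znver2 -Q --help=target > epyc.txt
$ diff intel.txt epyc.txt | grep enabled

The flags enabled in both listings are the intersection David describes, suitable for xmipp.conf on both node types.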
From: Dmitry A. S. <sem...@gm...> - 2020-11-23 14:27:02
Dear colleagues, Can I open projects made with Scipion 2 in Scipion 3 and work with them? Should I expect any issues? Sincerely, Dmitry |
From: Carlos O. S. <co...@cn...> - 2020-11-23 11:08:20
Dear Juha, did you install from binaries or from sources? Is only one program failing, or all of them? You may try: xmipp_image_statistics -i Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol Cheers, Carlos Oscar On 22/11/2020 at 23:34, Juha Huiskonen wrote: > Hi all, > > I have an issue running Xmipp protocol_align_volume_and_particles.py > from Scipion. > > RuntimeError: FATAL: module compiled as little endian, but detected > different endianness at runtime > > It seems there's an issue with the installation. Any tips how to fix > it? The full error is below > > Best wishes, > Juha > > > 00054: STARTED: alignVolumeStep, step 2, time 2020-11-23 00:26:33.446121 > 00055: xmipp_volume_align --i1 > Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol --i2 > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt > 00056: RuntimeError: FATAL: module compiled as little endian, but > detected different endianness at runtime > 00057: > /projappl/project_2000637/apps/scipion/3.0/software/em/xmipp/bin/xmipp_volume_align: > line 3: 39299 Segmentation fault > $XMIPP_HOME/bin/xmipp_volume_align_prog "$@" > 00058: Traceback (most recent call last): > 00059: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 188, in run > 00060: self._run() > 00061: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 239, in _run > 00062: resultFiles = self._runFunc() > 00063: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 235, in _runFunc > 00064: return self._func(*self._args) > 00065: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/xmipp3/protocols/protocol_align_volume_and_particles.py", > line 153, in alignVolumeStep > 00066: self.runJob("xmipp_volume_align", args) > 00067: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", > line 1339, in runJob > 00068: self._stepsExecutor.runJob(self._log, program, arguments, > **kwargs) > 00069: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", > line 66, in runJob > 00070: process.runJob(log, programName, params, > 00071: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/utils/process.py", > line 53, in runJob > 00072: return runCommand(command, env, cwd) > 00073: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/utils/process.py", > line 68, in runCommand > 00074: check_call(command, shell=True, stdout=sys.stdout, > stderr=sys.stderr, > 00075: File > "/projappl/project_2000637/apps/Anaconda-3.0/envs/.scipion3env/lib/python3.8/subprocess.py", > line 364, in check_call > 00076: raise CalledProcessError(retcode, cmd) > 00077: subprocess.CalledProcessError: Command 'xmipp_volume_align > --i1 Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol > --i2 Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply >
Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt' > returned non-zero exit status 139. > 00078: Protocol failed: Command 'xmipp_volume_align --i1 > Runs/005027_XmippProtAlignVolumeParticles/extra/refVolume.vol --i2 > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolume.vol > --apply > Runs/005027_XmippProtAlignVolumeParticles/extra/inputVolumeAligned.vol > --frm --copyGeo > Runs/005027_XmippProtAlignVolumeParticles/extra/transformation-matrix.txt' > returned non-zero exit status 139. > 00079: FAILED: alignVolumeStep, step 2, time 2020-11-23 00:26:34.625143 > 00080: ------------------- PROTOCOL FAILED (DONE 2/4) > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users |
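Since the message comes from a compiled module's endianness check, a quick sanity check of what the Python runtime and NumPy report can help confirm whether the installation itself is broken (a minimal sketch):

import sys
import numpy as np

# An x86_64 machine should report 'little'; a mismatch here points at a
# broken build/installation rather than at the protocol itself.
print("runtime byte order:", sys.byteorder)
print("little-endian float32 native for numpy:", np.dtype("<f4").isnative)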
From: Grigory S. <sha...@gm...> - 2020-11-23 09:44:17
From the last message ("FINISHED: runRelionStep, step 2") there is no problem with any queue system; the problem is whether relion actually produced the output data, since the detected last iteration passed to self._getFileName('model', iter=iteration) is None. Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr... On Mon, Nov 23, 2020 at 9:33 AM Lugmayr, Wolfgang <w.l...@uk...> wrote: > > hi, > > i compared the template posted below with mine and i have the following differences in my template: > MANDATORY = False > #SBATCH --ntasks %_(JOB_NODES)s > > your template may need also to change: > #SBATCH --nodes %_(NODES)s > > on our cluster the amount of mpi ntasks defines how many nodes you get. so i do not use the --nodes parameter. > > cheers, > wolfgang > > > ________________________________ > From: "Pablo Conesa" <pc...@cn...> > To: "Mailing list for Scipion users" <sci...@li...> > Sent: Monday, 23 November, 2020 09:49:29 > Subject: Re: [scipion-users] Questions about host.conf for Scipion on slurp cluster > > I think job went in! > > I see here more an issue when loading the starfile. Maybe iteration is None? > > https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 > > Grigory, Jose Miguel? > > > On 23/11/20 9:32, Yangyang Yi wrote: > > Sorry for the late reply. Here’s the log from the job begin to the job end. > run.stdout: > > 00001: RUNNING PROTOCOL ----------------- > 00002: HostName: headnode.cm.cluster > 00003: PID: 209177 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark > 00006: workingDir: Runs/000347_ProtRelionClassify3D > 00007: runMode: Continue > 00008: MPI: 5 > 00009: threads: 2 > 00010: len(steps) 3 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: convertInputStep, step 1 > 00014: 2020-11-12 13:46:13.708639 > 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star' > 00016: convertBinaryFiles: creating soft links.
> 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra > 00018: FINISHED: convertInputStep, step 1 > 00019: 2020-11-12 13:46:48.416238 > 00020: STARTED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48.438416 > 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' > 00023: launched job with id 2552 > 00024: FINISHED: runRelionStep, step 2 > 00025: 2020-11-12 13:46:48.524619 > 00026: STARTED: createOutputStep, step 3 > 00027: 2020-11-12 13:46:48.973668 > 00028: Traceback (most recent call last): > 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run > 00030: self.step._run() # not self.step.run() , to avoid race conditions > 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run > 00032: resultFiles = self._runFunc() > 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc > 00034: return self._func(*self._args) > 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep > 00036: self._fillClassesFromIter(classes3D, self._lastIter()) > 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter > 00038: self._loadClassesInfo(iteration) > 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo > 00040: self._getFileName('model', iter=iteration)) > 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName > 00042: return self.__filenamesDict[key] % kwargs > 00043: TypeError: %d format: a number is required, not NoneType > 00044: Protocol failed: %d format: a number is required, not NoneType > 00045: FAILED: createOutputStep, step 3 > 00046: 2020-11-12 13:46:48.991279 > 00047: *** Last status is failed > 00048: ------------------- PROTOCOL FAILED (DONE 3/3) > > run.log: > 2020-11-12 13:46:48.438416 > 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 > 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 > 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668 > 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType > 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 > 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 > 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3) > > > > On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: > > Hi Yangyang, > > I've tried your config with Scipion2 and it seems to work fine. The only problem I found was using curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run or before? > > Best regards, > Grigory > > -------------------------------------------------------------------------------- > Grigory Sharov, Ph.D. > > MRC Laboratory of Molecular Biology, > Francis Crick Avenue, > Cambridge Biomedical Campus, > Cambridge CB2 0QH, UK. > tel. +44 (0) 1223 267228 > e-mail: gs...@mr...
> > > On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: >> >> Dear Scipion users & devs, >> >> I am kindly asking for your advice. >> >> Now we are trying to set Scipion-2.0 on a slurm cluster. It could run on single machine but failed to submit the jobs to queue. Slurm cluster works well and running scipion on single node works. >> >> Here’s our settings for host.conf: >> >> host.conf: >> [localhost] >> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s >> NAME = SLURM >> MANDATORY = 0 >> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s >> CANCEL_COMMAND = scancel %_(JOB_ID)s >> CHECK_COMMAND = squeue -j %_(JOB_ID)s >> SUBMIT_TEMPLATE = #!/bin/bash >> ####SBATCH --export=ALL >> #SBATCH -p %_(JOB_QUEUE)s >> #SBATCH -J %_(JOB_NAME)s >> #SBATCH -o %_(JOB_SCRIPT)s.out >> #SBATCH -e %_(JOB_SCRIPT)s.err >> #SBATCH --time=%_(JOB_TIME)s:00:00 >> #SBATCH --nodes=1 >> #SBATCH --ntasks=%_(JOB_NODES)d >> #SBATCH --cpus-per-task=%_(JOB_THREADS)d >> WORKDIR=$SLURM_JOB_SUBMIT_DIR >> export XMIPP_IN_QUEUE=1 >> cd $WORKDIR >> # Make a copy of node file >> echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s >> ### Display the job context >> echo Running on host `hostname` >> echo Time is `date` >> echo Working directory is `pwd` >> echo $SLURM_JOB_NODELIST >> echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES >> ################################# >> %_(JOB_COMMAND)s >> find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} + >> QUEUES = { >> “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], >> ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] >> } >> JOB_DONE_REGEX = >> >> And the Scipion reports: >> typeerror: %d format: a number is required, not nonetype >> >> And suggestions about how to solve the problem? Thanks! >> _______________________________________________ >> scipion-users mailing list >> sci...@li... >> https://lists.sourceforge.net/lists/listinfo/scipion-users > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > > -- > Pablo Conesa - Madrid Scipion team > > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users |
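The crash itself comes from formatting a filename with iteration=None. A standalone sketch of the underlying check (a hypothetical helper, not the plugin's actual code; the model star-file naming pattern is assumed from relion's conventions):

import glob
import re

def last_iter(run_dir="Runs/000347_ProtRelionClassify3D"):
    """Return the last completed relion iteration, or None if nothing ran."""
    files = glob.glob(run_dir + "/extra/relion_it*_model.star")  # assumed pattern
    iters = [int(re.search(r"it(\d+)", f).group(1)) for f in files]
    return max(iters) if iters else None

iteration = last_iter()
if iteration is None:
    raise RuntimeError("relion wrote no model files - check the queue job's .out/.err logs next to the submitted script")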
From: Lugmayr, W. <w.l...@uk...> - 2020-11-23 09:33:39
hi, i compared the template posted below with mine and i have the following differences in my template: MANDATORY = False #SBATCH --ntasks %_(JOB_NODES)s your template may need also to change: #SBATCH --nodes %_(NODES)s on our cluster the amount of mpi ntasks defines how many nodes you get. so i do not use the --nodes parameter. cheers, wolfgang
From: "Pablo Conesa" <pc...@cn...> To: "Mailing list for Scipion users" <sci...@li...> Sent: Monday, 23 November, 2020 09:49:29 Subject: Re: [scipion-users] Questions about host.conf for Scipion on slurp cluster
I think job went in! I see here more an issue when loading the starfile. Maybe iteration is None? https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 Grigory, Jose Miguel?
On 23/11/20 9:32, Yangyang Yi wrote: Sorry for the late reply. Here’s the log from the job begin to the job end. run.stdout: 00001: RUNNING PROTOCOL ----------------- 00002: HostName: headnode.cm.cluster 00003: PID: 209177 00004: Scipion: v2.0 (2019-04-23) Diocletian 00005: currentDir: /ddn/users/spadm/ScipionUserData/projects/relion_benchmark 00006: workingDir: Runs/000347_ProtRelionClassify3D 00007: runMode: Continue 00008: MPI: 5 00009: threads: 2 00010: len(steps) 3 len(prevSteps) 0 00011: Starting at step: 1 00012: Running steps 00013: STARTED: convertInputStep, step 1 00014: 2020-11-12 13:46:13.708639 00015: Converting set from 'Runs/000002_ProtImportParticles/particles.sqlite' into 'Runs/000347_ProtRelionClassify3D/input_particles.star' 00016: convertBinaryFiles: creating soft links. 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> Runs/000002_ProtImportParticles/extra 00018: FINISHED: convertInputStep, step 1 00019: 2020-11-12 13:46:48.416238 00020: STARTED: runRelionStep, step 2 00021: 2020-11-12 13:46:48.438416 00022: ** Submiting to queue: 'sbatch /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' 00023: launched job with id 2552 00024: FINISHED: runRelionStep, step 2 00025: 2020-11-12 13:46:48.524619 00026: STARTED: createOutputStep, step 3 00027: 2020-11-12 13:46:48.973668 00028: Traceback (most recent call last): 00029: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line 151, in run 00030: self.step._run() # not self.step.run() , to avoid race conditions 00031: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 237, in _run 00032: resultFiles = self._runFunc() 00033: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 233, in _runFunc 00034: return self._func(*self._args) 00035: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 77, in createOutputStep 00036: self._fillClassesFromIter(classes3D, self._lastIter()) 00037: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 176, in _fillClassesFromIter 00038: self._loadClassesInfo(iteration) 00039: File "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", line 166, in _loadClassesInfo 00040: self._getFileName('model', iter=iteration)) 00041: File "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line 841, in _getFileName 00042: return self.__filenamesDict[key] % kwargs 00043: TypeError: %d format: a number is required, not NoneType 00044: Protocol failed: %d format: a number is required, not NoneType 00045: FAILED: createOutputStep, step 3 00046: 2020-11-12 13:46:48.991279 00047: *** Last status is failed 00048: ------------------- PROTOCOL FAILED (DONE 3/3) run.log: 2020-11-12 13:46:48.438416 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 00023: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.973668 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a number is required, not NoneType 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL FAILED (DONE 3/3)
On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: Hi Yangyang, I've tried your config with Scipion2 and it seems to work fine. The only problem I found was using curly quotes (“) instead of straight ones (") in the queues dictionary. Did you get the error message after the job was submitted and started to run or before? Best regards, Grigory -------------------------------------------------------------------------------- Grigory Sharov, Ph.D. MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK. tel. +44 (0) 1223 267228 e-mail: gs...@mr...
On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: Dear Scipion users & devs, I am kindly asking for your advice. Now we are trying to set Scipion-2.0 on a slurm cluster. It could run on single machine but failed to submit the jobs to queue. Slurm cluster works well and running scipion on single node works. Here’s our settings for host.conf: host.conf: [localhost] PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s NAME = SLURM MANDATORY = 0 SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s CANCEL_COMMAND = scancel %_(JOB_ID)s CHECK_COMMAND = squeue -j %_(JOB_ID)s SUBMIT_TEMPLATE = #!/bin/bash ####SBATCH --export=ALL #SBATCH -p %_(JOB_QUEUE)s #SBATCH -J %_(JOB_NAME)s #SBATCH -o %_(JOB_SCRIPT)s.out #SBATCH -e %_(JOB_SCRIPT)s.err #SBATCH --time=%_(JOB_TIME)s:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=%_(JOB_NODES)d #SBATCH --cpus-per-task=%_(JOB_THREADS)d WORKDIR=$SLURM_JOB_SUBMIT_DIR export XMIPP_IN_QUEUE=1 cd $WORKDIR # Make a copy of node file echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s ### Display the job context echo Running on host `hostname` echo Time is `date` echo Working directory is `pwd` echo $SLURM_JOB_NODELIST echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES ################################# %_(JOB_COMMAND)s find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} + QUEUES = { “a": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], ["NODES","1", "Nodes", "How many nodes required for all the nodes"], ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] } JOB_DONE_REGEX = And the Scipion reports: typeerror: %d format: a number is required, not nonetype And suggestions about how to solve the problem? Thanks! _______________________________________________ scipion-users mailing list sci...@li... https://lists.sourceforge.net/lists/listinfo/scipion-users -- Pablo Conesa - Madrid Scipion team (http://scipion.i2pc.es) |
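Since smart quotes are easy to reintroduce when copy-pasting configs, the QUEUES value can be checked before Scipion tries to parse it. A minimal sketch (the snippet file name is hypothetical - paste the dictionary from host.conf into it):

import ast

text = open("queues_snippet.txt", encoding="utf-8").read()  # hypothetical file
for ch in "\u201c\u201d\u2018\u2019":  # curly double/single quotes
    if ch in text:
        print("smart quote found:", repr(ch))
ast.literal_eval(text)  # raises SyntaxError/ValueError if the literal is broken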
From: Pablo C. <pc...@cn...> - 2020-11-23 08:49:44
I think job went in! I see here more an issue when loading the starfile. Maybe iteration is None? https://github.com/scipion-em/scipion-em-relion/blob/support/relion/protocols/protocol_classify3d.py#L166 Grigory, Jose Miguel? On 23/11/20 9:32, Yangyang Yi wrote: > Sorry for the late reply. Here’s the log from the job begin to the job > end. > run.stdout: > > 00001: RUNNING PROTOCOL ----------------- > 00002: HostName: headnode.cm.cluster > 00003: PID: 209177 > 00004: Scipion: v2.0 (2019-04-23) Diocletian > 00005: currentDir: > /ddn/users/spadm/ScipionUserData/projects/relion_benchmark > 00006: workingDir: Runs/000347_ProtRelionClassify3D > 00007: runMode: Continue > 00008: MPI: 5 > 00009: threads: 2 > 00010: len(steps) 3 len(prevSteps) 0 > 00011: Starting at step: 1 > 00012: Running steps > 00013: STARTED: convertInputStep, step 1 > 00014: 2020-11-12 13:46:13.708639 > 00015: Converting set from > 'Runs/000002_ProtImportParticles/particles.sqlite' into > 'Runs/000347_ProtRelionClassify3D/input_particles.star' > 00016: convertBinaryFiles: creating soft links. > 00017: Root: Runs/000347_ProtRelionClassify3D/extra/input -> > Runs/000002_ProtImportParticles/extra > 00018: FINISHED: convertInputStep, step 1 > 00019: 2020-11-12 13:46:48.416238 > 00020: STARTED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48.438416 > 00022: ** Submiting to queue: 'sbatch > /ddn/users/spadm/ScipionUserData/projects/relion_benchmark/Runs/000347_ProtRelionClassify3D/logs/347-0-1.job' > 00023: launched job with id 2552 > 00024: FINISHED: runRelionStep, step 2 > 00025: 2020-11-12 13:46:48.524619 > 00026: STARTED: createOutputStep, step 3 > 00027: 2020-11-12 13:46:48.973668 > 00028: Traceback (most recent call last): > 00029: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/executor.py", line > 151, in run > 00030: self.step._run() # not self.step.run() , to avoid race conditions > 00031: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 237, in _run > 00032: resultFiles = self._runFunc() > 00033: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 233, in _runFunc > 00034: return self._func(*self._args) > 00035: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 77, in createOutputStep > 00036: self._fillClassesFromIter(classes3D, self._lastIter()) > 00037: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 176, in _fillClassesFromIter > 00038: self._loadClassesInfo(iteration) > 00039: File > "/cm/shared/apps/scipion/2.0/software/lib/python2.7/site-packages/relion/protocols/protocol_classify3d.py", > line 166, in _loadClassesInfo > 00040: self._getFileName('model', iter=iteration)) > 00041: File > "/cm/shared/apps/scipion/2.0/pyworkflow/protocol/protocol.py", line > 841, in _getFileName > 00042: return self.__filenamesDict[key] % kwargs > 00043: TypeError: %d format: a number is required, not NoneType > 00044: Protocol failed: %d format: a number is required, not NoneType > 00045: FAILED: createOutputStep, step 3 > 00046: 2020-11-12 13:46:48.991279 > 00047: *** Last status is failed > 00048: ------------------- PROTOCOL FAILED (DONE 3/3) > > run.log: > 2020-11-12 13:46:48.438416 > 00020: 2020-11-12 13:46:48,972 INFO: FINISHED: runRelionStep, step 2 > 00021: 2020-11-12 13:46:48,973 INFO: 2020-11-12 13:46:48.524619 > 00022: 2020-11-12 13:46:48,973 INFO: STARTED: createOutputStep, step 3 > 00023: 2020-11-12 13:46:48,973 
INFO: 2020-11-12 13:46:48.973668 > 00024: 2020-11-12 13:46:49,485 ERROR: Protocol failed: %d format: a > number is required, not NoneType > 00025: 2020-11-12 13:46:49,508 INFO: FAILED: createOutputStep, step 3 > 00026: 2020-11-12 13:46:49,508 INFO: 2020-11-12 13:46:48.991279 > 00027: 2020-11-12 13:46:49,570 INFO: ------------------- PROTOCOL > FAILED (DONE 3/3) > > > >> On 12 Nov 2020, at 2:19 AM, Grigory Sharov <sha...@gm...> wrote: >> >> Hi Yangyang, >> >> I've tried your config with Scipion2 and it seems to work fine. The >> only problem I found was using curly quotes (“) instead of straight >> ones (") in the queues dictionary. Did you get the error message >> after the job was submitted and started to run or before? >> >> Best regards, >> Grigory >> >> -------------------------------------------------------------------------------- >> Grigory Sharov, Ph.D. >> >> MRC Laboratory of Molecular Biology, >> Francis Crick Avenue, >> Cambridge Biomedical Campus, >> Cambridge CB2 0QH, UK. >> tel. +44 (0) 1223 267228 >> e-mail: gs...@mr... >> >> >> On Wed, Nov 11, 2020 at 9:20 AM Yangyang Yi <yy...@si...> wrote: >> >> Dear Scipion users & devs, >> >> I am kindly asking for your advice. >> >> Now we are trying to set Scipion-2.0 on a slurm cluster. It could >> run on single machine but failed to submit the jobs to queue. >> Slurm cluster works well and running scipion on single node works. >> >> Here’s our settings for host.conf: >> >> host.conf: >> [localhost] >> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d -bynode %_(COMMAND)s >> NAME = SLURM >> MANDATORY = 0 >> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s >> CANCEL_COMMAND = scancel %_(JOB_ID)s >> CHECK_COMMAND = squeue -j %_(JOB_ID)s >> SUBMIT_TEMPLATE = #!/bin/bash >> ####SBATCH --export=ALL >> #SBATCH -p %_(JOB_QUEUE)s >> #SBATCH -J %_(JOB_NAME)s >> #SBATCH -o %_(JOB_SCRIPT)s.out >> #SBATCH -e %_(JOB_SCRIPT)s.err >> #SBATCH --time=%_(JOB_TIME)s:00:00 >> #SBATCH --nodes=1 >> #SBATCH --ntasks=%_(JOB_NODES)d >> #SBATCH --cpus-per-task=%_(JOB_THREADS)d >> WORKDIR=$SLURM_JOB_SUBMIT_DIR >> export XMIPP_IN_QUEUE=1 >> cd $WORKDIR >> # Make a copy of node file >> echo $SLURM_JOB_NODELIST > %_(JOB_NODEFILE)s >> ### Display the job context >> echo Running on host `hostname` >> echo Time is `date` >> echo Working directory is `pwd` >> echo $SLURM_JOB_NODELIST >> echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES >> ################################# >> %_(JOB_COMMAND)s >> find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec >> chmod 664 {} + >> QUEUES = { >> “a": [["JOB_TIME", "48", "Time (hours)", "Select the time >> expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual >> jobs to queue"]], >> “b": [["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]], >> “c": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"], >> ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"], >> ["NODES","1", "Nodes", "How many nodes required for all the nodes"], >> ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]] >> } >> JOB_DONE_REGEX = >> >> And the Scipion reports:
>> typeerror: %d format: a number is required, not nonetype >> >> And suggestions about how to solve the problem? Thanks! >> _______________________________________________ >> scipion-users mailing list >> sci...@li... >> https://lists.sourceforge.net/lists/listinfo/scipion-users > > _______________________________________________ > scipion-users mailing list > sci...@li... > https://lists.sourceforge.net/lists/listinfo/scipion-users -- Pablo Conesa - Madrid Scipion team (http://scipion.i2pc.es)