Re: [scipion-users] CTFFind4 failing on large datasets

Hi Pablo and Grigory,

Thank you so much for kindly helping me to figure out the issue!

I ran the job on a cluster, requesting 1 node with 1 MPI and 2 threads (reading your comments, I suppose I should generally try more threads for faster calculation, right?). The aligned micrographs were ready before I launched CTFFind4, so I did not use streaming and left the corresponding tab unchanged (sleep when waiting = 0 sec, batch size = 1).

As I am running the job on an HPC cluster, I assume the filesystem should not be an issue?

Strangely, the run.stderr file is empty (literally zero bytes), so nothing useful to attach here. The run.stdout file is very big as the number of micrographs is so large, but I copy its header and the problematic part below in case it is useful.

Best regards,
Maria

Hostname: r18g05.bullx
PID: 39966
pyworkflow: 3.0.16
plugin: cistem
plugin v: 3.1.0
currentDir: /scratch/project_2004278/ScipionUserData/projects/immTBEV
workingDir: Runs/097922_CistemProtCTFFind
runMode: Continue
          MPI: 1
      threads: 2
 Starting at step: 1
 Running steps
STARTED: estimateCtfStep, step 1, time 2022-08-18 12:25:28.707054
Estimating CTF of micrograph: 1

Estimating CTF of micrograph: 35874
/appl/soft/math/scipion/3.0.7/software/em/cistem-1.0.0-beta/ctffind    << eof > Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_$
Runs/097922_CistemProtCTFFind/tmp/mic_035874/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW.mrc
Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW_ctf.mrc
1.104100
300.000000
2.700000
0.100000
512
36.800000
2.570000
5000.000000
40000.000000
200.000000
no
no
yes
100.000000
no
no
eof

Error trying to update output of protocol, tries=1
Error trying to update output of protocol, tries=2
Error trying to update output of protocol, tries=3
Error trying to update output of protocol, tries=4
Traceback (most recent call last):
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 470, in __tryUpdateOutputSet
    outputSet.write()  # Write to commit changes
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/object.py", line 1165, in write
    self._getMapper().commit()
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 762, in commit
    self.db.commit()
sqlite3.OperationalError: database is locked
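
For reference, the same exception can be reproduced outside Scipion. The following is a minimal sketch, not Scipion code: the table name `ctf`, its columns, and the function name are hypothetical, and it only assumes Python's standard `sqlite3` module. One connection takes the write lock and keeps it; a second connection with `timeout=0` then fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def reproduce_locked_error():
    """Return the error message raised when a second writer meets a held write lock."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

    # isolation_level=None: autocommit mode, so transactions are controlled explicitly.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("CREATE TABLE ctf (mic_id INTEGER, defocus REAL)")  # hypothetical table
    holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    holder.execute("INSERT INTO ctf VALUES (1, 15000.0)")

    # timeout=0: fail right away instead of waiting for the lock to be released.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    message = None
    try:
        writer.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        writer.close()
        holder.execute("ROLLBACK")
        holder.close()
    return message


print(reproduce_locked_error())
```

With a non-zero `timeout`, the second connection would instead wait up to that many seconds for the lock before raising the error.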

On 29. Aug 2022, at 13.42, Grigory Sharov <sha...@gm...> wrote:

Hi Maria,

Pablo is correct here. Could you please tell us how many threads/MPI you used and what parameters you set on the streaming tab? Are you running the job on a cluster? Maybe you can attach run.stderr here?

Grigory

On Mon, Aug 29, 2022, 11:17 Pablo Conesa <pc...@cn...> wrote:

Thank you Maria!

The error you are reporting may be unrelated to "CTFFind4 in Scipion".

"sqlite3.OperationalError: database is locked" is probably related to concurrency. Several factors may cause this:

threads: If you specify a high number of threads, many of them will process CTFs at the same time. This is desirable, and we have made an effort to support as many threads as are available, but if this error occurs, reducing the number of threads may help.

streaming: Streaming processing implies many active protocols, which increases concurrency.

filesystem: A slow filesystem (e.g. a network filesystem mounted over a slow Ethernet connection) may cause locks.
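
A common way to tolerate this kind of contention is to retry the write with a growing delay. The sketch below is only an illustration under stated assumptions, not how Scipion handles it: the function name `insert_with_retry`, the table `ctf`, and the default values are hypothetical, and only Python's standard `sqlite3` and `time` modules are used.

```python
import sqlite3
import time


def insert_with_retry(db_path, rows, max_tries=5, base_delay=0.1, busy_timeout=0.0):
    """Insert rows into the (hypothetical) table `ctf`, retrying while the database is locked.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_tries + 1):
        # busy_timeout is how long SQLite itself waits for a lock before giving up.
        conn = sqlite3.connect(db_path, timeout=busy_timeout, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")  # request the write lock up front
            conn.executemany("INSERT INTO ctf VALUES (?, ?)", rows)
            conn.execute("COMMIT")
            return attempt
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failed attempt.
            if "locked" not in str(exc) or attempt == max_tries:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        finally:
            conn.close()  # closing also discards any uncommitted transaction
```

Raising `busy_timeout` lets SQLite wait for the lock internally, while the outer loop covers the case where a writer holds the lock for longer than that. Neither helps if some process never releases the lock.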

In theory, CTFfind and many other protocols should cope well with large datasets; at least they have been doing so for a long time without any issue. This may be an issue specific to your setup.

On 29/8/22 9:07, Anastasina, Maria wrote:

Dear Scipion users,

I’m getting an unexpected error when running CTFFind4 in Scipion 3.0.8 on a large (50K) set of micrographs. The protocol fails after processing about 70% of the micrographs with the error: “sqlite3.OperationalError: database is locked”.

Sometimes re-launching in continuation mode helps to process a few more micrographs, but the protocol eventually fails again with the same error. Breaking the dataset into smaller chunks (about 10K micrographs each) helps: run under the same conditions, the protocol succeeds for all the micrographs.
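
The chunking itself is simple arithmetic. As a minimal sketch, assuming the micrographs can be addressed by consecutive integer IDs (the helper `split_into_chunks` is hypothetical and not part of Scipion, where subsets would be created through Scipion's own tools):

```python
def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# 50K micrographs in chunks of 10K, as described above.
micrograph_ids = list(range(1, 50001))
chunks = split_into_chunks(micrograph_ids, 10000)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk would then be processed as its own run, so each run writes a much smaller output set.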

I wonder whether this indicates that CTFFind4 has problems handling large datasets and should be used on smaller ones. Or is there possibly another problem that I am not noticing? I’m not attaching run.stdout as it is quite large (31 MB); if needed, I can upload it to e.g. Google Drive.

Thank you,
Maria

_______________________________________________
scipion-users mailing list
sci...@li...
https://lists.sourceforge.net/lists/listinfo/scipion-users

--
Pablo Conesa - Madrid Scipion<http://scipion.i2pc.es/> team
