From: Anastasina, M. <mar...@he...> - 2022-08-29 13:06:52
|
Hi Pablo and Grigory,

Thank you so much for kindly helping me to figure out the issue!

I ran the job on a cluster, requesting 1 node with 1 MPI and 2 threads (reading your comments, I suppose I should generally try more threads for faster calculation, right?). The aligned micrographs were ready before I launched CTFFind4, so I did not use streaming and left the corresponding tab unchanged (= sleep when waiting 0 sec, batch size 1). As I am running the job on an HPC cluster, I assume the filesystem should not be an issue?

Strangely, the run.stderr file is empty (literally zero bytes), so there is nothing useful to attach here. The run.stdout file is very big because the number of micrographs is so large, but I copy its header and the problematic part below in case it is useful.

Best regards,
Maria

Hostname: r18g05.bullx
PID: 39966
pyworkflow: 3.0.16
plugin: cistem
plugin v: 3.1.0
currentDir: /scratch/project_2004278/ScipionUserData/projects/immTBEV
workingDir: Runs/097922_CistemProtCTFFind
runMode: Continue
MPI: 1
threads: 2
Starting at step: 1
Running steps
STARTED: estimateCtfStep, step 1, time 2022-08-18 12:25:28.707054
Estimating CTF of micrograph: 1

Estimating CTF of micrograph: 35874
/appl/soft/math/scipion/3.0.7/software/em/cistem-1.0.0-beta/ctffind << eof > Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_$
Runs/097922_CistemProtCTFFind/tmp/mic_035874/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW.mrc
Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW_ctf.mrc
1.104100
300.000000
2.700000
0.100000
512
36.800000
2.570000
5000.000000
40000.000000
200.000000
no
no
yes
100.000000
no
no
eof
Error trying to update output of protocol, tries=1
Error trying to update output of protocol, tries=2
Error trying to update output of protocol, tries=3
Error trying to update output of protocol, tries=4
Traceback (most recent call last):
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 470, in __tryUpdateOutputSet
    outputSet.write()  # Write to commit changes
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/object.py", line 1165, in write
    self._getMapper().commit()
  File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 762, in commit
    self.db.commit()
sqlite3.OperationalError: database is locked

On 29. Aug 2022, at 13.42, Grigory Sharov <sha...@gm...> wrote:

Hi Maria,

Pablo is correct here. Could you please tell us what number of threads/MPI you used, and also what parameters you set on the streaming tab? Are you running the job on a cluster? Maybe you can attach run.stderr here?

Grigory

On Mon, Aug 29, 2022, 11:17 Pablo Conesa <pc...@cn...> wrote:

Thank you Maria!

The error you are reporting may be unrelated to "CTFFind4 in Scipion".

"sqlite3.OperationalError: database is locked" is probably related to concurrency. There are several factors that may cause this.

threads: If you specify a high number of threads, many of them will process CTFs at the same time. This is desirable, and we have made the effort to enable as many threads as are available, but if this error happens, reducing the number of threads may help.

streaming: Streaming processing implies many active protocols, which favors concurrency.

filesystem: A slow filesystem (e.g. a network filesystem mounted over a slow ethernet connection) may cause locks.

In theory, CTFfind, and many others, should cope nicely with large datasets, or at least they have been doing so for a long time without any issue. This may be a specific issue.

On 29/8/22 9:07, Anastasina, Maria wrote:

Dear Scipion users,

I'm getting an unexpected error when running CTFFind4 in Scipion 3.0.8 on a large (50K) set of micrographs.
The protocol fails after processing about 70% of the micrographs with the error: "sqlite3.OperationalError: database is locked". Sometimes re-launching in continuation mode helps to process a few more micrographs, but the protocol eventually fails again with the same error. Breaking the data set into smaller chunks (about 10K micrographs each) helps: the protocol, run under the same conditions, then succeeds for all the micrographs.

I wonder whether this indicates that CTFFind4 has problems handling large datasets and should be used on smaller ones? Or is there possibly another problem that I am not noticing?

I'm not attaching run.stdout as it is pretty large (31 MB); if needed, I can upload it to, e.g., Google Drive.

Thank you,
Maria

_______________________________________________
scipion-users mailing list
sci...@li...
https://lists.sourceforge.net/lists/listinfo/scipion-users

--
Pablo Conesa - Madrid Scipion <http://scipion.i2pc.es/> team
|
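
Editorial note: the "database is locked" error discussed in this thread can be reproduced outside Scipion with Python's standard sqlite3 module. The sketch below is only an illustration of the concurrency explanation given above, not pyworkflow's actual code: the table name `ctf`, the helper `write_with_retry`, and the retry parameters are hypothetical. It opens two connections to the same file, lets the first hold an uncommitted write, and shows the second failing until the first commits.

```python
import os
import sqlite3
import tempfile
import time


def write_with_retry(conn, sql, params=(), tries=4, wait=0.05):
    """Run one write and commit it, retrying while the database is locked.

    Hypothetical helper for illustration; returns the attempt that succeeded.
    """
    for attempt in range(1, tries + 1):
        try:
            conn.execute(sql, params)
            conn.commit()
            return attempt
        except sqlite3.OperationalError as err:
            # Re-raise anything that is not a lock error, or the last failure.
            if "database is locked" not in str(err) or attempt == tries:
                raise
            time.sleep(wait)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        # timeout=0 makes SQLite fail immediately instead of waiting for a lock.
        writer_a = sqlite3.connect(path, timeout=0)
        writer_b = sqlite3.connect(path, timeout=0)
        writer_a.execute("CREATE TABLE ctf (mic_id INTEGER)")
        writer_a.commit()

        # Writer A starts a write transaction and does not commit yet.
        writer_a.execute("INSERT INTO ctf VALUES (1)")

        # Writer B cannot obtain the write lock while A's transaction is open.
        try:
            writer_b.execute("INSERT INTO ctf VALUES (2)")
            locked = False
        except sqlite3.OperationalError as err:
            locked = "database is locked" in str(err)

        # Once A commits, the lock is released and B's write goes through.
        writer_a.commit()
        attempts = write_with_retry(writer_b, "INSERT INTO ctf VALUES (?)", (2,))
        rows = writer_b.execute("SELECT COUNT(*) FROM ctf").fetchone()[0]

        writer_a.close()
        writer_b.close()
        return locked, attempts, rows


if __name__ == "__main__":
    print(demo())
```

In this sketch the lock is held by a second writer on purpose. In a real run the competing access could come from other threads, other protocols, or a slow filesystem, as listed above. The `timeout` argument of `sqlite3.connect` (5 seconds by default) controls how long a connection waits for a lock before raising this error.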