From: Grigory S. <sha...@gm...> - 2022-08-29 15:24:04
Hi,

Yes, you could increase both the number of threads and the batch size. Although ctffind4 can't process multiple mics in a single run, a larger batch reduces the overhead and the number of calls that update the database.

E.g. if you have 30000 mics and submit a job with 24 threads and batch size = 24, you will have 30000/24 = 1250 batches (protocol steps), and each batch will process 24 mics in parallel (the 24 threads run 24 ctffind jobs, one per micrograph).

Best regards,
Grigory

--------------------------------------------------------------------------------
Grigory Sharov, Ph.D.

MRC Laboratory of Molecular Biology,
Francis Crick Avenue,
Cambridge Biomedical Campus,
Cambridge CB2 0QH, UK.
tel. +44 (0) 1223 267228
e-mail: gs...@mr...

On Mon, Aug 29, 2022 at 4:12 PM Pablo Conesa <pc...@cn...> wrote:

> Thanks Maria!
>
> I can see 2 cases here:
>
> Either your cluster filesystem is busy and can't cope with the "high demands" of Scipion's need to write data to sqlite (this is kind of our fault).
>
> Or I've recently seen this message when a quota is exceeded. As a cluster user you may have a quota. It may be worth checking it.
>
> On 29/8/22 14:50, Anastasina, Maria wrote:
>
> Hi Pablo and Grigory,
>
> Thank you so much for kindly helping me to figure out the issue!
>
> I ran the job on a cluster, requesting 1 node with 1 MPI and 2 threads (reading your comments, I suppose I should generally try more threads for faster calculation, right?). The aligned micrographs were ready before launching CTFFind4, so I did not use streaming and left the corresponding tab unchanged (= sleep when waiting 0 sec, batch size 1).
>
> As I am running the job on an HPC cluster, I assume the filesystem should not be an issue?
>
> Strangely, the run.stderr file is empty (literally zero bytes), so there is nothing useful to attach here.
> The run.stdout file is very big because the number of micrographs is so large, but I copy its header and the problematic part below in case it is useful.
>
> Best regards,
> Maria
>
> Hostname: r18g05.bullx
> PID: 39966
> pyworkflow: 3.0.16
> plugin: cistem
> plugin v: 3.1.0
> currentDir: /scratch/project_2004278/ScipionUserData/projects/immTBEV
> workingDir: Runs/097922_CistemProtCTFFind
> runMode: Continue
> MPI: 1
> threads: 2
> Starting at step: 1
> Running steps
> ESC[35mSTARTEDESC[0m: estimateCtfStep, step 1, time 2022-08-18 12:25:28.707054
> Estimating CTF of micrograph: 1
>
> Estimating CTF of micrograph: 35874
> ^[[32m /appl/soft/math/scipion/3.0.7/software/em/cistem-1.0.0-beta/ctffind << eof
> Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_$
> Runs/097922_CistemProtCTFFind/tmp/mic_035874/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW.mrc
> Runs/097922_CistemProtCTFFind/extra/GridSquare_17902459_Data_FoilHole_17908572_Data_17904056_17904058_20220320_195052_fractions_aligned_mic_DW_ctf.mrc
> 1.104100
> 300.000000
> 2.700000
> 0.100000
> 512
> 36.800000
> 2.570000
> 5000.000000
> 40000.000000
> 200.000000
> no
> no
> yes
> 100.000000
> no
> no
> eof
>
> ^[[0m
> Error trying to update output of protocol, tries=1
> Error trying to update output of protocol, tries=2
> Error trying to update output of protocol, tries=3
> Error trying to update output of protocol, tries=4
> Traceback (most recent call last):
>   File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 470, in __tryUpdateOutputSet
>     outputSet.write() # Write to commit changes
>   File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/object.py", line 1165, in write
>     self._getMapper().commit()
>   File "/appl/soft/math/scipion/3.0.7/.scipion3/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 762, in commit
>     self.db.commit()
> sqlite3.OperationalError: database is locked
>
> On 29. Aug 2022, at 13.42, Grigory Sharov <sha...@gm...> wrote:
>
> Hi Maria,
>
> Pablo is correct here. Could you please tell us what number of threads/MPI you used and also what parameters you set on the streaming tab? Are you running the job on a cluster? Maybe you can attach run.stderr here?
>
> Grigory
>
> On Mon, Aug 29, 2022, 11:17 Pablo Conesa <pc...@cn...> wrote:
>
>> Thank you Maria!
>>
>> The error you are reporting may be unrelated to "CTFFind4 in Scipion".
>>
>> "sqlite3.OperationalError: database is locked"
>>
>> is probably related to concurrency. There are several factors that may cause this.
>>
>> *threads*: If you specify a high number of threads, many of them will process CTFs at the same time. This is desirable, and we have made the effort to enable as many threads as are available, but if this error happens, reducing the number of threads may help.
>>
>> *streaming*: Streaming processing implies many active protocols, which favors concurrency.
>>
>> *filesystem*: A slow filesystem (e.g. a network filesystem mounted over a slow ethernet connection) may cause locks.
>>
>> In theory, CTFfind and many others should cope nicely with large datasets, or at least they have been doing so for a long time without any issue. This may be a specific issue.
>>
>> On 29/8/22 9:07, Anastasina, Maria wrote:
>>
>> Dear Scipion users,
>>
>> I'm getting an unexpected error when running CTFFind4 in Scipion 3.0.8 on a large (50K) set of micrographs. The protocol fails after processing about 70% of the micrographs with the error: "sqlite3.OperationalError: database is locked".
>>
>> Sometimes re-launching in continuation mode helps to process a few more micrographs, but the protocol eventually fails again with the same error. Breaking the data set into smaller chunks (about 10K micrographs each) helps, and the protocol run under the same conditions succeeds for all the micrographs.
>>
>> I wonder, does this indicate that CTFFind4 has problems handling large datasets and should be used on smaller datasets? Or is there possibly another problem that I am not noticing? I'm not attaching run.stdout as it is pretty large (31 MB); if needed, I can upload it to e.g. gdrive.
>>
>> Thank you,
>> Maria
>>
>> _______________________________________________
>> scipion-users mailing list
>> sci...@li...
>> https://lists.sourceforge.net/lists/listinfo/scipion-users
>>
>> --
>> Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es/> team*
>
> --
> Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
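
For reference, the batch arithmetic from Grigory's reply at the top of this thread (30000 mics, batch size 24, 1250 protocol steps) can be sketched as below. This is only an illustration: `count_batches` is a hypothetical helper, not part of Scipion or pyworkflow, and it assumes that a final partial batch still runs as one protocol step.

```python
def count_batches(n_mics: int, batch_size: int) -> int:
    """Return the number of protocol steps for a given batch size.

    Hypothetical helper for illustration only (not a Scipion API).
    Assumes a trailing partial batch is still executed as one step.
    """
    if n_mics < 0 or batch_size < 1:
        raise ValueError("n_mics must be >= 0 and batch_size must be >= 1")
    # Integer ceiling division: round up so leftover mics get their own step.
    return (n_mics + batch_size - 1) // batch_size


if __name__ == "__main__":
    # Example from the reply: 30000 mics with batch size 24.
    print("batch size 24, 30000 mics:", count_batches(30000, 24))
    # Settings reported for the failing run: 50K mics with batch size 1.
    print("batch size 1, 50000 mics:", count_batches(50000, 1))
    # The same 50K set with the suggested batch size of 24.
    print("batch size 24, 50000 mics:", count_batches(50000, 24))
```

With the batch size of 1 reported for the failing run, every micrograph is its own step, so the 50K set means 50000 steps. How many database updates each step triggers is not stated in the thread, so the sketch counts steps only.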