From: Pablo C. <pc...@cn...> - 2022-02-08 14:39:38
Hi Christian, please see my thoughts inline.

On 8/2/22 14:43, Christian Tüting wrote:
> Hi Scipion Team,
>
> I have a few questions regarding project cleaning and space usage.
>
> (a) "clean" the data of a job, but keep the settings.
> This is mainly for particle extraction jobs. These are often the jobs that require the most space, but the job could in principle be redone at any time, as you just need coordinates and box size. In cryoSPARC, one can easily clean a job with "Clear Job", which deletes all the generated files but keeps the settings. Is something similar possible in Scipion as well?

Scipion protocols are often designed to save space. It is true that particle extraction is a step where new images are generated, but unless you later apply filtering, cropping or resizing operations (which also generate new images), the extracted images should be reused by everything downstream. We could clean them, but then nothing after that step would work: all refinements, 2D classifications, etc. should point to the same stacks produced at the extraction step.

> (b) identify redundant and obsolete jobs.
> Often multiple rounds of picking, extracting and 2D classification are done to get a good subset of particles. But the initial jobs, useful in the beginning, are often not needed for the later refinements. Is there a way of identifying jobs that are potentially obsolete? Again, deleting them would be over the top, but cleaning their data as asked in (a) would help. Also, if a project is challenging, one might run multiple parallel jobs, eventually creating redundancy, which also takes up space. Is there an easy way of analysing the workflow tree? I saw that there are some sqlite databases in the project folders, but it was not clear to me how I can extract the tree from there to see if there are multiple/redundant jobs.
We haven't made that effort, but you could select one protocol (say, a final one with nice results), right-click on it and choose "select to here". Protocols that are not selected didn't contribute to the final result and may be deleted. You could also identify a "dead branch" and delete that branch. If your project is one of those I've seen with several hundred protocols, I can see that this is more complicated.

> (c) Identify dead-end forks.
> This goes hand-in-hand with (b). Is there an easy way of getting the entire tree from the project folder/database, to analyse it independently? Like getting the node information and using some Python code to get the information, without using the Scipion GUI (which is a bit laggy over ssh and X forwarding).

Scipion has a strong, almost 100% complete API. I'd say it is possible but not trivial; it depends on your programming skills. We usually connect through a VNC server, but maybe this is not possible in your case.

> Best
> Christian
>
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users

--
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
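
For (b) and (c), here is a rough sketch of what an analysis outside the GUI could look like. It is not Scipion's API and it assumes nothing about Scipion's database layout: `list_tables` only uses Python's standard `sqlite3` module to show which tables and columns a given sqlite file contains, and `contributors` / `not_contributing` reproduce the idea behind "select to here" on a dependency map that you would have to build yourself. The function names, the `parents` mapping and the protocol ids below are all hypothetical, for illustration only.

```python
import sqlite3
from collections import deque


def list_tables(db_path):
    """Return {table name: [column names]} for any sqlite file.

    Generic sqlite introspection; no Scipion-specific schema is assumed.
    """
    con = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        return {name: [col[1] for col in
                       con.execute(f'PRAGMA table_info("{name}")')]
                for name in names}
    finally:
        con.close()


def contributors(parents, target):
    """Return `target` plus every protocol it depends on, directly or not.

    `parents` maps a protocol id to the ids of the protocols whose outputs
    it uses (hypothetical structure, to be filled in by the reader).
    """
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def not_contributing(parents, target):
    """Return the protocols that did not feed into `target`."""
    all_nodes = set(parents)
    for inputs in parents.values():
        all_nodes.update(inputs)
    return all_nodes - contributors(parents, target)


# Made-up project: two picking rounds, only the second one was refined.
parents = {
    "pick_1": ["import"],
    "extract_1": ["pick_1"],
    "class2d_1": ["extract_1"],
    "pick_2": ["import"],
    "extract_2": ["pick_2"],
    "class2d_2": ["extract_2"],
    "refine": ["class2d_2"],
}
print(sorted(not_contributing(parents, "refine")))
# → ['class2d_1', 'extract_1', 'pick_1']
```

Note that `extract_2` stays in the contributing set here, which matches the point made under (a): the stacks produced by an extraction that later protocols still point to should not be cleaned.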