From: Brian W. <th...@gm...> - 2015-04-14 00:27:03
That was one of the pieces we could never effectively thread, due to a large quantity of thread-unsafe code.

There was some work starting in CA8.0 to better prune the merges attempted, enabled with cgwMergeFilterLevel=2 (or 5, but scaffolding suffers). You can also increase the minimum number of mate pairs required for a scaffold join by raising cgwMinMergeWeight above its default of 2. This corresponds to the cgw option -minmergeweight. I don't think MaSuRCA supports either, though.

With manual intervention, you can stop scaffolding at any time and move on to the next algorithmic step. The process is hopefully described adequately at:

http://wgs-assembler.sourceforge.net/wiki/index.php/Scaffolder_failure

under 'force it to recompute', about halfway down the page. The summary is to kill the existing cgw, edit the 7-0-CGW/*timing file to change the '(logical ckp*)' to '(logical ckp05-1SM)', and restart. This forces cgw to restart from the algorithmic step after the one it is stuck on.

b

On Mon, Apr 13, 2015 at 3:49 PM, mathog <ma...@ca...> wrote:
> Hi,
>
> I have been running a bunch of MaSuRCA jobs, testing how well it does
> with different simulated library combinations on simulated homozygous
> and heterozygous (4% difference) genomes using only Illumina data. The
> homozygous ones finish in about 8 hours. The one heterozygous one to
> complete to date took 29 hours. Almost all of the extra time was spent
> with cgw running single threaded, or at least it hovers at just around
> 100% CPU, which is pretty low on a 46 core machine. So the issue seems
> to be in the (modified) wgs assembler, not in the steps that MaSuRCA
> runs before that. An earlier experiment with real(ly bad) data saw cgw
> run for 29 _days_ before MaSuRCA crashed - almost all of that as cgw
> running on one core.
>
> Is there a way to speed cgw up? Some command line switch which has been
> omitted or set incorrectly?
> Here is the cgw command on the diploid test
> that is running now (which is currently showing 6 hours of cumulative
> CPU time, and has 8.4G resident):
>
> /opt/MaSuRCA/CA/Linux-amd64/bin/cgw -j 1 -k 5 -r 5 -s 2 -z -P 2 -B
> 167006 -m 100 -g
> /home/mathog/wgs_project/do_masurca_diploid/CA/genome.gkpStore -t
> /home/mathog/wgs_project/do_masurca_diploid/CA/genome.tigStore -o
> /home/mathog/wgs_project/do_masurca_diploid/CA/7-0-CGW/genome
>
> The "diploid" genome in this experiment is C elegans, with one wild type
> copy and one mutated copy.
>
> Thanks,
>
> David Mathog
> ma...@ca...
> Manager, Sequence Analysis Facility, Biology Division, Caltech
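
The timing-file edit summarized in the reply at the top of this message could be scripted roughly as follows. This is only a sketch, not part of the wgs-assembler tooling: the function names are hypothetical, and it assumes the 7-0-CGW/*timing file is plain text containing a marker of the form '(logical ckp...)'. Only the target label '(logical ckp05-1SM)' comes from the reply. cgw would still need to be killed before the edit and restarted afterwards.

```python
import re

# Target checkpoint label quoted in the reply.
TARGET = "(logical ckp05-1SM)"

# Matches any "(logical ckp...)" marker. The exact layout of the timing
# file is an assumption; adjust the pattern if the real file differs.
_MARKER = re.compile(r"\(logical ckp[^)]*\)")


def force_checkpoint(text: str, target: str = TARGET) -> str:
    """Return text with every '(logical ckp*)' marker replaced by target."""
    return _MARKER.sub(target, text)


def rewrite_timing_file(path: str) -> bool:
    """Rewrite one timing file in place; return True if anything changed."""
    with open(path, "r", encoding="utf-8") as fh:
        original = fh.read()
    updated = force_checkpoint(original)
    if updated != original:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(updated)
    return updated != original
```

Calling rewrite_timing_file() on a copy of the timing file first, and comparing the copy against the original, is a cheap way to confirm the pattern matched what was expected before touching the real run directory.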