From: mathog <ma...@ca...> - 2015-04-13 19:49:31
Hi,

I have been running a series of MaSuRCA jobs to test how well it does with different simulated library combinations on simulated homozygous and heterozygous (4% difference) genomes, using only Illumina data. The homozygous runs finish in about 8 hours. The only heterozygous run to complete to date took 29 hours. Almost all of the extra time was spent in cgw, which appears to run single-threaded: it hovers at around 100% CPU, which is pretty low on a 46-core machine. So the issue seems to be in the (modified) wgs assembler, not in the steps that MaSuRCA runs before it. An earlier experiment with real(ly bad) data saw cgw run for 29 _days_ before MaSuRCA crashed, with almost all of that time spent in cgw on one core.

Is there a way to speed cgw up? Is there some command line switch that has been omitted or set incorrectly?

Here is the cgw command for the diploid test that is running now (it currently shows 6 hours of cumulative CPU time and has 8.4G resident):

/opt/MaSuRCA/CA/Linux-amd64/bin/cgw -j 1 -k 5 -r 5 -s 2 -z -P 2 -B 167006 -m 100 -g /home/mathog/wgs_project/do_masurca_diploid/CA/genome.gkpStore -t /home/mathog/wgs_project/do_masurca_diploid/CA/genome.tigStore -o /home/mathog/wgs_project/do_masurca_diploid/CA/7-0-CGW/genome

The "diploid" genome in this experiment is C. elegans, with one wild type copy and one mutated copy.

Thanks,

David Mathog
ma...@ca...
Manager, Sequence Analysis Facility, Biology Division, Caltech