From: Keiran R. <kr...@sa...> - 2016-08-02 14:09:09
Hi Martin,

Adding 'calmdnmrecompindetonly=1' will increase performance further, as it will only recompute the MD/NM values if the reference section has ambiguity/N within the span of the reads.

Regards,

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

kr...@sa...
Tel:+44 (0)1223 834244 Ext: 7703
Office: H104

> On 2 Aug 2016, at 14:47, Martin MOKREJŠ <mmo...@gm...> wrote:
>
> Keiran Raine wrote:
>> For BAM in/out yes:
>>
>> inputthreads=<[1]> : input helper threads (for inputformat=bam only, default: 1)
>> outputthreads=<[1]> : output helper threads (for outputformat=bam only, default: 1)
>
> bamsort fixmates=1 calmdnm=1 calmdnmreference="$reference" blockmb=40960 inputthreads=8 outputthreads=8 level=9 I="$sample".realignedtogether.BQSR.namesorted.bam O="$sample".realignedtogether.BQSR.namesorted.fixmate.calmd.bam
>
> The above takes about 3 cores during input; much later, once it starts writing output, it takes 8 cores. Maybe that is only because of the extreme output compression. But it definitely outperformed the "samtools calmd" step, which does half of the work (just the MD: tag calculations). Actually, processing the whole file took maybe 2 minutes in total? "samtools calmd" ran out of the 12-hour wallclock time limit on a cluster node (running on a single core).
>
> Thank you for pointing me to bamsort. I added biobambam2 with libmaus2 to my Gentoo Linux recently (it is in the science overlay now), so it was simple to call it.
>
> Martin

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
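
For illustration, the suggested option could be combined with the bamsort command quoted in this message roughly as follows. This is only a sketch: it reuses the parameters from Martin's command and adds 'calmdnmrecompindetonly=1'. The default values for `sample` and `reference` and the `RUN` switch are hypothetical placeholders, not part of the original message. By default the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: Martin's bamsort call plus calmdnmrecompindetonly=1.
# The defaults below are hypothetical placeholders; set your own values.
sample="${sample:-sample1}"
reference="${reference:-reference.fa}"

# Build the argument list as positional parameters so that quoting is preserved.
# calmdnmrecompindetonly=1 limits MD/NM recomputation to reads whose reference
# span contains ambiguity/N, as described in the reply.
set -- bamsort \
  fixmates=1 \
  calmdnm=1 \
  calmdnmreference="$reference" \
  calmdnmrecompindetonly=1 \
  blockmb=40960 \
  inputthreads=8 \
  outputthreads=8 \
  level=9 \
  I="$sample.realignedtogether.BQSR.namesorted.bam" \
  O="$sample.realignedtogether.BQSR.namesorted.fixmate.calmd.bam"

# Print the command for review.
printf '%s ' "$@"
printf '\n'

# Execute only when explicitly requested (RUN is a hypothetical switch).
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Running it as `sample=mysample reference=/path/to/ref.fa RUN=1 sh script.sh` would execute bamsort with the same threading and compression settings as in the quoted command; adjust `blockmb`, `inputthreads` and `outputthreads` to the node's memory and core count.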