From: Sendu B. <sb...@sa...> - 2009-08-06 10:53:19
When you do something silly like merge a BAM with itself, why doesn't rmdup/MarkDuplicates see the duplicated reads as duplicates? And why do samtools and picard behave so differently in this case?

For example, start with a 2000-read BAM file that both samtools and picard agree has 6 duplicate reads. Merged with itself, it becomes 4000 reads. Then, on the merged BAM:

picard-tools MarkDuplicates marks 28 reads as duplicates.
samtools rmdup gets rid of 1073 reads.

I'd have naively expected 2006 reads to be seen as duplicates. Or perhaps 12. Or even 6 again. But these results just seem random.

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
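
To make the naive expectation of 2006 concrete, here is a minimal Python sketch of the arithmetic. It assumes a deliberately simplified model in which a read is a duplicate whenever an earlier read shares its (chromosome, position, strand) key. The toy reads and the helper names `make_toy_reads` and `count_naive_duplicates` are hypothetical, and this is not a description of how samtools rmdup or picard MarkDuplicates actually decide.

```python
from collections import Counter


def count_naive_duplicates(reads):
    """Count every read beyond the first at each (chrom, pos, strand) key."""
    groups = Counter((chrom, pos, strand) for _name, chrom, pos, strand in reads)
    return sum(n - 1 for n in groups.values())


def make_toy_reads(n_reads=2000, n_dups=6):
    """Hypothetical data: n_reads reads, n_dups of which repeat an earlier key."""
    n_unique = n_reads - n_dups
    reads = [(f"read{i}", "chr1", 100 + i, "+") for i in range(n_unique)]
    # The last n_dups reads reuse the positions of the first n_dups reads.
    reads += [(f"dup{i}", "chr1", 100 + i, "+") for i in range(n_dups)]
    return reads


original = make_toy_reads()
merged = original + original  # stands in for merging the BAM with itself

print(len(original), count_naive_duplicates(original))  # 2000 6
print(len(merged), count_naive_duplicates(merged))      # 4000 2006
```

Under this model the 1994 distinct keys each keep one read, so 4000 - 1994 = 2006 reads count as duplicates. Neither 28 nor 1073 matches that.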