From: John M. <jm...@sa...> - 2016-10-10 08:38:07
On 10 Oct 2016, at 05:35, Juan Daniel Montenegro Cabrera <jdm...@gm...> wrote:
> I did a few tests in my spare time. All samtools versions from 0.1.19 onwards have the same sorting problem, with or without multiple threads (-@). Version 0.1.18 sorts the file correctly, but it is slower than sambamba, especially for really big BAM files.
> I have a reduced unsorted BAM file of ~500 MB that can be used to reproduce this issue.
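When reproducing a problem like this, it helps to locate the first record that breaks coordinate order. The sketch below is a hypothetical helper, not part of samtools. It assumes each record is reduced to a (reference_id, position) pair in file order, and that unmapped reads (reference_id -1) belong at the end, as in a coordinate-sorted BAM.

```python
from typing import Iterable, Optional, Tuple


def first_out_of_order(records: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first record that breaks coordinate order, or None.

    Each record is a (reference_id, position) pair. reference_id -1 (unmapped)
    is treated as sorting after every mapped reference.
    """
    prev = None
    for i, (tid, pos) in enumerate(records):
        # Map unmapped reads (tid < 0) to +inf so they compare after all references.
        key = (float("inf") if tid < 0 else tid, pos)
        if prev is not None and key < prev:
            return i
        prev = key
    return None
```

If pysam is available, the pairs could be produced with something like `((r.reference_id, r.reference_start) for r in pysam.AlignmentFile("sorted.bam", "rb"))`, where the file name is a placeholder.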
Thanks for the sample file. Looking at the properly sorted and badly sorted 6B_concat reads, it turns out that the wrongly processed ones are exactly those with positions greater than 2^30.
The problem is some code in samtools sort that was written back when chromosomes were limited to 2^29 bases, and that doesn't work for positions beyond 2^30. Now that I've identified the outdated code, this will be easy to fix. Thanks for the bug report.
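The message does not show the offending code, so the following is only a hypothetical illustration of how a 2^30 threshold can arise. It assumes a sort key that packs the reference ID, the position shifted left by one bit, and the strand flag into one 64-bit integer, with the shift computed in 32-bit signed arithmetic. The layout and function names are assumptions, not samtools' actual implementation. Python integers are unbounded, so `to_int32` emulates two's-complement wraparound explicitly.

```python
# Hypothetical packed sort key: (tid << 32) | ((pos + 1) << 1) | reverse.
# This layout is an assumption for illustration, not samtools' code.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_int32(x: int) -> int:
    """Wrap x to a signed 32-bit value, as a typical two's-complement build would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def packed_key_int32(tid: int, pos: int, reverse: bool) -> int:
    """Sort key whose (pos + 1) << 1 term is computed as a 32-bit signed int."""
    low = to_int32((pos + 1) << 1)  # goes negative once pos + 1 >= 2**30
    # Converting a negative int32 to uint64 sign-extends it, so every bit of
    # the tid field becomes 1 and the read sorts after all other references.
    return ((tid << 32) | (low & MASK64) | int(reverse)) & MASK64


def tuple_key(tid: int, pos: int, reverse: bool) -> tuple:
    """Overflow-free alternative: compare the fields separately."""
    return (tid, pos, reverse)


reads = [
    (0, 1_200_000_000, False),  # reference 0, position above 2**30
    (0, 5_000, False),
    (1, 10, False),             # reference 1, small position
]

buggy = sorted(reads, key=lambda r: packed_key_int32(*r))
fixed = sorted(reads, key=lambda r: tuple_key(*r))
print(buggy)  # the large-position read on reference 0 lands after reference 1
print(fixed)
```

Comparing the fields as a tuple is just one way to sidestep the overflow. Doing the shift in 64-bit arithmetic would also work, as long as the position field is wide enough for the largest position.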
John
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.