From: Juan D. M. C. <jdm...@gm...> - 2016-10-10 05:00:31
Hi,

You can download the unsorted BAM file from here:
https://cloudstor.aarnet.edu.au/plus/index.php/s/p90bYldJoqE5Fbv

Regards,

Juan Montenegro

2016-10-10 14:35 GMT+10:00 Juan Daniel Montenegro Cabrera <jdm...@gm...>:

> Dear John,
>
> I did a few tests in my spare time. All samtools versions from 0.1.19
> onward have the same sorting problem, with or without the use of
> multiple threads (-@). Version 0.1.18 is able to sort the file
> correctly, but is slower than sambamba, especially for really big BAM
> files.
> I have a reduced unsorted BAM file of ~500 MB that can be used to
> reproduce this issue. How would you like me to send it to you?
>
> Regards,
>
> Juan Montenegro
>
> 2016-10-07 23:07 GMT+10:00 John Marshall <jm...@sa...>:
>
>> On 7 Oct 2016, at 06:24, Juan Daniel Montenegro Cabrera <jdm...@gm...> wrote:
>> >
>> > samtools view -bh@ 15 in.sam | samtools sort -T tmp -@ 15 -o out.sorted.bam -
>> >
>> > When I try to index the sorted BAM file it complains:
>> >
>> > samtools index out.sorted.bam
>> > [E::hts_idx_push] NO_COOR reads not in a single block at the end 16 -1
>> > samtools index: "SORT.0.bam" is corrupted or unsorted
>> >
>> > When I check the last lines of the file, it does indeed have mapped
>> > reads at the end, instead of the unmapped reads. The last 194 records
>> > in the sorted BAM file are mapped reads and they come right after the
>> > unmapped reads:
>> >
>> > tail -n 195 SynOpDH_0.sam | head -n 2
>> > SRR1170581.75133375 141 * 0 0 * * 0 0 CAACATAAATTTGGCACACAAATAGTTCTCCATTAACCCTTTTAGTAAAAAGAGTAGAATCTATTTTCCAATTTCAAAGCCTTTTTCAATGAGGAACTTGGTTAAGCATTTATAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGAT CCCFFFFFHHHHHJJHJJJJJJHIIHIJJJJJJJJJJJJJJJJJJHIIJJHIIIBBFHIIJJJJJJJJJJIJJJJJJJHIHHHHHFFFFFFEEEEDDDDDDDDDDCDDCDDEFEDDEDDDDDBDDDB@BDBDACDDBCDDDDC>C>BCDA YT:Z:UP
>> > SRR1170581.75025193 133 6B_concat 1073997887 0 * = 1073997887 0 AGGGTAGTAGCATTGCCCCTTCTCTCTTTTTCTCTCATTTTTTTGTTTTATCTTTTTTTGGGGGGGCCCTCTATTTTTTTGGCCTCTTTTTTTTCGTCCGGAGTCTCAACCCGACTTGTGGGGGAATCATAGTCTCCATCATCCTTTCCT BBCFDDFFHHHHHJJJJJJJJIIIJJJJJJJIIIJJGIIJJJJJJJJJJGFHIJJJJJHFFDDDDDDDDDDDCDEEEDDDD@CDDDDDDDDDDDCBBDBBBB@BCDDEDDDDD>BBDCCCDBD@9BBDDDCDDEEDDCDDDDDDDCCDCC YT:Z:UP
>> >
>> > In total there are 4561428 reads that map to the 6B_concat reference,
>> > but for some reason these 194 reads keep appearing at the end of the
>> > sorted file.
>>
>> These reads are the sequence number 16 (i.e. 6B_concat) reads following
>> the really-unmapped reads that "NO_COOR reads not in a single block at
>> the end 16 -1" is complaining about, and they really are not sorted.
>>
>> > Any ideas why this might be happening?
>>
>> To figure out what's going on here, it would be very helpful if you were
>> able to provide us with access to the sort input file, so we can try to
>> reproduce this.
>>
>> In the meantime, please try removing '-@ 15' from the sort command and
>> sorting with just one thread. I am grasping at straws here, but it would
>> be interesting to see whether the problem persists in this case.
>>
>> John
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome Research
>> Limited, a charity registered in England with number 1021457 and a
>> company registered in England with number 2742969, whose registered
>> office is 215 Euston Road, London, NW1 2BE.
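
For anyone who wants to locate records like these without going through `samtools index`, the ordering that the error message refers to can be checked directly on SAM text. The following is a minimal Python sketch and not part of samtools: the function name `first_sort_violation` is hypothetical, it assumes the input includes the header (so that the `@SQ` order is known), and it assumes coordinate order means reference index first, then position, with `*` (no-coordinate) records last.

```python
def first_sort_violation(sam_lines):
    """Return (line_number, qname, rname, pos) for the first record that
    breaks coordinate order, or None if every record is in order.

    Assumption: sam_lines yields SAM text including the @SQ header lines.
    """
    ref_index = {}   # RNAME -> rank, taken from the order of the @SQ lines
    prev_key = None
    for line_number, line in enumerate(sam_lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line; only @SQ lines define the reference order.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_index[field[3:]] = len(ref_index)
            continue
        fields = line.split("\t")
        qname, rname, pos = fields[0], fields[2], int(fields[3])
        if rname == "*":
            # No coordinate: these records must come after all placed records.
            key = (float("inf"), 0)
        elif rname in ref_index:
            key = (ref_index[rname], pos)
        else:
            raise ValueError(f"line {line_number}: {rname!r} has no @SQ line")
        if prev_key is not None and key < prev_key:
            return line_number, qname, rname, pos
        prev_key = key
    return None
```

One way to use it would be to pipe `samtools view -h out.sorted.bam` into a script that calls `first_sort_violation(sys.stdin)`. For a file like the one described above, the first violation reported would be the first 6B_concat record that follows the block of `*` records.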