From: James B. <jk...@sa...> - 2017-03-17 16:18:35
On Fri, Mar 17, 2017 at 12:07:47PM +0900, Anton Kratz wrote:
> many samtools commands expect the input to be either name-sorted or
> coordinate-sorted, respectively.
>
> Is it guaranteed that the output of these commands is still in the same
> sort-order as the input?

I cannot think of any commands which change the sort order other than
the obvious ones (samtools sort, collate).

> I am doing a process along the lines of fixmate --> rmdup --> index -->
> flagstat and I am wondering if I have to sort again before each
> intermediate step.

You will need some sort steps, as not all of these processes take the
same order. The manpage for fixmate claims to need name sorted data,
but I *think* name collated is sufficient and samtools collate would do
as a faster alternative. I say "think" because the man page definitely
implies this isn't true, but maybe it's just a historical quirk.

Other parts of your pipeline require position sorted data, e.g.
samtools index.

If you're starting from aligner output then there is a high probability
it's already in name collated order anyway. (It may vary by aligner
though, so check.)

Note it's worth remembering that within a unix pipe, uncompressed BAM
is typically the fastest interchange format, as you'll usually only
want to do compression at the final stage.

James

--
James Bonfield (jk...@sa...)    | Hora aderat briligi. Nunc et Slythia Tova
                                | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer:   | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
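
To make the "you will need some sort steps" point concrete, here is a minimal Python sketch that works out where a re-sort is needed in the fixmate, rmdup, index, flagstat chain. It runs no samtools commands; it only reasons about sort order. The names `REQUIRED_ORDER` and `plan_sorts` are hypothetical. The fixmate and index entries follow the reply above, while the rmdup and flagstat entries are assumptions to check against the man page. It also assumes, as the reply suggests, that only sort (and collate) change the order.

```python
# Input order each step expects. None means "any order is fine".
REQUIRED_ORDER = {
    "fixmate": "name",      # man page says name sorted; collated may be enough
    "rmdup": "coordinate",  # assumption, not stated in the email
    "index": "coordinate",  # stated in the email
    "flagstat": None,       # assumption: no ordering requirement
}


def plan_sorts(steps, start_order):
    """Return the steps with 'sort:<order>' inserted wherever the order must change."""
    plan = []
    current = start_order
    for step in steps:
        needed = REQUIRED_ORDER[step]
        if needed is not None and needed != current:
            plan.append(f"sort:{needed}")
            current = needed
        plan.append(step)
    return plan


print(plan_sorts(["fixmate", "rmdup", "index", "flagstat"], "name"))
# ['fixmate', 'sort:coordinate', 'rmdup', 'index', 'flagstat']
```

Under these assumptions, name-ordered input needs a single coordinate sort, placed between fixmate and rmdup, rather than a sort before every step.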