From: Kate Im <kat...@gm...> - 2014-05-27 12:16:50
Thank you so much for clearing that up. I was unaware that this was the convention when only one end of a pair is mapped.

On Tue, May 27, 2014 at 6:56 AM, John Marshall <jm...@sa...> wrote:
> On 27 May 2014, at 10:58, Wolfgang Maier <wol...@bi...> wrote:
> > On 22.05.2014 20:59, Kate Im wrote:
> >> the number of unmapped reads (estimated by subtracting the reported
> >> number of mapped reads from the reported number of total reads) is
> >> always higher than the number of sequences with an "*" in the third
> >> column of the SAM file. Shouldn't these be the same?
> >
> > Ideally, yes, but the SAM/BAM format specifications
> > (http://samtools.github.io/hts-specs/SAMv1.pdf) say that:
> >
> > "Bit 0x4 [in the FLAG field] is the only reliable place to tell whether
> > the segment is unmapped."
>
> In particular (see §2, 4.1 of that document), there is the common
> convention for pairs in which just one end is mapped, of giving both reads
> the RNAME and POS (3rd and 4th) columns of the mapped end. This has the
> useful side-effect of bringing the unmapped end alongside its mate when the
> file is coordinate-sorted.
>
> John
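To make the discrepancy concrete, here is a minimal Python sketch that counts unmapped reads in SAM text two ways: by testing FLAG bit 0x4 (the reliable method quoted from the spec) and by looking for "*" in RNAME (column 3). The function name `count_unmapped` and the four-record `EXAMPLE_SAM` data are hypothetical, made up for illustration. The sketch also assumes you want one count per read, so it skips secondary (0x100) and supplementary (0x800) records. In the example, pair r1 has one mapped end, and its unmapped mate carries the mapped end's RNAME/POS, exactly as John describes. So the FLAG count and the "*" count disagree.

```python
"""Count unmapped reads in SAM text two ways: FLAG bit 0x4 vs. RNAME == "*"."""

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def count_unmapped(sam_lines):
    """Return (total, unmapped_by_flag, rname_star) over primary records."""
    total = by_flag = by_star = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # count each read once, using only its primary record
        total += 1
        if flag & UNMAPPED:
            by_flag += 1  # the spec's reliable test for "unmapped"
        if rname == "*":
            by_star += 1  # misses unmapped mates placed next to a mapped end
    return total, by_flag, by_star


# Hypothetical example data (columns: QNAME FLAG RNAME POS MAPQ CIGAR
# RNEXT PNEXT TLEN SEQ QUAL).
EXAMPLE_SAM = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:1000",
    # Pair r1: first end mapped; unmapped mate given chr1/100 by convention
    "r1\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",
    "r1\t133\tchr1\t100\t0\t*\t=\t100\t0\tACGT\tIIII",
    # Pair r2: both ends unmapped, so RNAME is "*"
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
])

if __name__ == "__main__":
    total, by_flag, by_star = count_unmapped(EXAMPLE_SAM.splitlines())
    # prints: total=4 unmapped(FLAG 0x4)=3 RNAME '*'=2
    print(f"total={total} unmapped(FLAG 0x4)={by_flag} RNAME '*'={by_star}")
```

Here total minus mapped (4 - 1 = 3) matches the FLAG-based count, while the "*" count is only 2. On a real BAM, `samtools view -c -f 4 -F 0x900 file.bam` should give the FLAG-based count without writing any code.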