From: Sendu B. <sb...@sa...> - 2010-04-26 09:35:59
On 21/04/2010 17:45, Tim Fennell wrote:
> Hi Feiyu,
>
> The algorithm probably does need describing somewhere in detail, but I
> don't believe I have anything handy. Essentially what it does (for
> pairs; single-end data is also handled) is to find the 5' coordinates
> and mapping orientations of each read pair. When doing this it takes
> into account all clipping that has taken place as well as any gaps or
> jumps in the alignment. You can thus think of it as determining "if
> all the bases from the read were aligned, where would the 5'-most base
> have been aligned". It then matches all read pairs that have
> identical 5' coordinates and orientations and marks as duplicates all
> but the "best" pair. "Best" is defined as the read pair having the
> highest sum of base qualities at bases with Q >= 15.

Am I right in thinking it will work correctly on a BAM that has been
split by chromosome? Or will something not work quite right if one read
of a pair is missing because it mapped to a different chromosome?

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
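
For readers following the thread, the procedure Tim describes in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions, not the tool's actual code: the function names, the dictionary layout for reads and pairs, and the choice to sort the two ends so a pair's key does not depend on read order are all hypothetical. Positions are 1-based and CIGAR strings follow the SAM convention.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")   # CIGAR operations that consume reference bases
CLIP_OPS = set("SH")     # soft and hard clips


def unclipped_five_prime(pos, cigar, reverse):
    """Where the 5'-most base would sit if every base had been aligned."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if not reverse:
        # Forward strand: step back over any leading clips.
        lead = 0
        for n, op in ops:
            if op not in CLIP_OPS:
                break
            lead += n
        return pos - lead
    # Reverse strand: the 5' end is the rightmost base, so walk across the
    # reference span (including gaps and jumps) and add trailing clips.
    ref_len = sum(n for n, op in ops if op in REF_OPS)
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIP_OPS:
            break
        trail += n
    return pos + ref_len - 1 + trail


def quality_score(quals, min_q=15):
    """Sum of base qualities, counting only bases with Q >= min_q."""
    return sum(q for q in quals if q >= min_q)


def mark_duplicates(pairs):
    """Return the names of pairs that are not the best in their group."""
    groups = defaultdict(list)
    for pair in pairs:
        # Key: (chromosome, unclipped 5' coordinate, orientation) per read.
        # Sorting the ends is an assumption made for this sketch.
        ends = sorted(
            (r["chrom"],
             unclipped_five_prime(r["pos"], r["cigar"], r["reverse"]),
             r["reverse"])
            for r in pair["reads"]
        )
        groups[tuple(ends)].append(pair)

    duplicates = set()
    for group in groups.values():
        best = max(
            group,
            key=lambda p: sum(quality_score(r["quals"]) for r in p["reads"]),
        )
        duplicates.update(p["name"] for p in group if p is not best)
    return duplicates


def make_pair(name, pos1, cigar1, pos2, cigar2, q):
    """Hypothetical helper: a forward/reverse pair on chr1 with flat quality q."""
    return {"name": name, "reads": [
        {"chrom": "chr1", "pos": pos1, "cigar": cigar1, "reverse": False, "quals": [q] * 50},
        {"chrom": "chr1", "pos": pos2, "cigar": cigar2, "reverse": True, "quals": [q] * 50},
    ]}


pairs = [
    make_pair("A", 100, "50M", 300, "50M", 30),
    make_pair("B", 105, "5S45M", 300, "45M5S", 20),  # clipped, same 5' ends as A
    make_pair("C", 101, "50M", 300, "50M", 30),      # different 5' start
]
print(mark_duplicates(pairs))
```

In this toy input, pairs A and B share the same unclipped 5' coordinates and orientations once B's soft clips are accounted for, so the lower-quality pair B is the one flagged; C starts one base later and forms its own group.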