From: Tim F. <tfe...@br...> - 2010-04-26 13:24:14
It will not work optimally: it will only detect duplicates for pairs where it has access to both ends. So if you split your files by chromosome, you'll essentially lose inter-chromosomal duplicate marking.

-t

On Apr 26, 2010, at 5:35 AM, Sendu Bala wrote:

> On 21/04/2010 17:45, Tim Fennell wrote:
>> Hi Feiyu,
>>
>> The algorithm probably does need describing somewhere in detail, but I
>> don't believe I have anything handy. Essentially what it does (for
>> pairs; single-end data is also handled) is to find the 5' coordinates
>> and mapping orientations of each read pair. When doing this it takes
>> into account all clipping that has taken place as well as any gaps or
>> jumps in the alignment. You can thus think of it as determining "if
>> all the bases from the read were aligned, where would the 5' most base
>> have been aligned". It then matches all read pairs that have
>> identical 5' coordinates and orientations and marks as duplicates all
>> but the "best" pair. "Best" is defined as the read pair having the
>> highest sum of base qualities over bases with Q >= 15.
>
> Am I right in thinking it will work correctly on a bam that has been split by chromosome? Or will something not work quite right if one read of a pair is missing because it mapped to a different chromosome?
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
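
For readers who want something concrete, here is a rough Python sketch of the procedure described in the quoted message. It is not the tool's actual code. The function names, the dictionary layout for a pair, the CIGAR representation as (op, length) tuples, and the tie-breaking rule (first pair seen wins) are all assumptions made for illustration. Positions are taken to be 1-based leftmost aligned coordinates, as in SAM.

```python
from collections import defaultdict

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = {"M", "D", "N", "=", "X"}
CLIPS = {"S", "H"}


def unclipped_five_prime(pos, cigar, reverse):
    """Where the 5'-most base would sit if no bases had been clipped.

    pos     -- 1-based leftmost aligned position (assumed input convention)
    cigar   -- list of (op, length) tuples, e.g. [("S", 5), ("M", 45)]
    reverse -- True if the read maps to the reverse strand
    """
    if not reverse:
        # Forward strand: step back over any leading clips.
        leading = 0
        for op, length in cigar:
            if op not in CLIPS:
                break
            leading += length
        return pos - leading
    # Reverse strand: the 5' end is the rightmost base, so walk across the
    # reference span (including gaps) and then forward over trailing clips.
    ref_len = sum(length for op, length in cigar if op in REF_CONSUMING)
    trailing = 0
    for op, length in reversed(cigar):
        if op not in CLIPS:
            break
        trailing += length
    return pos + ref_len - 1 + trailing


def pair_key(pair):
    """Identical keys mean identical 5' coordinates and orientations."""
    ends = [
        (chrom, unclipped_five_prime(pos, cigar, reverse), reverse)
        for chrom, pos, cigar, reverse in pair["ends"]
    ]
    return tuple(sorted(ends))


def pair_score(pair, min_q=15):
    """Sum of base qualities, counting only bases with Q >= min_q."""
    return sum(q for q in pair["quals"] if q >= min_q)


def mark_duplicates(pairs):
    """Return the names of all pairs except the best one in each group."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair_key(pair)].append(pair)
    duplicates = set()
    for group in groups.values():
        best = max(group, key=pair_score)  # ties: first pair seen (assumption)
        duplicates.update(p["name"] for p in group if p is not best)
    return duplicates


# Hypothetical data: "a" and "b" share unclipped 5' ends; "c" has its mate on chr2.
EXAMPLE_PAIRS = [
    {"name": "a", "quals": [30] * 100,
     "ends": [("chr1", 100, [("M", 50)], False), ("chr1", 300, [("M", 50)], True)]},
    {"name": "b", "quals": [20] * 100,
     "ends": [("chr1", 105, [("S", 5), ("M", 45)], False),
              ("chr1", 300, [("M", 45), ("S", 5)], True)]},
    {"name": "c", "quals": [30] * 100,
     "ends": [("chr1", 100, [("M", 50)], False), ("chr2", 300, [("M", 50)], True)]},
]
```

Note that the key is built from both ends of a pair. In a file split by chromosome, a pair like "c" would have only one end present, so a key of this form could not be computed for it, which is the limitation described in the reply above.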