From: Heng Li <lh...@sa...> - 2011-07-26 17:24:11
|
On Jul 26, 2011, at 1:00 PM, James Robinson wrote:

> Heng, what's your opinion on deduping small genomes (such as viruses), where you will also get really deep coverage?

I think it depends on the application. For discovering variants, having a high de-dup false positive rate should not hurt. Actually, both samtools and GATK may downsample alignments during SNP calling.

Heng

>
> Jim
>
> On Tue, Jul 26, 2011 at 12:44 PM, Heng Li <lh...@sa...> wrote:
> I would think de-duplicating mostly does more harm than good. For RNA/ChIP-seq, we mainly focus on the read depth. As long as the duplicate rate is uniform, the mean relative depth is not affected. The variance will be larger, but I guess the additional variance caused by duplicates should not be larger than that caused by sample prep. Of course, the duplicate rate is probably context dependent, but this is not much different from the GC bias.
>
> RNA/ChIP-seq may have tons of coverage. De-dupping will lead to a very high false positive rate, and the rate is non-uniform. Enzymatic shearing makes it worse.
>
> Heng
>
> On Jul 26, 2011, at 11:54 AM, Ryan Golhar wrote:
>
> > I suppose for sonication, that would be correct. For enzymatic shearing, the enzymes cut at the same position, so it's not quite random. That would make the molecules all start at the same coordinate.
> >
> > I guess using MarkDups is dependent on the shearing protocol used.
> >
> > On Tue, Jul 26, 2011 at 11:12 AM, Alec Wysoker <al...@br...> wrote:
> > Hi Ryan,
> >
> > I'm probably out of my depth with respect to library prep. Don't transcripts typically get randomly sheared up before PCR, so that the chances that you have two distinct molecules pre-PCR that map to the same coordinate at both ends are pretty low?
> >
> > -Alec
> >
> >
> > On 7/26/11 11:06 AM, Ryan Golhar wrote:
> >> Thanks Alec. Wouldn't this remove copies of the same transcript? It seems possible to have reads with the same sequence, just different copies of the transcript.
> >>
> >> On Tue, Jul 26, 2011 at 11:01 AM, Alec Wysoker <al...@br...> wrote:
> >> Hi Ryan,
> >>
> >> We are running MarkDuplicates on RNA-seq data. I don't think there is any reason to assume there won't be the same PCR and optical duplication issues that exist for DNA (but I'm just a s/w developer, not a biologist). The usual caveats apply that MarkDuplicates works much better on paired reads.
> >>
> >> -Alec
> >>
> >>
> >> On 7/26/11 10:53 AM, Ryan Golhar wrote:
> >>> Hi all - I'm analyzing some RNA-Seq data and was wondering about running MarkDups on it to remove duplicate reads. Does this seem like something reasonable to do or is MarkDups not necessary for RNA-Seq data?
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Magic Quadrant for Content-Aware Data Loss Prevention
> >>> Research study explores the data loss prevention market. Includes in-depth
> >>> analysis on the changes within the DLP market, and the criteria used to
> >>> evaluate the strengths and weaknesses of these DLP solutions.
> >>>
> >>> http://www.accelacomm.com/jaw/sfnl/114/51385063/
> >>>
> >>> _______________________________________________
> >>> Samtools-help mailing list
> >>>
> >>> Sam...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/samtools-help
-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
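
The point in the thread about deep coverage producing a high de-dup false positive rate can be illustrated with a small calculation. This is only a sketch under simplifying assumptions that are not from the thread: fragments are independent, each one picks its coordinate signature (for example start position plus strand) uniformly at random, and a de-duplication tool keeps one read per signature. The function name and the example numbers are hypothetical, and real libraries, especially enzymatically sheared ones, are not uniform.

```python
import math


def expected_false_dup_fraction(n_fragments: int, n_signatures: int) -> float:
    """Expected fraction of independent fragments that coordinate-based
    de-duplication would discard, assuming each fragment picks one of
    n_signatures coordinate signatures uniformly at random (an assumption).
    """
    if n_fragments <= 0:
        return 0.0
    if n_signatures <= 0:
        raise ValueError("n_signatures must be positive")
    if n_signatures == 1:
        # Every fragment collides with every other one; only one survives.
        return 1.0 - 1.0 / n_fragments
    # P(a given signature is hit by at least one fragment)
    #   = 1 - (1 - 1/n_signatures) ** n_fragments,
    # computed with log1p/expm1 to stay accurate when the ratio is tiny.
    p_hit = -math.expm1(n_fragments * math.log1p(-1.0 / n_signatures))
    expected_distinct = n_signatures * p_hit
    # Everything beyond one read per occupied signature is flagged as a duplicate.
    return max(0.0, 1.0 - expected_distinct / n_fragments)


if __name__ == "__main__":
    # Hypothetical inputs: single-end reads, two strands per start position.
    small_genome = expected_false_dup_fraction(1_000_000, 2 * 10_000)
    large_genome = expected_false_dup_fraction(1_000_000, 2 * 3_000_000_000)
    print(f"10 kb genome, 1M reads: {small_genome:.4f}")
    print(f"3 Gb genome, 1M reads:  {large_genome:.6f}")
```

With these made-up inputs, nearly every read on the small genome would be flagged even though no PCR duplicates were simulated, while the same number of reads spread over a large genome is barely affected. Paired-end data increases the number of possible signatures, which is consistent with the remark in the thread that MarkDuplicates works much better on paired reads.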