Re: [Bio-bwa-help] [Samtools-help] contam/adatpers
From: Benjamin B. <be...@gm...> - 2009-11-24 22:28:29
Hi Heng,

Adapter dimers are pretty rare in high-quality libraries from whole-genomic DNA preps. They can be much more common, however, when there are problems with library construction, especially in ChIP-seq, RNA-seq and other methods where they might be abundant and not properly cleaned up. So in the general case, it would be good if some methods were available for contamination filtering.

ben.

On Nov 24, 2009, at 2:23 PM, Heng Li wrote:

> We used to discuss this for the published NA07340-X data. Adapter
> filtering is a bit complicated. Suppose the last two bases on a read
> come from the adapter. We cannot tell whether they do simply by
> aligning the read to the adapter, as 2bp is too short. When we align
> the read, the 2bp are likely to be mismatches, which potentially
> causes false SNPs if contamination is a serious problem. Maq does two
> rounds of adapter filtering. In the first round, it identifies long
> matches without alignment, and in the second round short matches to
> the adapter which are mismatches in the alignment. I know this sounds
> a bit messy, but I could not think of a clean way to do this.
>
> In the real world, fortunately, my experience is that contamination
> is rare, at least for data produced at the Sanger Institute. I have
> not used adapter trimming for quite some time.
>
> To Ben: there are no existing tools in samtools for adapter filtering.
>
> Heng
>
> On Tue, Nov 24, 2009 at 03:55:52PM -0500, Goncalo Abecasis wrote:
>> Should it be the case that the possible contaminants only add up to
>> very little sequence compared to the human reference? In that case,
>> it shouldn't matter much for Maq or Bwa.
>>
>> Gonçalo
>>
>>> -----Original Message-----
>>> From: Benjamin Berman [mailto:be...@gm...]
>>> Sent: Tuesday, November 24, 2009 3:44 PM
>>> To: bio...@li...; Samtools-he...@li... help
>>> Subject: [Samtools-help] contam/adatpers
>>>
>>> We are moving from Maq to a BWA->SAM pipeline (Illumina GA). Is
>>> there any code in these projects to pre-filter adapter dimer
>>> sequences? I've traditionally done it in one of two ways: either
>>> pre-filter sequence files using regular expressions for the various
>>> adapter sequences, or include the contam fasta file along with the
>>> reference genome for alignments. For Maq, this could be problematic
>>> because it could slow down the aligner. But maybe for BWA it's fine.
>>>
>>> Any suggestions from people using BWA would be helpful.
>>>
>>> Thanks!
>>> ben.
>>>
>>> _______________________________________________
>>> Samtools-help mailing list
>>> Sam...@li...
>>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
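
For the pre-filtering approach mentioned in the original question, a minimal Python sketch might look like the following. It is only an illustration, not code from Maq, BWA or samtools: the function names, the `min_overlap` parameter and the `ADAPTER` string are all assumptions made for the example, and the adapter should be replaced with the sequence used in your own library prep. It uses exact string matching, so it does not tolerate sequencing errors. It also shows the problem Heng describes: a 2bp overlap with the adapter is too short to call with confidence, so short overlaps are left alone unless `min_overlap` is lowered.

```python
# Placeholder adapter sequence (an assumption for this example);
# substitute the adapter(s) actually used for your library.
ADAPTER = "AGATCGGAAGAGC"


def find_adapter_start(read, adapter=ADAPTER, min_overlap=5):
    """Return the index in `read` where the adapter begins, or None.

    Exact matching only: first look for the whole adapter anywhere in the
    read, then for a prefix of the adapter sitting at the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Longest partial overlap first: read suffix == adapter prefix.
    max_len = min(len(read), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


def trim_read(read, adapter=ADAPTER, min_overlap=5):
    """Return the read with any detected adapter sequence removed."""
    start = find_adapter_start(read, adapter, min_overlap)
    return read if start is None else read[:start]


def is_adapter_dimer(read, adapter=ADAPTER, min_insert=1):
    """Treat a read as an adapter dimer if almost nothing precedes the adapter."""
    start = find_adapter_start(read, adapter, min_overlap=len(adapter))
    return start is not None and start < min_insert


if __name__ == "__main__":
    print(trim_read("ACGTACGTAC" + "AGATCGG"))   # 7bp overlap: trimmed
    print(trim_read("ACGTACGTAC" + "AG"))        # 2bp overlap: left as-is
    print(is_adapter_dimer(ADAPTER + "AAAA"))    # read starts with the adapter
```

With the default `min_overlap=5`, the read ending in `AG` is returned unchanged, because those two bases cannot be distinguished from genomic sequence by matching against the adapter alone. Those are the cases that Maq's second round, as described above, handles after alignment.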