Re: [Bio-bwa-help] BWA and clipping
From: Heng Li <lh...@sa...> - 2010-08-19 19:49:04
BWA will do Smith-Waterman alignment for some unmapped reads and possibly trim the alignment. A feature rather than a bug. Just throw them away if you do not like them.

Heng

On Aug 19, 2010, at 3:42 PM, Joseph Fass wrote:

> That does sound like a bug to me, but hopefully others can comment ...
> The part that may not be deterministic is the selection of one random alignment when there are multiple, equivalent, best alignments. It's an interesting question, actually ... for debugging purposes, it would be nice if it were pseudo-random (nicely distributed), but deterministic. Again, someone else (Heng?) will have to comment.
>
> ~Joe
>
> On Thu, Aug 19, 2010 at 12:10 PM, Dave Larson <dl...@ge...> wrote:
> Hi Joe,
> Thanks for the clarification, this helps narrow it down a bit. I'm definitely using aln, not bwa-sw as all of my data is from Illumina machines. But I'm clearly getting alignments with both ends trimmed. If what you say is correct, then this sounds like a bug to me. I suppose I may be able to try one of the latest versions and see if this still happens. Should I expect deterministic behavior for BWA? Will the same reads yield the same alignments? I remembered maq allowed for the setting of a random number seed, but I do not see the same option for BWA.
>
> Dave
>
> On 08/19/2010 01:53 PM, Joseph Fass wrote:
>> Hi Dave,
>>
>> Heng should be able to speak more specifically to 0.5.5, but at least in current versions, here's the status:
>>
>> bwasw will soft clip based on the alignment ... in other words, it's a local (with respect to the read) aligner, and tries to place each chunk of your long read in the best place in the genome (and for each aligned chunk, bases outside the alignment are soft clipped). bwasw will not soft clip based on quality.
>>
>> aln, on the other hand, will only soft clip at the 3' end based on quality, but is otherwise a global (with respect to the read) aligner.
>>
>> So, if you know how you aligned your reads, that should tell you what the reason for the soft clipping was. If you don't know, then just look for soft-clipping at both ends of a CIGAR string ... that can't occur from using aln.
>>
>> ~Joe
>>
>> On Thu, Aug 19, 2010 at 10:55 AM, Dave Larson <dl...@ge...> wrote:
>> Dear Heng,
>> We've been primarily using bwa 0.5.5 to align our genomes over the last year. As I've been investigating various variant calls I keep noticing that some reads are being extensively soft-clipped. We did not request quality-based soft clipping for this data, so this was somewhat unexpected to me. I'm assuming that these are being soft clipped based on the alignment results. I was hoping to learn more about why this is happening. Is it a bug or a feature? Is the behavior still present in later versions of bwa? Will there be any way to distinguish alignment-based soft clipping from quality-based soft clipping? I have done my best to find an answer to this question online, but I have come up short.
>>
>> Thank you in advance,
>>
>> Dave
>>
>> --
>> ---------------------------------------------------------------------------
>> GC Medical Genomics
>> Washington University School of Medicine
>> 4444 Forest Park Blvd, Box 8501
>> St. Louis, Missouri 63108
>> ---------------------------------------------------------------------------
>> Office: 4113
>> Phone: +1 (314) 286-1814
>> Fax: +1 (314) 286-1810
>> ---------------------------------------------------------------------------
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by
>>
>> Make an app they can't live without
>> Enter the BlackBerry Developer Challenge
>> http://p.sf.net/sfu/RIM-dev2dev
>> _______________________________________________
>> Bio-bwa-help mailing list
>> Bio...@li...
>> https://lists.sourceforge.net/lists/listinfo/bio-bwa-help
>>
>> --
>> Joseph Fass
>> Bioinformatics Programmer
>> UC Davis Bioinformatics Core
>> joseph.fass -at- gmail.com (professional)
>> 970.227.5928 (c) || 530.752.2698 (w)

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
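
For readers who want to act on the thread, here is a minimal Python sketch of Joe's suggestion to look for soft clipping at both ends of a CIGAR string, combined with Heng's advice to throw such reads away if they are unwanted. It is not part of BWA: the function names are hypothetical, and it assumes plain-text SAM input with the CIGAR in the sixth tab-separated column, as the SAM format defines. Note that, per Heng's reply, clipping at both ends does not by itself prove that bwasw was used, because the Smith-Waterman step applied to some unmapped reads can also trim the alignment.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    if cigar == "*":  # CIGAR unavailable, e.g. an unmapped read
        return (0, 0)
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) only appear at the ends, outside any soft clip,
    # so drop them before inspecting the first and last operations.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)


def clipped_at_both_ends(cigar):
    """True if the alignment is soft clipped at its start and at its end."""
    leading, trailing = soft_clip_lengths(cigar)
    return leading > 0 and trailing > 0


def drop_double_clipped(sam_lines):
    """Yield SAM lines, skipping alignments soft clipped at both ends."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and clipped_at_both_ends(fields[5]):
            continue
        yield line
```

A typical use would be to pass an open SAM file object to `drop_double_clipped` and write the yielded lines to a new file. Whether to discard these reads or only flag them depends on the downstream variant-calling workflow.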