Re: [Bio-bwa-help] BWA allowing too many mismatches?
Status: Beta
Brought to you by:
lh3lh3
From: Kris W <kri...@gm...> - 2011-01-31 21:22:11
|
OK, thanks again, Joe. I'm going to run a script to pick out all the reads that were mapped this way (have the XT:A:M tag) and see if that accounts for all the reads with large edit distances... On Sun, Jan 30, 2011 at 6:03 PM, Joseph Fass <jos...@gm...> wrote: > Hi Kris, > I'm sure there's a cutoff for the Smith-Waterman alignment score, but I > don't know what it is (or if it depends on length, e.g.). I don't know of a > way to turn it off, but I think the "XT:A:M" tag will reliably indicate that > the read was aligned using Smith-Waterman after being anchored by its mate. > You could selectively edit those SAM records if you wanted to disqualify > those reads? > ~Joe > > > On Sun, Jan 30, 2011 at 1:26 PM, Kris W <kri...@gm...> wrote: > >> Hi Joe, >> Thanks for replying. Well, the mate is mapped with 2 mismatches. Is this >> the default behavior: if one mate maps within the mismatch threshold, the >> other mate is mapped with no threshold at all? If that is the default >> behavior, is there a way to turn it off? Do you know where I can find the >> documentation for this? >> >> Here is the mapping of the mate, if that is helpful. (But keep in mind >> that 100% or our reads mapped): >> >> IPAR1:4:104:13763:17273:1#0 99 chr10 42597783 15 >> 76M =42597861 142 >> CAAATGGATTTATCGAATGCAATCGAATGGAATTATCGAATGCAATCGAATAGAATCATCGAATGGACTCGAATGG >> GGBGGGGGGGIIIIIHIHIIIIIIIIIIIIIIIGIIIIIIIIIIIHIIIGHIHHHFHFHIIIIHHGGGGHGFGEEC >> XT:A:U NM:i:2 SM:i:15 AM:i:15 X0:i:1 X1:i:3 XM:i:2 XO:i:0 XG:i:0 >> MD:Z:8A57G9 >> XA:Z:chr10,+42395979,76M,3;chr10,+42390905,76M,3;chr10,+42599159,76M,3; >> >> >> >> On Sun, Jan 30, 2011 at 9:11 AM, Joseph Fass <jos...@gm...>wrote: >> >>> Hi Kris, >>> Unmapped reads can be "anchored" by their mapped mates for -more >>> permissive- Smith-Waterman alignment. Does that explain it for the read you >>> show? >>> ~Joe >>> On Jan 30, 2011 6:25 AM, "Kris W" <kri...@gm...> wrote: >>> > After aligning paired-end Illumina reads with BWA (see bottom for >>> details), >>> > I'm finding reads that are mapped with a high number of mismatches - >>> higher >>> > than should be allowed, I think. >>> > >>> > Here is an example of one of these reads (one end of a paired read): It >>> is >>> > 76 bp long and so - if I understand correctly - should not be allowed >>> to >>> > have more than 5 mismatches: >>> > @IPAR1:4:104:13763:17273:1#0/3 >>> > >>> CGATTGTATTCGATGGTGATTGATTTCGAGTCCATGGATTAATCCATTGCATTCCATTCGGTGANNNNNNNNNNNN >>> > + >>> > IIIIIIIIIIIIIIHIHEHIHGIIHIHIIGHIIHIEECEE?BBB@EEEEEIIIIIIIIIH >>> ?=A############# >>> > >>> > However, BWA mapped it with 16 mismatches: >>> > IPAR1:4:104:13763:17273:1#0/3 147 chr10 42597861 15 >>> > 12S64M = 42597783 -142 >>> > >>> NNNNNNNNNNNNTCACCGAATGGAATGCAATGGATTAATCCATGGACTCGAAATCAATCACCATCGAATACAATCG >>> > #############A=?HIIIIIIIIIEEEEE@BBB >>> ?EECEEIHIIHGIIHIHIIGHIHEHIHIIIIIIIIIIIIII >>> > XT:A:M NM:i:16 SM:i:15 AM:i:15 XM:i:16 XO:i:0 XG:i:0 >>> > MD:Z:3T11G6A2G0A1A3A1T0A4T0G0G5T3T4G0G5 >>> > >>> > Here are the details of how I used BWA. I tried this with both versions >>> > 0.5.6 and 0.5.9 (RHEL5, 64 bit) and got the same results: >>> > >>> > 1. index the reference genome: (this step was only done once - with >>> > version 0.5.6) >>> > bwa index -a bwtsw <reference fasta (hg19 complete)> >>> > >>> > 2. make one sai file from each fastq file: >>> > bwa aln <reference fasta> <reads1.fq> > <reads1.sai> >>> > bwa aln <reference fasta> <reads2.fq> > <reads2.sai> >>> > >>> > 3. output alignment to sam file: >>> > bwa sampe <reference fasta> <reads1.sai> <reads2.sai> <reads1.fq> >>> > <reads2.fq> > <alignment.sam> >>> > >>> > (I used defaults for all parameters.) >>> > >>> > After converting the SAM file to BAM, sorting it, and removing PCR >>> > duplicates (all with samtools), the "samtools flagstat" command shows >>> that >>> > 100% of the reads are mapped. >>> > >>> > Is this a bug, or am I doing something wrong, or have I misunderstood >>> what >>> > should happen with the default parameters? The output from running "bwa >>> > aln" includes these lines, and I'm assuming "max_diff" means the >>> maximum >>> > edit distance allowed for a read to be mapped: >>> > [bwa_aln] 17bp reads: max_diff = 2 >>> > [bwa_aln] 38bp reads: max_diff = 3 >>> > [bwa_aln] 64bp reads: max_diff = 4 >>> > [bwa_aln] 93bp reads: max_diff = 5 >>> > [bwa_aln] 124bp reads: max_diff = 6 >>> > [bwa_aln] 157bp reads: max_diff = 7 >>> > [bwa_aln] 190bp reads: max_diff = 8 >>> > [bwa_aln] 225bp reads: max_diff = 9 >>> > >>> > Thanks for any help! >>> > >>> > -Kris >>> >> >> > > > -- > Joseph Fass > Bioinformatics Programmer > UC Davis Bioinformatics Core > joseph.fass -at- gmail.com (professional) > 970.227.5928 (c) || 530.752.2698 (w) > |