Re: [Bio-bwa-help] Question about BWA 'aln -n' option
From: Tom B. <tb...@um...> - 2010-10-28 20:28:40
Kathie -

The bwa manpage, in describing the "-n" option, reads: "... or the
fraction of missing alignments given 2% uniform base error rate ...".

So, just as you say, at a 2% uniform mismatch rate, the observed number
of mismatches per sequence read is expected to follow a Poisson
distribution. For a 40bp read, the mean is 0.02 x 40 = 0.8 mismatches
per read. Therefore, about 1% of such reads will contain 4 or more
mismatches, just by chance from Poisson random sampling. These are the
1% of potential mappings which will be missed if one calls
'bwa aln -n 0.01'.

Is this 1% failure rate for read mapping acceptable, or do you want
even more sensitivity? If so, then set -n lower than 0.01. This just
repeats what I said before.

- tom blackwell -

On Thu, 28 Oct 2010, Kathie Ngo wrote:

> Hi Tom,
>
> Thank you for the quick response. This answers my inquiry about why a
> higher float for -n would result in fewer mismatches. However, the
> other question I have is how I would go about determining 'x%' of
> mismatches for different read lengths. The float option seemed a
> little misleading (now that I know it's based on the fraction of
> potentially correctly mapped reads). For example, I would like to
> allow a maximum of 2% mismatches per read; for a 40bp read it would be
> about 1 mismatch. Is there a way to get BWA to follow that? Thank you
> for all your help.
>
> Sincerely,
> Kathie
>
> On Thu, Oct 28, 2010 at 12:31 PM, Tom Blackwell <tb...@um...> wrote:
>
>> Kathie -
>>
>> Perhaps one can interpret the "-n" floating point number close to
>> zero as: the fraction of potentially correctly mapped reads that you
>> are willing to miss due to extremes in the Poisson random number of
>> observed mismatches per sequence read. Smaller numbers produce a
>> more nearly exhaustive search.
>>
>> - tom blackwell -
>>
>> On Thu, 28 Oct 2010, Kathie Ngo wrote:
>>
>>> Hello Everyone,
>>>
>>> I have some questions about the BWA 'aln -n' option. I understand
>>> that if you specify an 'int' you get the 'int' maximum edit
>>> distance. However, I do not understand how the algorithm works when
>>> you specify a float for the '-n' option. Can someone please explain
>>> this to me?
>>>
>>> When I used defaults for all options, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 2
>>> [bwa_aln] 38bp reads: max_diff = 3
>>> [bwa_aln] 64bp reads: max_diff = 4
>>> [bwa_aln] 93bp reads: max_diff = 5
>>> [bwa_aln] 124bp reads: max_diff = 6
>>> [bwa_aln] 157bp reads: max_diff = 7
>>> [bwa_aln] 190bp reads: max_diff = 8
>>> [bwa_aln] 225bp reads: max_diff = 9
>>>
>>> If I use -n 0.02, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 2
>>> [bwa_aln] 29bp reads: max_diff = 3
>>> [bwa_aln] 51bp reads: max_diff = 4
>>> [bwa_aln] 77bp reads: max_diff = 5
>>> [bwa_aln] 105bp reads: max_diff = 6
>>> [bwa_aln] 135bp reads: max_diff = 7
>>> [bwa_aln] 166bp reads: max_diff = 8
>>> [bwa_aln] 198bp reads: max_diff = 9
>>> [bwa_aln] 231bp reads: max_diff = 10
>>>
>>> If I use -n 0.15, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 1
>>> [bwa_aln] 35bp reads: max_diff = 2
>>> [bwa_aln] 67bp reads: max_diff = 3
>>> [bwa_aln] 102bp reads: max_diff = 4
>>> [bwa_aln] 140bp reads: max_diff = 5
>>> [bwa_aln] 178bp reads: max_diff = 6
>>> [bwa_aln] 218bp reads: max_diff = 7
>>>
>>> According to these results, it looks like the higher the
>>> percentage, the fewer mismatches are allowed? This is the opposite
>>> of what I understand from the manual. And how are these numbers
>>> calculated?
>>>
>>> Thank you!
>>>
>>> Sincerely,
>>>
>>> Kathie J Ngo
>>> kj...@uc...
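
The Poisson reasoning in this thread can be sketched in a few lines of Python. This is only an illustration of the calculation Tom describes, not bwa's own code: the function name max_diff and its parameters are made up here, the 2% error rate comes from the manpage text quoted above, and the 0.04 used for the "defaults" case is the default -n value documented in the bwa manpage. The function returns the smallest number of mismatches k such that the chance of a read having more than k mismatches falls below the -n fraction.

```python
import math


def max_diff(read_len, fnr, err=0.02):
    """Smallest k with P(X > k) < fnr, where X ~ Poisson(read_len * err).

    read_len: read length in bp
    fnr:      fraction of alignments one is willing to miss (the float -n)
    err:      assumed uniform per-base error rate (2% per the manpage)
    """
    if not 0.0 < fnr < 1.0:
        raise ValueError("fnr must be strictly between 0 and 1")
    lam = read_len * err          # expected mismatches per read
    term = math.exp(-lam)         # P(X = 0)
    cdf = term                    # running P(X <= k)
    k = 0
    while 1.0 - cdf >= fnr:
        k += 1
        term *= lam / k           # P(X = k) derived from P(X = k - 1)
        cdf += term
    return k


if __name__ == "__main__":
    # Compare a few read lengths at the three -n values from the thread.
    for fnr in (0.04, 0.02, 0.15):
        row = ", ".join(f"{n}bp: {max_diff(n, fnr)}" for n in (17, 29, 35, 38))
        print(f"-n {fnr}: {row}")
```

Under these assumptions, max_diff(40, 0.01) gives 3, which matches the statement that reads with 4 or more mismatches are the roughly 1% that get missed. It also shows why a larger float allows fewer mismatches: accepting a larger missed fraction lets the search stop at a smaller k. To cap mismatches at a fixed count instead, the thread notes that an integer -n is taken as the maximum edit distance directly.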