Re: [Bio-bwa-help] Question about BWA 'aln -n' option

Kathie  -

The bwa manpage, in describing the "-n" option, reads: 
"... or the fraction of missing alignments given 2% uniform 
base error rate ...".

So, just as you say, at a 2% uniform base error rate, the 
number of mismatches observed in a sequence read is expected 
to follow a Poisson distribution.  For a 40bp read the mean is 
0.02 x 40 = 0.8 mismatches per read, so about 1% of reads will 
contain 4 or more mismatches purely by chance from Poisson 
random sampling.  These are the 1% of potential mappings which 
will be missed if one calls 'bwa aln -n 0.01'.  Is this 1% 
failure rate for read mapping acceptable, or do you want even 
more sensitivity?  If so, then set -n lower than 0.01.  This 
just repeats what I said before.
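
The arithmetic can be sketched in a few lines of Python.  This is
a sketch under assumptions, not bwa's actual source: the function
names poisson_tail and max_diff are mine, the 2% error rate is
taken from the manpage wording, and the 0.04 default threshold is
what I understand 'bwa aln' to use when -n is not given.  It picks
the smallest number of differences k >= 1 such that the chance of
a read carrying more than k mismatches falls below the -n fraction.

```python
import math


def poisson_tail(k, lam):
    """Return P(X > k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf


def max_diff(read_len, thres=0.04, err=0.02):
    """Smallest k >= 1 with P(more than k mismatches) < thres.

    read_len: read length in bp
    thres:    the float given to -n (0.04 is an assumed default)
    err:      assumed uniform base error rate (2% per the manpage)
    """
    lam = read_len * err  # expected mismatches per read
    for k in range(1, 1000):
        if poisson_tail(k, lam) < thres:
            return k
    return 2  # fallback if no k qualifies (not expected in practice)


if __name__ == "__main__":
    # Chance that a 40bp read has 4 or more mismatches (mean 0.8).
    print(f"P(X >= 4 | 40bp) = {poisson_tail(3, 0.8):.4f}")
    # A larger -n fraction tolerates more missed reads, so max_diff shrinks.
    for thres in (0.02, 0.04, 0.15):
        print(thres, [max_diff(n, thres) for n in (17, 40, 100)])
```

Under these assumptions the sketch agrees with the transition
points in your log output (38bp for the default, 29bp for -n 0.02,
35bp for -n 0.15).  To cap mismatches at a fixed count instead,
pass an integer to -n (for example 'bwa aln -n 1'), which is taken
as the maximum edit distance.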

 	 	 	 	 	 	-  tom blackwell  -

On Thu, 28 Oct 2010, Kathie Ngo wrote:

> Hi Tom,
> Thank you for the quick response. This answers my question about why a higher
> float for -n results in fewer mismatches being allowed. However, my other
> question is how to determine 'x%' of mismatches for
> different read lengths. The float option seemed a little misleading (now
> that I know it's based on the fraction of potentially correctly mapped
> reads). For example, I would like to allow a maximum of 2% mismatches per
> read; for a 40bp read that would be about 1 mismatch. Is there a way to get
> BWA to follow that? Thank you for all your help.
>
> Sincerely,
> Kathie
>
>
> On Thu, Oct 28, 2010 at 12:31 PM, Tom Blackwell <tb...@um...> wrote:
>
>> Kathie  -
>>
>> Perhaps one can interpret the "-n" floating point number close to zero as
>> the fraction of potentially correctly mapped reads that you are willing to
>> miss due to extremes in the Poisson random number of observed mismatches per
>> sequence read.  Smaller numbers produce a more nearly exhaustive search.
>>
>>                                                -  tom blackwell  -
>>
>>
>> On Thu, 28 Oct 2010, Kathie Ngo wrote:
>>
>>> Hello Everyone,
>>> I have some questions about the BWA 'aln -n' option. I understand that if you
>>> specify an 'int', it is used as the maximum edit distance. However, I do not
>>> understand how the algorithm works when you specify a float for the '-n'
>>> option. Can someone please explain this to me?
>>>
>>> When I used defaults for all options, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 2
>>> [bwa_aln] 38bp reads: max_diff = 3
>>> [bwa_aln] 64bp reads: max_diff = 4
>>> [bwa_aln] 93bp reads: max_diff = 5
>>> [bwa_aln] 124bp reads: max_diff = 6
>>> [bwa_aln] 157bp reads: max_diff = 7
>>> [bwa_aln] 190bp reads: max_diff = 8
>>> [bwa_aln] 225bp reads: max_diff = 9
>>>
>>> If I use -n 0.02, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 2
>>> [bwa_aln] 29bp reads: max_diff = 3
>>> [bwa_aln] 51bp reads: max_diff = 4
>>> [bwa_aln] 77bp reads: max_diff = 5
>>> [bwa_aln] 105bp reads: max_diff = 6
>>> [bwa_aln] 135bp reads: max_diff = 7
>>> [bwa_aln] 166bp reads: max_diff = 8
>>> [bwa_aln] 198bp reads: max_diff = 9
>>> [bwa_aln] 231bp reads: max_diff = 10
>>>
>>> If I use -n 0.15, I get this:
>>> [bwa_aln] 17bp reads: max_diff = 1
>>> [bwa_aln] 35bp reads: max_diff = 2
>>> [bwa_aln] 67bp reads: max_diff = 3
>>> [bwa_aln] 102bp reads: max_diff = 4
>>> [bwa_aln] 140bp reads: max_diff = 5
>>> [bwa_aln] 178bp reads: max_diff = 6
>>> [bwa_aln] 218bp reads: max_diff = 7
>>>
>>> According to these results, it looks like the higher the percentage, the
>>> fewer mismatches are allowed? This is the opposite of what I understood
>>> from the manual. And how are these numbers calculated?
>>>
>>> Thank you!
>>>
>>> Sincerely,
>>>
>>> Kathie J Ngo
>>> kj...@uc...
>>>
>>>