

#276 bowtie2 -k n parameter doesn't report n matches when n exist

bowtie (175)
John Aach

I am using bowtie2's ability to map reads containing N characters to find genomic sequences with small numbers of mismatches against very short query sequences with a single N, e.g., TATATGTATGTANGG. The parameters I have been using are:

-f -N 1 --n-ceil 1 -L 12 -i C,6 --dpad 0 --gbar 13 --end-to-end --mp 1,1 --np 0 --score-min C,-1 --no-sq --no-hd

With these parameters, end-to-end alignments with a single mismatch (not counting the N) have scores of -1. I extend these parameters with either -k 20 or -a to examine the space of mismatching sequences for each query -- up to 20 alignments per query in the former case and all valid alignments in the latter. At present I have processed ~754K such queries against a version of the S. cerevisiae genome.
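As a rough illustration of the scoring described above, the following Python sketch models an ungapped end-to-end score under --mp 1,1, --np 0 and --score-min C,-1. It is a simplified model for reasoning about expected scores, not bowtie2's actual implementation. The function names and the target sequences in the usage note are hypothetical.

```python
def end_to_end_score(read: str, target: str,
                     mismatch_penalty: int = 1, n_penalty: int = 0) -> int:
    """Simplified ungapped score: 0 for a perfect match, minus penalties.

    Assumes no gaps (the read is short and --gbar 13 is set) and a uniform
    mismatch penalty (--mp 1,1). Positions with an N in either sequence are
    charged n_penalty (--np 0).
    """
    if len(read) != len(target):
        raise ValueError("ungapped comparison needs equal-length sequences")
    score = 0
    for r, t in zip(read.upper(), target.upper()):
        if r == "N" or t == "N":
            score -= n_penalty          # ambiguous position
        elif r != t:
            score -= mismatch_penalty   # ordinary mismatch
    return score


def passes_score_min(score: int, min_score: int = -1) -> bool:
    """Constant threshold, as with --score-min C,-1."""
    return score >= min_score
```

Under this model, the query TATATGTATGTANGG scores 0 against a hypothetical target TATATGTATGTAAGG and -1 against TATATGTATGTAAGC, and both pass the threshold. A target with two non-N mismatches scores -2 and is rejected.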

Out of these ~754K, I have found 18 cases in which the -k 20 parameter appears not to find *any* 1bp mismatches, while the -a parameter finds large numbers. For instance, for the query sequence above, bowtie2 -a finds 1 exact match and 19 1bp mismatches, with alignment scores of 0 and -1 respectively (as given by the AS field in the SAM output). With -k 20, bowtie2 reports only the single exact match and *none* of the 19 1bp mismatches, even though a -k value of 20 should instruct bowtie(2) to return all 20 alignments.
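A comparison of this kind can be sketched in Python as follows. The sketch assumes each run's SAM output is available as an iterable of lines and that aligned records carry an AS:i tag. It flags queries for which the -k run reported fewer alignments than min(k, number reported by -a). The function names are hypothetical, and a flagged query is only a candidate for inspection, not proof of a bug.

```python
from collections import Counter, defaultdict


def alignment_scores(sam_lines):
    """Map each query name to a Counter of AS:i scores over its aligned records."""
    scores = defaultdict(Counter)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and any header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unaligned
        for tag in fields[11:]:
            if tag.startswith("AS:i:"):
                scores[qname][int(tag[5:])] += 1
                break
    return scores


def under_reported(k_scores, all_scores, k):
    """Return {query: (found, expected)} where the -k run found fewer
    alignments than min(k, total alignments in the -a run)."""
    flagged = {}
    for qname, counts in all_scores.items():
        expected = min(k, sum(counts.values()))
        found = sum(k_scores.get(qname, Counter()).values())
        if found < expected:
            flagged[qname] = (found, expected)
    return flagged
```

For the example query above, the -a run would give 20 aligned records (one with AS:i:0 and 19 with AS:i:-1) and the -k 20 run would give one, so the query would be flagged with (found, expected) = (1, 20).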

I emphasize that for the vast majority of the ~754K queries I have processed, this does not happen. For these, bowtie2 -k 20 finds (at least a subset of) the 1bp mismatches also found by -a and reports them properly. Thus, it seems possible that bowtie(2) may have a bug in which scoring or counting of aligned targets is not properly computed in a small number of cases.

However, although I am reporting this as a possible bug, it may simply be how the -k parameter works. The bowtie(2) documentation is clear that with -k n bowtie(2) will report *at most* n alignments consistent with the scoring parameters. It is less clear whether -k n will necessarily report n such alignments whenever at least n exist. If -k does not guarantee this, I would request that the bowtie(2) documentation be clarified accordingly.

The version of bowtie2 I am using is 2.0.6.


  • John Aach

    • summary: bowtie2 -k parameter misses matches --> bowtie2 -k n parameter doesnt report n matches when n exist