I have been using your aligners Bowtie and Bowtie2 for several years now and I find them excellent pieces of software. I noticed an anomaly with Bowtie2 (V:2.2.5) and wanted to report this.
It seems that the FASTQ header ID determines whether some reads map or not. Please see the attached files mini.fastq and mini2.fastq. Each file contains a single read, and the two reads differ only in their header line. When mapped against a bowtie2 index of mito.fa (also attached), the read in mini.fastq did not map, but the read in mini2.fastq did.
Do you know why this is happening?
Many thanks.
Steven Wingett
I built my index:
bowtie2-build mito.fa mito
I ran the bowtie2 commands:
bowtie2 -x ./mito -U mini.fastq
bowtie2 -x ./mito -U mini2.fastq
Results:
bowtie2 -x ./mito -U mini.fastq
1 reads; of these:
1 (100.00%) were unpaired; of these:
1 (100.00%) aligned 0 times
0 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.00% overall alignment rate
@HD VN:1.0 SO:unsorted
@SQ SN:MT1 LN:16338
@SQ SN:MT2 LN:16596
@SQ SN:MT3 LN:16569
@SQ SN:MT4 LN:16299
@SQ SN:MT5 LN:16313
@SQ SN:MT6 LN:85779
@SQ SN:MT7 LN:19431
@PG ID:bowtie2 PN:bowtie2 VN:2.2.5 CL:"/bi/apps/bowtie2/2.2.5/bowtie2-align-s --wrapper basic-0 -x ./mito -U mini.fastq"
HS3:608:C6LNLACXX:7:1206:3677:72107 4 * 0 0 * * 0 0 GGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCA CCCFFFFFHHFFHJJJIJJIIJJJJIJIIJJJJJJJIIJIJJJJJIJJJH YT:Z:UU
bowtie2 -x ./mito -U mini2.fastq
1 reads; of these:
1 (100.00%) were unpaired; of these:
0 (0.00%) aligned 0 times
1 (100.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
100.00% overall alignment rate
@HD VN:1.0 SO:unsorted
@SQ SN:MT1 LN:16338
@SQ SN:MT2 LN:16596
@SQ SN:MT3 LN:16569
@SQ SN:MT4 LN:16299
@SQ SN:MT5 LN:16313
@SQ SN:MT6 LN:85779
@SQ SN:MT7 LN:19431
@PG ID:bowtie2 PN:bowtie2 VN:2.2.5 CL:"/bi/apps/bowtie2/2.2.5/bowtie2-align-s --wrapper basic-0 -x ./mito -U mini2.fastq"
0.HS3:608:C6LNLACXX:7:1206:3677:72107 0 MT3 1 0 4M1I45M * 0 0 GGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCA CCCFFFFFHHFFHJJJIJJIIJJJJIJIIJJJJJJJIIJIJJJJJIJJJH AS:i:-23 XN:i:0 XM:i:3 XO:i:1 XG:i:1 NM:i:4 MD:Z:1A0T0C45 YT:Z:UU
This is because the pseudo-random number generator is initialized fresh for each read, and the seed is a function of a few things including the read name:
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#randomness-in-bowtie-2
Since the pseudo-random generator affects the aligner's heuristics, it's not surprising that there will be cases where the read name affects whether or not the read aligns.
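To make the mechanism concrete, here is a minimal Python sketch of per-read seeding. It is not Bowtie 2's actual code: the function names, the SHA-256 hash, and the "pick a few candidates to try" step are all assumptions made for illustration. The manual section linked above describes the seed as a function of the read name, nucleotide string, quality string, and the `--seed` value; the sketch only mirrors that idea.

```python
import hashlib
import random


def seed_for_read(name: str, seq: str, qual: str, base_seed: int = 0) -> int:
    """Hypothetical per-read seed derived from name, sequence, qualities and a base seed."""
    payload = "\x00".join([name, seq, qual, str(base_seed)]).encode()
    # Take the first 8 bytes of the digest as an integer seed.
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


def candidates_tried(name: str, seq: str, qual: str,
                     n_candidates: int = 1000, n_tried: int = 5) -> list:
    """Stand-in for a randomized heuristic: choose which candidates get examined."""
    rng = random.Random(seed_for_read(name, seq, qual))  # fresh generator for each read
    return sorted(rng.sample(range(n_candidates), n_tried))


SEQ = "GGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCA"
QUAL = "CCCFFFFFHHFFHJJJIJJIIJJJJIJIIJJJJJJJIIJIJJJJJIJJJH"

name_a = "HS3:608:C6LNLACXX:7:1206:3677:72107"    # header from mini.fastq
name_b = "0.HS3:608:C6LNLACXX:7:1206:3677:72107"  # header from mini2.fastq

# Same read, same name: the choice is reproducible from run to run.
print(candidates_tried(name_a, SEQ, QUAL) == candidates_tried(name_a, SEQ, QUAL))
# Same bases and qualities, different name: a different seed, so the
# heuristic may examine a different set of candidates.
print(seed_for_read(name_a, SEQ, QUAL) != seed_for_read(name_b, SEQ, QUAL))
```

The point of the sketch is that a read with identical bases and qualities can still take a different path through a randomized heuristic once its name changes, which is consistent with a borderline alignment (here AS:i:-23, MAPQ 0) being found under one name and missed under the other.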
Best,
Ben