Hello,
I have been using bowtie2 in various pipelines with extreme satisfaction and will continue to do so.
I found a subtle bug in the latest version (2.2.6, as of 07/01/2016) of bowtie2 in local mode. I thought you might want to know this. This wouldn't impact vast majority of the users but it is still worth noting in my humble opinion.
I first observed the problem on a real data set having around 20 million reads and 200 K fasta entries. But in my example here, I am providing you a very short and simple data set that I managed to reproduce the same problem with.
Below, I am explaining how I got the problem. Briefly, the source of the error is a reference coming from all N's in running bowtie2 in local mode. Though, in general, it does not make sense to have a reference of all N's, my reference is the output of a pipeline of mine that aligns a particular type of RNA where sequences are coming from an incomplete genome assembly. I couldn't find a complete genome assembly for that particular organism.
I hope this information will be useful to you.
Kind Regards,
Hakan Ozadam
Setup:
I have two reference fasta files: ref_1_sequences.fa , ref_2_sequences.fa
They have 4 entries each. The only difference is that the sequence with the label "sequence_101" is made of all N's in ref_2_sequences.fa and the sequence having the same label, namely "sequence_101", in ref_1_sequences.fa, begins with a single A followed by N's. The rest of the sequences in both fasta files are exactly the same.
Reference Preparation:
I made references as follows
bowtie2-build ref_1_sequences.fa ref_1
bowtie2-build ref_2_sequences.fa ref_2
Both reference builds complete successfully with no errors or warnings. Both of them produce reference *.bt2 reference files.
Alignment:
We run bowtie2 in local mode. In both cases, we expect a successful alignment
bowtie2 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 --no-unal -x reference_1/ref_1 -U sample.fastq -S alignment_1.sam
bowtie2 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 --no-unal -x reference_2/ref_2 -U sample.fastq -S alignment_2.sam
Observation:
Alignment against ref_2 gives no hits but alignment against ref_1 gives a hit on sequence_2 as expected. Please note that the only difference between the two references is "sequence_101", i.e., "sequence_2" is the same in both references. So I was expecting a mapping in the second alignment to sequence_2 as well.
sequence_1
GCGCGCGCAAAAAGCGCGC
sequence_101
ANNNNNNNNNNNNNNNNNN
sequence_2
GTGAGGCTGAAGTGTAATC
sequence_3
AGCTAGCTAGCTAGCTAAA
sequence_1
GCGCGCGCAAAAAGCGCGC
sequence_101
NNNNNNNNNNNNNNNNNNN
sequence_2
GTGAGGCTGAAGTGTAATC
sequence_3
AGCTAGCTAGCTAGCTAAA
@read_1
AAAAAAAAAAGTGAGGCTGAAGTGTAATCAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HD VN:1.0 SO:unsorted
@SQ SN:sequence_1 LN:19
@SQ SN:sequence_101 LN:19
@SQ SN:sequence_2 LN:19
@SQ SN:sequence_3 LN:19
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 CL:"/home/ho86w/bin/bowtie2-2.2.6/bowtie2-align-s --wrapper basic-0 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 -x reference_1/ref_1 --passthrough -U sample.fastq"
read_1 0 sequence_2 1 22 10S19M10S * 0 0 AAAAAAAAAAGTGAGGCTGAAGTGTAATCAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:38 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU
@HD VN:1.0 SO:unsorted
@SQ SN:sequence_1 LN:38
@SQ SN:sequence_2 LN:19
@SQ SN:sequence_3 LN:19
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 CL:"/home/ho86w/bin/bowtie2-2.2.6/bowtie2-align-s --wrapper basic-0 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 -x reference_2/ref_2 --passthrough -U sample.fastq"