Menu

#344 local mode alignments with N's in the reference

v0.9.0
open
nobody
local mode (1)
5
2016-01-07
2016-01-07
Hakan
No

Hello,

I have been using bowtie2 in various pipelines with extreme satisfaction and will continue to do so.

I found a subtle bug in the latest version (2.2.6, as of 07/01/2016) of bowtie2 in local mode. I thought you might want to know this. This wouldn't impact vast majority of the users but it is still worth noting in my humble opinion.

I first observed the problem on a real data set having around 20 million reads and 200 K fasta entries. But in my example here, I am providing you a very short and simple data set that I managed to reproduce the same problem with.

Below, I am explaining how I got the problem. Briefly, the source of the error is a reference coming from all N's in running bowtie2 in local mode. Though, in general, it does not make sense to have a reference of all N's, my reference is the output of a pipeline of mine that aligns a particular type of RNA where sequences are coming from an incomplete genome assembly. I couldn't find a complete genome assembly for that particular organism.

I hope this information will be useful to you.

Kind Regards,

Hakan Ozadam

         Setup:

I have two reference fasta files: ref_1_sequences.fa , ref_2_sequences.fa
They have 4 entries each. The only difference is that the sequence with the label "sequence_101" is made of all N's in ref_2_sequences.fa and the sequence having the same label, namely "sequence_101", in ref_1_sequences.fa, begins with a single A followed by N's. The rest of the sequences in both fasta files are exactly the same.

        Reference Preparation:

I made references as follows

bowtie2-build ref_1_sequences.fa ref_1
bowtie2-build ref_2_sequences.fa ref_2

Both reference builds complete successfully with no errors or warnings. Both of them produce reference *.bt2 reference files.

         Alignment:

We run bowtie2 in local mode. In both cases, we expect a successful alignment

bowtie2 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 --no-unal -x reference_1/ref_1 -U sample.fastq -S alignment_1.sam

bowtie2 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 --no-unal -x reference_2/ref_2 -U sample.fastq -S alignment_2.sam

          Observation:

Alignment against ref_2 gives no hits but alignment against ref_1 gives a hit on sequence_2 as expected. Please note that the only difference between the two references is "sequence_101", i.e., "sequence_2" is the same in both references. So I was expecting a mapping in the second alignment to sequence_2 as well.

ref_1_sequences.fa

sequence_1
GCGCGCGCAAAAAGCGCGC
sequence_101
ANNNNNNNNNNNNNNNNNN
sequence_2
GTGAGGCTGAAGTGTAATC
sequence_3
AGCTAGCTAGCTAGCTAAA

ref_2_sequences.fa

sequence_1
GCGCGCGCAAAAAGCGCGC
sequence_101
NNNNNNNNNNNNNNNNNNN
sequence_2
GTGAGGCTGAAGTGTAATC
sequence_3
AGCTAGCTAGCTAGCTAAA

sample.fastq

@read_1
AAAAAAAAAAGTGAGGCTGAAGTGTAATCAAAAAAAAAA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

alignment_1.sam

read_1 is mapped to sequence_2

@HD VN:1.0 SO:unsorted
@SQ SN:sequence_1 LN:19
@SQ SN:sequence_101 LN:19
@SQ SN:sequence_2 LN:19
@SQ SN:sequence_3 LN:19
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 CL:"/home/ho86w/bin/bowtie2-2.2.6/bowtie2-align-s --wrapper basic-0 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 -x reference_1/ref_1 --passthrough -U sample.fastq"
read_1 0 sequence_2 1 22 10S19M10S * 0 0 AAAAAAAAAAGTGAGGCTGAAGTGTAATCAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:38 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:19 YT:Z:UU

alignment_2.sam

No mapping

@HD VN:1.0 SO:unsorted
@SQ SN:sequence_1 LN:38
@SQ SN:sequence_2 LN:19
@SQ SN:sequence_3 LN:19
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 CL:"/home/ho86w/bin/bowtie2-2.2.6/bowtie2-align-s --wrapper basic-0 --local -i S,1,0 -L 5 -R 20 -D 200 --mp 3,1 --rdg 0,2 --rfg 0,2 --ma 2 --score-min L,36,0 -x reference_2/ref_2 --passthrough -U sample.fastq"

Discussion


Log in to post a comment.