I have a simple reference of 25 A's. In base space I have two reads: one is 25 A's and the other is 25 T's. The output is correct: the read of T's maps to the negative strand.
[dyermd@foshtdtrain01 bowtie]$ ./bowtie -fa test test_reads.fasta
read1 + entry 0 AAAAAAAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIII 0
read2 - entry 0 AAAAAAAAAAAAAAAAAAAAAAAAA IIIIIIIIIIIIIIIIIIIIIIIII 0
However, when I set up the two strings in color space:
>read1
T3000000000000000000000000
>read2
T0000000000000000000000000
the program reports that each read hits both the positive and negative strands:
[dyermd@foshtdtrain01 bowtie]$ ./bowtie -Cfa test_c test_reads.csfasta
read1 + entry 1 AAAAAAAAAAAAAAAAAAAAAAA qqqqqqqqqqqqqqqqqqqqqqq 0
read1 - entry 1 AAAAAAAAAAAAAAAAAAAAAAA qqqqqqqqqqqqqqqqqqqqqqq 0
read2 + entry 1 AAAAAAAAAAAAAAAAAAAAAAA qqqqqqqqqqqqqqqqqqqqqqq 0
read2 - entry 1 AAAAAAAAAAAAAAAAAAAAAAA qqqqqqqqqqqqqqqqqqqqqqq 0
Each read really has a single alignment: read1 to the positive strand and read2 to the negative strand.
When aligning in colorspace, Bowtie chops off the primer base and the first color (see the manual). The primer is removed because it's not a genomic base - it's just the primer, and it's the same regardless of where the read originated in the genome. The first color is removed because one of the two bases it encodes is the primer, so the same logic applies. The second color is the first one that encodes a pair of genomic bases, so that's the first one to keep. Your result is expected once the primer and first color are removed: both reads reduce to the same run of 0 colors.
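To make the trimming concrete, here is a minimal sketch of what dropping the primer and first color does to the two posted reads. The helper `trim_colorspace_read` is hypothetical (it is not Bowtie's code), and the reads are rebuilt as a primer base followed by 25 colors.

```python
# Hypothetical helper, not Bowtie's implementation: drop the primer base and
# the first color, leaving only colors that sit between two genomic bases.

def trim_colorspace_read(read: str) -> str:
    primer = read[0]
    if primer not in "ACGT":
        raise ValueError(f"expected a primer base, got {primer!r}")
    # read[1] encodes (primer, first genomic base), so it is discarded too
    return read[2:]


# The two posted reads, rebuilt as a primer followed by 25 colors
read1 = "T" + "3" + "0" * 24
read2 = "T" + "0" * 25

trimmed1 = trim_colorspace_read(read1)
trimmed2 = trim_colorspace_read(read2)
print(trimmed1 == trimmed2)  # True
print(trimmed1)              # 24 zeros
```

The only color that differs between the two reads is the first one, so after trimming they are indistinguishable.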
Thanks,
Ben
You correctly described the read structure, but there is still an issue with the mapping. The first base comes from the final base of the P1 adaptor, and, as you state, the first color is composed of that base and the first base of the genomic sequence. Working from my two examples and using the 2BE (2-base encoding) rules, if you were to strip that P1 adaptor base and decode the first color into the first genomic base, you would have:
>read1
A000000000000000000000000
>read2
T000000000000000000000000
If you translate these back to base space, you get a string of A's and a string of T's. Given a reference that is a string of A's, you would expect read1 to map to the positive strand and read2 to the negative strand. I think the issue arises because the reference in color space is A0000000..., and the reverse complement of any color-space sequence is just the reverse of the color string, i.e., the negative strand is also a string of 0's.
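Both claims can be checked with a short sketch of 2-base encoding. It assumes the usual indexing A=0, C=1, G=2, T=3 with color = index(b1) XOR index(b2); the function names are illustrative, not part of Bowtie.

```python
# Minimal sketch of 2-base (color-space) encoding, assuming A=0, C=1, G=2,
# T=3 and color = index(b1) XOR index(b2).

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode(primer: str, colors: str) -> str:
    """Translate colors back to bases, starting from the primer base."""
    prev = BASES.index(primer)
    out = []
    for c in colors:
        prev ^= int(c)  # next base = previous base XOR color
        out.append(BASES[prev])
    return "".join(out)


def encode(bases: str) -> str:
    """Encode a base sequence as the colors between adjacent bases."""
    return "".join(
        str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(bases, bases[1:])
    )


def reverse_complement(bases: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(bases))


read1 = decode("T", "3" + "0" * 24)
read2 = decode("T", "0" * 25)
print(read1)  # 25 A's
print(read2)  # 25 T's

# Complementing a base is XOR 3 on its index, which cancels out in each color,
# so the reverse complement only reverses the color string.
seq = "AACGT"
print(encode(seq), encode(reverse_complement(seq)))  # 0131 1310
```

Because a run of identical bases encodes to all 0's, its reversed color string is identical, so the two strands cannot be told apart from colors alone.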
It looks as if the mapping ignores the leading base and looks only at the color-space values, so both the positive and negative strands are all 0's and each read is reported as mapped to both strands.
Basically, any palindromic color-space sequence will be incorrectly mapped to both strands. To avoid this, the mapper needs to consider the second base of the first color (the first base of the genomic sequence).
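A rough sketch of the requested tie-breaker, contrasting colors-only matching (the behavior described in this thread) with matching that also uses the decoded first base. This is hypothetical and not how Bowtie works; it compares the whole read against a single reference window and assumes the same A=0, C=1, G=2, T=3 XOR encoding.

```python
# Hypothetical sketch of the feature request; Bowtie does not do this.
# Assumes A=0, C=1, G=2, T=3 and color = index(b1) XOR index(b2).

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def encode(bases: str) -> str:
    return "".join(
        str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(bases, bases[1:])
    )


def decode(primer: str, colors: str) -> str:
    prev = BASES.index(primer)
    out = []
    for c in colors:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)


def reverse_complement(bases: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(bases))


def colors_only_strands(ref_window: str, primer: str, colors: str) -> list:
    """Primer is deliberately ignored: drop it and the first color."""
    trimmed = colors[1:]
    ref_colors = encode(ref_window)
    strands = []
    if trimmed == ref_colors:
        strands.append("+")
    if trimmed[::-1] == ref_colors:  # minus strand = reversed colors
        strands.append("-")
    return strands


def base_aware_strands(ref_window: str, primer: str, colors: str) -> list:
    """Decode the full read (primer and first color included), compare bases."""
    bases = decode(primer, colors)
    strands = []
    if bases == ref_window:
        strands.append("+")
    if reverse_complement(bases) == ref_window:
        strands.append("-")
    return strands


ref = "A" * 25
for name, colors in [("read1", "3" + "0" * 24), ("read2", "0" * 25)]:
    print(name, colors_only_strands(ref, "T", colors),
          base_aware_strands(ref, "T", colors))
# read1 ['+', '-'] ['+']
# read2 ['+', '-'] ['-']
```

The base-aware version only helps if the first color call is correct; a single error there would flip the decoded bases and pick the wrong strand.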
"It looks as if the mapping is ignoring the leading base and just looking at the color space values to do the mapping, thus both positive and negative strands are 0's and thus each read is being reported as mapped to both strands."
Correct - that's exactly what is happening, and it's by design. You're right that we could use information about the primer and first base to "break ties" between otherwise identical alignments. It would help a little in resolving repeats, as long as you trust that the first base is correct and that the one additional base of information is sufficient to break the tie. But that's a little doubtful and, either way, it's really a feature request rather than a bug.
I'll move this to a feature request section.
Thanks,
Ben