Hi,
We have noticed a difference in how Bowtie handles uridines (U) in reads supplied on the command line (-c) versus in a FASTA file (-f). With -c, Bowtie appears to interpret the uridines as thymines when aligning against a genomic index. With -f, however, the uridines seem to be deleted from the read before alignment, without any warning. Other ambiguity codes also appear to behave strangely in FASTA input: replacing N's with D's changes the alignments returned, and a read containing D's returns different alignments depending on whether it is supplied with -c or -f.
For example:
bowtie -n 2 --chunkmbs 512 -a --best --strata human_63 --fullref -c GCUGCUUAACCAGUGGGG
0 + 14 dna:chromosome chromosome:GRCh37:14:1:107349540:1 59193928 GCTGCTTAACCAGTGGGG IIIIIIIIIIIIIIIIII 2 8:G>A
0 - 5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 50110471 CCCCACTGGTTAAGCAGC IIIIIIIIIIIIIIIIII 2 0:A>C
0 - 19 dna:chromosome chromosome:GRCh37:19:1:59128983:1 4359753 CCCCACTGGTTAAGCAGC IIIIIIIIIIIIIIIIII 2 8:G>T
bowtie -n 2 --chunkmbs 512 -a --best --strata human_63 --fullref -f seq.fa
MCO_0015114 + 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 18358069 GCGCAACCAGGGGG IIIIIIIIIIIIII 4
MCO_0015114 + 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 7824425 GCGCAACCAGGGGG IIIIIIIIIIIIII 4
MCO_0015114 + 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 11985170 GCGCAACCAGGGGG IIIIIIIIIIIIII 4
MCO_0015114 + 15 dna:chromosome chromosome:GRCh37:15:1:102531392:1 41227211 GCGCAACCAGGGGG IIIIIIIIIIIIII 4
MCO_0015114 - 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 7200974 CCCCCTGGTTGCGC IIIIIIIIIIIIII 4
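The two outputs above are consistent with that reading, assuming seq.fa holds the same read as the -c example: removing every U from GCUGCUUAACCAGUGGGG leaves the 14-mer reported by -f, while replacing U with T gives the 18-mer reported by -c. Below is a minimal Python sketch of that check, plus a possible workaround that converts U to T in a FASTA file before it is passed to Bowtie. The function names are hypothetical and this is not part of Bowtie itself.

```python
# Sketch only: helper names are made up for illustration, not Bowtie code.

_U_TO_T = str.maketrans("Uu", "Tt")


def drop_u(seq):
    """Remove every U/u, mimicking what -f appears to do to the read."""
    return seq.replace("U", "").replace("u", "")


def u_to_t(seq):
    """Replace U/u with T/t, mimicking what -c appears to do."""
    return seq.translate(_U_TO_T)


def convert_fasta(lines):
    """Yield FASTA lines with U converted to T in sequence lines only.

    Header lines (starting with '>') are passed through unchanged.
    """
    for line in lines:
        yield line if line.startswith(">") else u_to_t(line)


read = "GCUGCUUAACCAGUGGGG"
print(drop_u(read))  # GCGCAACCAGGGGG      (the read shown in the -f output)
print(u_to_t(read))  # GCTGCTTAACCAGTGGGG  (the read shown in the -c output)
```

To use this as a workaround, the lines of the original FASTA file could be run through convert_fasta and written to a new file, which is then given to bowtie with -f. This only addresses U; it does nothing about the differing treatment of other ambiguity codes such as D.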
bowtie version 0.12.7
64-bit
Built on sycamore.umiacs.umd.edu
Tue Sep 7 17:19:06 EDT 2010
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)
Options: -O3 -m64 -Wl,--hash-style=both
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
Thanks.
Thank you for reporting this. I will think about how best to handle it.
Best,
Ben