I have noticed some odd behavior when running Bowtie and I am not sure whether it is intentional. My read sequences contain ambiguous characters other than N (e.g. K to represent G or T).
The problem is that when I run bowtie on these reads, the sequence column in the SAM file shows the ambiguous character (e.g. K) changed into A every time.
Is there a way to suppress this?
Yes, with the exception of 'N', bowtie does not take ambiguous characters into consideration wherever sequence operations are required. However, this behavior looks like a bug to me and, as far as I know, it was not intended. I will consult with our team and let you know if there is a bowtie2-specific workaround besides pre- and post-processing the reads around the bowtie2 run.
thanks,
Val
We have fixed this for the next release. Translating to A's was clearly a mistake. Unknown characters should be translated into N's, since bowtie2 treats them as N's anyway for the purpose of alignment.
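For anyone stuck on an older release in the meantime, the preprocessing workaround mentioned above could look roughly like the sketch below. It is not part of bowtie2; the function names `mask_ambiguous` and `mask_fastq` are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). It simply replaces every character that is not A, C, G, T or N with N before the reads are handed to the aligner, which mirrors the behavior described for the fixed release.

```python
import re

# Assumption: anything other than A, C, G, T or N (either case) counts as ambiguous.
_NON_ACGTN = re.compile(r"[^ACGTNacgtn]")


def mask_ambiguous(seq: str) -> str:
    """Replace every character that is not A, C, G, T or N with N."""
    return _NON_ACGTN.sub("N", seq)


def mask_fastq(lines):
    """Yield FASTQ lines with the sequence line of each record masked.

    Assumes strict 4-line records: header, sequence, '+', qualities.
    Header, separator and quality lines are passed through unchanged.
    """
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            yield mask_ambiguous(line.rstrip("\n")) + "\n"
        else:
            yield line
```

To use it, open the input FASTQ, pass the file object to `mask_fastq`, and write the yielded lines to a new file that is then given to bowtie2. Note that the original ambiguity codes are lost in the SAM output; if they are needed downstream, they would have to be restored from the original reads by read name afterwards.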