Hello,
I recently published a small NGS visualization tool, COV2HTML (http://mmonot.eu/COV2HTML), and I have received several bug reports from people who map their data with Bowtie2. My software includes a program, MAP2COV, that converts mapping files. For some users with paired-end reads an error occurs: the mapping position of a read falls outside the reference sequence. However, this error disappears when they switch to a different mapping software.
I reproduced the issue with published data (http://sra.dnanexus.com/runs/SRR959071) on the reference genome Clostridium bolteae ATCC BAA-613 (http://www.ncbi.nlm.nih.gov/genome/989?project_id=54523). My command line was: bowtie2 -q1 SRR959071_1.fastq -q2 SRR959071_2.fastq -p 10 -x Bolteae_INDEX --very-fast -S Bolteae.sam. My software reported the same error: the read SRR959071.8141851.1 (101 bases) was mapped from position 6377280 to 6377381, which is beyond the end of the reference sequence (6377378 bp).
The sequence of read SRR959071.8141851.1 is:
GAAAGGCTCCATAGCGACAGCAGAGACCGGATACGCCTTCAAGAACTGGACGAAGGACGGAGTGGTGGTAAGCTGGAATGCAGAGCTGAAGCCAGAGGACA
The end of the Clostridium bolteae ATCC BAA-613 sequence is:
AAAGGGCTCCATAGCGACAGCAGAGACCGGATACGCCTTCAAGAACTGGACGAAGGACGGAGTGGTGGTAAGCTGGAATGCAGAGCTGAAGCCGGAGGA
I would like to know whether a mapping position can fall outside the reference sequence. If so, I will have to modify my software :)
Best Regards,
Marc
Dr Marc Monot - PhD - Institut Pasteur
Laboratoire Pathogenege des Bacteries Anaerobie
25 rue du docteur roux 75015 Paris
Phone / Fax. +33145688390 ; +33140613123
Webpage. http://www.mmonot.eu
Hi Marc,
When you calculate the ending reference coordinate of an alignment, are you taking gaps into account? E.g., there could be insertions in the read with respect to the reference, in which case the alignment spans fewer reference bases than the read length.
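To illustrate, here is a minimal Python sketch of a gap-aware end coordinate, assuming the end is derived from the POS and CIGAR fields of the SAM record. The function name reference_end and the CIGAR string in the usage example are hypothetical; I have not looked at the actual CIGAR of SRR959071.8141851.1, so treat this as a sketch rather than a diagnosis.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
# I (insertion), S (soft clip), H (hard clip) and P (padding) do not.
REF_CONSUMING_OPS = set("MDN=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive reference coordinate of the last aligned base.

    pos   -- 1-based leftmost mapping position (SAM POS field)
    cigar -- CIGAR string (SAM CIGAR field)
    """
    if cigar == "*":
        raise ValueError("CIGAR is unavailable for this record")
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than valid length/op pairs.
    if "".join(length + op for length, op in tokens) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    ref_span = sum(int(length) for length, op in tokens if op in REF_CONSUMING_OPS)
    return pos + ref_span - 1


# Hypothetical example: a 101-base read with a 3-base insertion.
# The read has 50 + 3 + 48 = 101 bases but covers only 98 reference bases.
end = reference_end(6377280, "50M3I48M")
print(end, end <= 6377378)
```

With a CIGAR like this, the alignment stays within a 6377378 bp reference, whereas adding the read length to the start position would overshoot it.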
Best,
Ben