#251 -trim5 bases are reported as "M"atching in CIGAR

pending
nobody
None
5
2012-12-14
2012-10-02
Ian Davis
No

I'm using Bowtie 2.0.0-beta7 on 64-bit Ubuntu Linux. I have single-end Illumina reads with an 11bp adapter on the 5' end of each read. I passed "--trim5 11" to Bowtie. In the output SAM file, the full (untrimmed) read sequence is reported. The 12th base (first base of real data) is aligned with the correct location in the genome, but the SAM record has the 11 adapter bases aligned with the 11 bases preceding that in the genome, even though they don't match.

This would be OK if the CIGAR string reported those 11 leading bases as hard- or soft-masked (I'm not sure of the difference), but it doesn't -- they are reported as matching ("M") the genome at those positions! To me, this is extremely surprising behavior to say the least! It seems like a bug, but if it was intentional, I'd be very interested in the rationale. It seems like either excluding the trimmed bases from the output altogether or modifying the CIGAR string to mark the trimmed bases as masked would be an appropriate fix.
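
To illustrate the second suggestion (marking the trimmed bases in the CIGAR string), here is a minimal Python sketch of a post-processing step. It is not part of Bowtie 2; the function name `soft_clip_5prime` is hypothetical. It assumes a forward-strand alignment with no hard clips, and it rewrites the first `n_trim` read bases as soft-clipped ("S"), shifting POS past any reference bases those operations had consumed.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_READ_OPS = set("MIS=X")   # operations that consume read bases
_REF_OPS = set("MDN=X")    # operations that consume reference bases


def soft_clip_5prime(cigar, pos, n_trim):
    """Return (new_cigar, new_pos) with the first n_trim read bases soft-clipped.

    Hypothetical helper: assumes a forward-strand record without hard clips.
    """
    if n_trim == 0:
        return cigar, pos
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    remaining = n_trim   # read bases still to be converted to soft clip
    ref_shift = 0        # reference bases skipped by the clipped operations
    kept = []
    for length, op in ops:
        if remaining > 0 and op in _READ_OPS:
            used = min(length, remaining)
            remaining -= used
            if op in _REF_OPS:
                ref_shift += used
            if length > used:
                kept.append((length - used, op))
        elif remaining > 0:
            # Reference-only operation (D or N) inside the clipped region.
            if op in _REF_OPS:
                ref_shift += length
        else:
            kept.append((length, op))
    if remaining > 0:
        raise ValueError("n_trim exceeds the read length implied by the CIGAR")
    # An alignment should not start with a deletion or skip after the clip.
    while kept and kept[0][1] in "DN":
        ref_shift += kept[0][0]
        kept.pop(0)
    new_cigar = f"{n_trim}S" + "".join(f"{n}{op}" for n, op in kept)
    return new_cigar, pos + ref_shift
```

For a 50bp read reported as "50M" at position 100 with an 11bp adapter, `soft_clip_5prime("50M", 100, 11)` gives `("11S39M", 111)`: the SEQ field stays untrimmed, but the adapter bases are no longer claimed as aligned.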

Thanks!

Discussion

  • Ben Langmead
    2012-12-14

    Hi Ian,

    I am having trouble re-creating this. When I use --trim5, the bases are omitted from the SAM record. I.e. the SEQ field contains the trimmed read sequence. Can you tell me exactly what version you're using, and perhaps provide some example data?

    Best,
    Ben

     
  • Ben Langmead
    2012-12-14

    • status: open --> pending