I'm using BBMap to align 100 bp Illumina reads to very short (30 to 99 nt) reference sequences. I'm using BBMap because it allows reads to map while "overhanging" the reference sequence on one or both ends, something I've been unable to do with other tools.
I seem to have a problem. If I plot reference sequence length vs. the maximum length of any aligned read, I get a clear positive relationship between the two, until I hit my cap at full-length (100bp) reads:
At reference length 30, no read longer than 54bp aligns
At reference length 39, no read longer than 68bp aligns
At reference length 51, no read longer than 88bp aligns
At reference length 57, full-length (100bp) reads align
This relationship is so clear and consistent that it looks like an artifact, either of a parameter or of an intrinsic property of the alignment algorithm or the index.
Again, these are 100 bp reads, so I would expect full-length reads to align to all of those sequences. Instead, the only reads aligning to my reference sequences shorter than 57 bp are truncated reads, so I get very few alignments.
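The regularity in the numbers listed above can be checked with a little arithmetic. The sketch below uses only those four data points and computes, for each, the largest fraction of the read's bases that could lie on the reference (a read cannot place more bases on the reference than the reference is long). The function name is made up for illustration, and reading the result as a fixed cutoff is a guess, not a confirmed BBMap behavior. Note also that the last point is capped by the 100 bp read length, so it is a bound rather than an observed limit.

```python
# (reference length, longest aligned read length) pairs from the post above
observations = [(30, 54), (39, 68), (51, 88), (57, 100)]


def max_fraction_on_reference(ref_len, read_len):
    """Upper bound on the fraction of a read's bases that can lie on the reference."""
    return min(ref_len, read_len) / read_len


for ref_len, max_read_len in observations:
    # Bases that must hang off the reference if the read spans all of it
    min_overhang = max(0, max_read_len - ref_len)
    fraction = max_fraction_on_reference(ref_len, max_read_len)
    print(f"ref={ref_len:3d}  longest read={max_read_len:3d}  "
          f"min overhang={min_overhang:3d}  max fraction on ref={fraction:.2f}")
```

All four fractions fall between roughly 0.56 and 0.58, which is what one would expect if a read stops aligning once less than a fixed share of it can sit on the reference.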
Is this a reasonable use case for BBMap? Is there anything I can do to work around this apparent issue? If not, can you recommend another software tool that will handle my oddball use case?
Thanks!
Hi Damon,
BBMap is a global aligner and will penalize reads for going off the end of a reference contig. You can greatly reduce the penalty by adding the "local" flag when aligning, or by setting "minid=0", which will increase sensitivity across the board.
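As a rough illustration of the two suggestions, here is a small Python sketch that assembles a bbmap.sh command line without running it. The file names (reads.fq, short_refs.fa, mapped.sam) and the helper function are hypothetical, and the key=value spelling, including "local=t", reflects how BBMap options are usually written rather than anything stated in this thread, so check it against the help text of your BBMap version.

```python
import shlex


def build_bbmap_command(reads, ref, out, local=True, minid=None):
    """Assemble a bbmap.sh argument list (nothing is executed here)."""
    cmd = ["bbmap.sh", f"in={reads}", f"ref={ref}", f"out={out}"]
    if local:
        # Assumed spelling of the "local" flag mentioned above
        cmd.append("local=t")
    if minid is not None:
        # Minimum identity setting; 0 is the value suggested above
        cmd.append(f"minid={minid}")
    return cmd


# Hypothetical file names, for illustration only
print(shlex.join(build_bbmap_command("reads.fq", "short_refs.fa", "mapped.sam")))
print(shlex.join(build_bbmap_command("reads.fq", "short_refs.fa", "mapped.sam",
                                     local=False, minid=0)))
```

Building the command as a list makes it easy to pass to subprocess.run later and to try the two options separately, so their effects on the short references can be compared.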