Hi - I am trying to align very short reads to a very short reference. An average read might be 25 nt and an average reference 25 nt.
I am running into an issue where, because of the gap-open and gap-extension penalties currently used, bases are called as mismatches when the read actually contains deletions. As an example:
Reference:
CCAAGTTTATGACGAGC
Reads:
1. CAAGTTTATGACGAGC
2. CCAAGTTTATGACGAC
3. CCAAGTTTATGACGC
CIGAR strings:
1. 1=1D15=
2. 15=1X
3. 14=1X
In this case I am essentially certain (due to the nature of the experiment) that the final nucleotide of reads 2 and 3 is a correct match to the final base of the reference, with preceding deletions. I.e. I think the CIGAR strings should be:
2. 15=1D1=
3. 14=2D1=
I can't see an option to change the penalties and have been unable to locate where to do this (if it is possible) in the code.
The discussion of minratio and minid in the BBMap guide (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/) leads me to believe it is possible.
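To make the trade-off concrete, here is a minimal Python sketch that scores the reported CIGAR strings against the deletion-containing alternatives I would expect (15=1D1= for read 2 and 14=2D1= for read 3). It uses a generic affine gap model, which is not BBMap's actual scoring; the function name `score_cigar` and every penalty value are assumptions chosen purely for illustration.

```python
import re

def score_cigar(cigar, match=1, mismatch=-3, gap_open=-5, gap_extend=-1):
    """Score an extended CIGAR (=, X, I, D) under a simple affine gap model.

    Hypothetical helper: the scoring values are illustrative, not BBMap's.
    """
    total = 0
    for length, op in re.findall(r"(\d+)([=XID])", cigar):
        n = int(length)
        if op == "=":
            total += n * match
        elif op == "X":
            total += n * mismatch
        else:
            # Insertion or deletion: one opening cost, plus an extension
            # cost for every gap base after the first.
            total += gap_open + (n - 1) * gap_extend
    return total

# (reported CIGAR, expected CIGAR) for reads 2 and 3 from the example above
candidates = {"read 2": ("15=1X", "15=1D1="), "read 3": ("14=1X", "14=2D1=")}

for gap_open in (-5, -2):
    for read, (reported, expected) in candidates.items():
        print(
            f"gap_open={gap_open} {read}: "
            f"{reported} -> {score_cigar(reported, gap_open=gap_open)}, "
            f"{expected} -> {score_cigar(expected, gap_open=gap_open)}"
        )
# gap_open=-5 read 2: 15=1X -> 12, 15=1D1= -> 11
# gap_open=-5 read 3: 14=1X -> 11, 14=2D1= -> 9
# gap_open=-2 read 2: 15=1X -> 12, 15=1D1= -> 14
# gap_open=-2 read 3: 14=1X -> 11, 14=2D1= -> 12
```

Under these assumed values, a gap-open penalty of -5 makes the terminal mismatch score higher than the deletion, while a penalty of -2 flips the preference for both reads. That is the kind of adjustment I am hoping is possible.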
Any help would be much appreciated. If I am crazy and doing this with the wrong tool and in the wrong way, any direction would also be appreciated.
Many thanks!
I have a probably related "issue" where mismatches to "real bases" are preferred over matches to "NNNN", something like:
ACGTNNNNNNGGC (reference)
ACGT------AT- (observed)
ACGTAT------- (expected)
So I would need to change the gap-extension penalty or some other parameter, none of which appear to be accessible in the user interface. (I have worked around it by deleting the 3' end for the moment.)
However, I'm not sure these parameters are accessible at all, since BBMap uses a "convex gap penalty" model (https://www.biostars.org/p/205960/).