Menu

#35 Internal Evolutionary Model for BBMap(?)

1.0
open
nobody
None
2020-05-15
2020-05-15
J C
No

Hi,

I was wondering if there were details on the internal evolutionary model used by the alignment(match/substitution/gap) scores for BBMap. I know that standard aligners tend to use an evolutionary model based on Chapter 2.2 of Durbin et al's book.

To quickly summarize here, they determine the score for a match or mismatch pair as:
s(a,b) = log (p_ab / q_a q_b)
where s(a,b) is the score for two base pairs a and b and p_ab is the probability of observing them together if they shared common ancestry and q_a and q_b is the probability of observing the base pairs at random.

BBMap states that it assigns a score of +100 for each match and -127 for each mismatch. (Let's ignore gaps for ease of exposition). Thus, this implies that
s(a,a) = 100 = log(p_aa) - log(q_a q_a)
s(a,b) = -127 = log(p_ab) - log(q_a q_b) if a \neq b
It seems reasonable to assume q_a = q_b or at least very close, so we get:
s(a,a) - s(a,b) = 227 = log(p_aa/p_ab)
e^(227) = p_aa/p_ab

When I saw this number it shocked me because this implies that if we assume independence between base pairs in our evolutionary model (which I believe is implied by the additive nature of matches and mismatches in the alignment scores), then a perfectly aligned read is said to be e^227 times MORE likely than a read that is hamming distance 1 away from the reference irrespective of length.

This does not make intuitive sense to me, and I was wondering if you could shed some light as to what the underlying evolutionary model that is being used is, or if any of my calculations are incorrect.

Thanks!
Jeffrey

Discussion


Log in to post a comment.