#9 SOLiD Color Space SNP calling rules


When using the Maq cns2cssnp command, the rules applied for calling single SNPs in color space are incorrect: too strict in one sense and not strict enough in another.

The current functionality discards many, many putative SNPs, and simultaneously allows a vaery small minority of incorrect SNPs through.

An excellent reference on color space SNP calling rules may be found here: http://marketing.appliedbiosystems.com/mk/submit/SOLID_KNOWLEDGE_RD?_JS=T&rd=ex

Specifically: in aux_utils.c (0.6.6) at line 120-121:
if (r[0] != 15 && r[1] != 15 && b[0] != 15 && b[1] != 15 && r[0] != b[0] && r[1] != b[1]
&& (b[0]&r[0]) == (b[1]&r[1]))

The final constraint (b[0]&r[0]) == (b[1]&r[1]) is too strict. What this says is that in order to be considered a candidate SNP, the bitwise and of colors at the first and second positions must be the same, e.g.

Allowing substitutions:

11 -> 22 1&2 == 1&2
48 -> 84 4&8 == 8&4

but NOT: 12 -> 48 1&4 != 2&8

However, that last example IS a permissable pair of color changes according to ABIs base transition color codes (see whitepaper linked above)...

This can be corrected by removing (b[0]&r[0]) == (b[1]&r[1]) from line 121 of aux_utils.

In this case, an additional constraint is needed further down to disallow these cases:

11 -> 23 1&2 == 1&3
12 -> 23 1&2 == 2&3

This can be accomplished by adding following constraint at line 127

(nst_nt16_count_table[b[0]] == nst_nt16_count_table[b[1]])

requiring both of the consensus positions to contain the same level of degeneracy (1 or 2 alleles, not one of each).

Also, since the consensus sequence often contains "degenerate codes" mapped to 4 bit codes with two bits "on" (corresponding to two alleles), additional verification is required (at some point) to ensure that the reads leading to the degenerate consensus comply with the color transition rules. For example:

11 -> 33 seems OK, since 11 in the reference is assumed to be 11 and 22 in the underlying reads...

And 12 -> 33 is assumed to be 12 and 21 in the reads.

But the degenerate codes "33" do not actually contain enough information to validate the Assumptions above.

For that, we must return to the reads mapping to a given position to ensure that they are the allowed color combinations. Problems with this should be infrequent, but high error rates and low coverage will conspire to occasionally create these situations.

Thanks for making a great toolset here, I understand that I may be amoung the first to try to make it fully work with SOLiD data.

Cheers, Vaughn
vsi [at] u.washington.edu


  • Nobody/Anonymous

    Logged In: NO

    Just To be clear, statements like 11 -> 33 above are meant to represent 8-bit hex numbers referring to the codes for two consecutive colors or degenerate calls, ie. 0x11 -> 0x33 -V

  • Nobody/Anonymous

    Logged In: NO

    On further testing, an additional constraint at line 127 of aux_utils.c is necessary to prevent degenerate codes that have 3 colors between them
    (e.g. KR -> G|T + A|G. Such combinations can never be valid consensus color pairs because all possible 2-pair combinations resulting from them are mutually exclusive under the colorspace single SNP rules. These cases can be detected with the following test:

    (nst_nt16_count_table[b[0]|b[1]] != 3)

    So the complete if statement at line 127 would now be:

    if ((c == 4 || c == 2) && (nst_nt16_count_table[b[0]] == nst_nt16_count_table[b[1]]) && (nst_nt16_count_table[b[0]|b[1]] != 3)) is_snp = 1;

    Thanks, -Vaughn

  • lh3

    lh3 - 2008-07-26

    Logged In: YES
    Originator: NO

    In fact, cns2cssnp is not recommended any more. The recommended way is to run "csmap2nt" after "map -c" and then run through "assemble", "cns2snp"... as if you are dealing with nucleotide alignments. Sorry, I should update documentation earlier. I will try to do this tomorrow.

    Anyway, this is a bug anyway and cns2cssnp might be still an alternative way to do SNP calling for SOLiD. I will fix this later after I release maq-0.6.8. Thank you for this.




Log in to post a comment.