Does anybody know what the abbreviations 'SNR' and 'PKD' in Staden's HTML mutation report mean?
Good question. I think they stand for "signal to noise ratio" and "peak drop" respectively, but exactly what they measure is trickier to answer.
This wasn't code I wrote (and the author has long since moved on to other things), but the answer is all in the code somewhere, so I took a look.
SNR is defined in decibels, so it's 20 * log10(signal/noise). Signal is defined as the highest peak and noise as the second highest. (For true heterozygotes, the real question is whether that "noise" is actually a secondary signal.) Homozygous differences should have high SNR values, while you'd expect heterozygous ones to have low SNR values.
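As a rough illustration of that definition, here is a minimal Python sketch. It is not the Staden code. It assumes you already have the four trace amplitudes (A, C, G, T) sampled at the base position, and the function name `snr_db` is made up for this example.

```python
import math

def snr_db(peak_heights):
    """Hypothetical helper: SNR in decibels at one base position.

    peak_heights: trace amplitudes for the four channels (A, C, G, T)
    at the base position (an assumed input format).
    Signal = highest peak, noise = second highest peak.
    """
    heights = sorted(peak_heights, reverse=True)
    signal, noise = heights[0], heights[1]
    if noise <= 0:
        # No secondary peak at all: treat the ratio as unbounded.
        return math.inf
    return 20 * math.log10(signal / noise)

# A clean homozygous call: one dominant peak gives a high SNR (20 dB here).
print(snr_db([10, 1000, 20, 100]))
# A C/T mixture with two similar peaks gives an SNR close to 0 dB.
print(snr_db([0, 800, 0, 700]))
```

A low value only tells you the top two peaks are similar in height; it cannot by itself distinguish a real secondary signal from noise, which is why the peak drop measure below matters.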
PKD appears to be computed by dividing the peak height at the suspected heterozygous base by the peak height of the corresponding base in the reference, after normalising for the average peak height across the traces. We expect this ratio to be not too far from 0.5 (0.2-0.7 are the default parameters used in Pregap4 for the peak drop threshold).
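A minimal Python sketch of that calculation, under stated assumptions: each trace is normalised by dividing by the mean of its called-base peak heights, and the 0.2 and 0.7 bounds are the Pregap4 defaults quoted above, applied inclusively. The function names and the exact normalisation window are hypothetical, not taken from the Staden source.

```python
from statistics import mean

def peak_drop(sample_height, sample_peaks, ref_height, ref_peaks):
    """Hypothetical helper: normalised sample peak / matching reference peak.

    sample_height: height, in the sample trace, of the peak for the base the
        reference has at this position (e.g. the C in a suspected C/T).
    ref_height: height of that same base's peak in the reference trace.
    sample_peaks, ref_peaks: called-base peak heights across each trace,
        used (by assumption) to normalise for overall signal strength.
    """
    sample_norm = sample_height / mean(sample_peaks)
    ref_norm = ref_height / mean(ref_peaks)
    return sample_norm / ref_norm

def looks_heterozygous(pkd, low=0.2, high=0.7):
    """Assumed inclusive range check using the Pregap4 default thresholds."""
    return low <= pkd <= high

# The C peak falls to half its normalised reference height: PKD = 0.5.
pkd = peak_drop(300, [500, 600, 700], 1000, [900, 1000, 1100])
print(pkd, looks_heterozygous(pkd))
```

Normalising first means that a sample trace which is simply weaker overall does not look like a peak drop.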
The PKD is probably one of the more important factors. In a heterozygous base changing from C to CT (for example) you may not know what the expected height of the T peak would be, but if the wildtype has a C then you do know that the amplitude of the C in a CT mixture should drop relative to the C when not in a mixture. This is much more powerful than just looking at the SNR, which is more or less what phred scores reflect (and phred averages over multiple bases, making it a poor choice for point mutation detection).
It may be possible to calibrate these figures into a single confidence value if you have enough data on where the real SNPs are, but it's not something I've tried to do.
James Bonfield (email@example.com)
A Staden Package developer: https://sourceforge.net/projects/staden/