I have been using both GATK and Samtools for variant calling in
individual samples. Both tools use a Bayesian approach to call
genotypes, yet they produce slightly different variants. Does the
difference between the two algorithms arise from the prior
probabilities or from the likelihood calculation they use to
compute the posterior probability? There is a slight difference in
how the likelihood models handle errors; is there also a difference in
the priors? If that were the only difference, I would expect only the
genotype of the variant detected at a specific position to vary between
the two algorithms, but I have observed a difference in the total number
of SNPs called by the two methods. Could you explain where this
difference comes from?
In the article "Genotype and SNP calling from next-generation sequencing
data" it is mentioned that variant calling is done in two steps: SNP
calling, followed by genotype calling at the sites identified during SNP
calling. Since genotype calling is done by a similar Bayesian approach
in both GATK and Samtools, do the SNP calling methods used in these
two tools produce the different numbers of SNP calls?
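To make the question concrete, here is a minimal Python sketch of my understanding. It is not the actual GATK or Samtools implementation: it assumes a simple binomial error model and a heterozygosity-based genotype prior, and the function names, the error rate, the `theta` values, and the 0.99 threshold are all made up for illustration. It shows that with identical genotype likelihoods, a different prior alone can change how many sites pass the SNP-calling threshold.

```python
from math import comb

def genotype_likelihoods(n_reads, n_alt, error_rate=0.01):
    """P(data | genotype) under a toy binomial model (assumed, not GATK/Samtools)."""
    # Probability that a single read shows the alternate allele, per genotype.
    alt_prob = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}
    return {
        g: comb(n_reads, n_alt) * p ** n_alt * (1.0 - p) ** (n_reads - n_alt)
        for g, p in alt_prob.items()
    }

def genotype_posteriors(likelihoods, theta):
    """Posterior = likelihood * prior, normalised over the three genotypes."""
    # Hypothetical prior driven by an assumed heterozygosity theta.
    priors = {"RR": 1.0 - 1.5 * theta, "RA": theta, "AA": theta / 2.0}
    unnorm = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

def count_snp_calls(sites, theta, min_nonref_posterior=0.99):
    """SNP calling step: keep sites whose posterior of being non-reference is high."""
    calls = 0
    for n_reads, n_alt in sites:
        post = genotype_posteriors(genotype_likelihoods(n_reads, n_alt), theta)
        if 1.0 - post["RR"] >= min_nonref_posterior:
            calls += 1
    return calls

# Toy sites as (total reads, reads supporting the alternate allele).
sites = [(10, 0), (10, 2), (10, 3), (10, 5), (10, 10)]

strict = count_snp_calls(sites, theta=1e-6)   # very sceptical prior
lenient = count_snp_calls(sites, theta=1e-2)  # more permissive prior
print(strict, lenient)  # with these toy numbers: 1 2
```

In this toy setup the likelihoods are the same in both runs, so the different counts come only from the prior and the threshold. What I cannot work out is which of these pieces actually differs between the two tools.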
If the difference comes from the SNP calling methods used in GATK and
Samtools, would you please suggest a source or give a short summary of
the SNP calling methods?
I have read the following papers to figure out the difference between
the algorithms. Unfortunately, I could only find the genotype
calling methods used by both algorithms, not the difference
that leads to the different number of SNP calls between the two.
1. Framework for variation discovery and genotyping from
next-generation DNA sequencing (GATK).
2. Mapping short DNA sequencing reads and calling variants using
mapping quality scores (Samtools).
3. The Genome Analysis Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data.
Could someone offer suggestions?