From: Petr D. <pd...@sa...> - 2015-07-09 10:17:28
Hi Arti,

On Fri, 2015-07-03 at 10:13 -0400, Arti Tandon wrote:
> I have been using gtcheck functionality of bcftools to calculate
> genotype discordance between 2 samples, and used the -a option to
> print this information at all sites. The command I used is:
>
> ./gatk/bcftools/bcftools gtcheck -G 1 -g $file1.vcf.gz $file2.vcf.bgz
> -a -S $samp1
>
> And, the header of the output file I get is below. How is the
> column [6] calculated;

The -a option simply displays the PL probability which corresponds to
the -g genotype (thus the LKs are scaled to 1).

> and why is it outputting the Query PLs and not the Query GT.

gtcheck works primarily with PL likelihoods; the -G option is meant only
as a workaround for cases where PL is not present. This is why Query PLs
are printed, not GTs. If you need to make GT comparisons, take a look at
the `bcftools stats --verbose -s -` command.

> # [1]SC, Site by Site Comparison [2]Chromosome [3]Position
> [4]-g alleles [5]-g GT (LP6005441-DNA_E02) [6]match log
> LK [7]Query alleles [8-]Query PLs (LP6005441-DNA_E02)
>
> Also, how does it calculate the genotype concordance, which sites does
> it consider when doing this; it seems to be dropping quite a lot of
> sites, which ones are those?

All sites in the intersection of the two files are considered. Sites
where the samples have missing genotypes are skipped. Also, only diploid
genotypes are compared. This is because for the primary purpose of the
tool (genotype checking) we can ignore X and Y.

> Also, what is the expected discordance between 2 random samples?

The value depends on the number of sites compared: with more sites one
has more evidence, but also more noise. Imagine we have genotyped 1e6
sites and calculate the discordance. Then we subset to 1e3 sites. The
absolute value will change significantly. One wants multiple samples to
compare against and, ideally, all with about the same number of
genotypes.
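To make the -a column a little more concrete, here is a minimal Python
sketch of the kind of lookup described above: pick the query PL that
corresponds to the -g genotype and turn it into a likelihood. It is not
the bcftools code. The function names are hypothetical, the PL ordering
is the standard diploid ordering from the VCF specification, and both
the rescaling (best genotype has likelihood 1) and the log10 base are
assumptions about what "scaled to 1" and "match log LK" mean.

```python
import math


def pl_index(a: int, b: int) -> int:
    """Position of the unordered diploid genotype a/b in a VCF PL vector.

    The VCF specification orders diploid genotypes as
    0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ... which gives index b*(b+1)/2 + a
    for a <= b.
    """
    lo, hi = sorted((a, b))
    return hi * (hi + 1) // 2 + lo


def match_likelihood(gt: tuple, pls: list) -> float:
    """Likelihood of genotype gt given phred-scaled PLs.

    Assumption: values are rescaled so the best genotype has
    likelihood 1 (PL shifted so that the minimum is 0).
    """
    pl = pls[pl_index(*gt)] - min(pls)
    return 10 ** (-pl / 10)


def match_log_likelihood(gt: tuple, pls: list) -> float:
    """log10 of match_likelihood (the log base is an assumption)."""
    return -(pls[pl_index(*gt)] - min(pls)) / 10


# Hypothetical biallelic site: PLs are given for 0/0, 0/1, 1/1.
query_pls = [30, 0, 45]
print(match_likelihood((0, 1), query_pls))      # -g genotype is the best call
print(match_log_likelihood((0, 0), query_pls))  # -g genotype disagrees
```

A -g genotype that agrees with the most likely query genotype scores 1
(log 0), and a disagreeing genotype scores lower the larger its PL is.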
I hope this helps.

Best wishes,
Petr

> Thanks,
> Arti

_______________________________________________
Samtools-help mailing list
Sam...@li...
https://lists.sourceforge.net/lists/listinfo/samtools-help