From: Errbii, M. <me...@un...> - 2019-11-12 09:31:30
Dear all,
I have been encountering odd behaviour from the --het function in vcftools. In short, the VCF file I used was generated with GATK 4 and contains 612482 SNPs. Using the --het function, I generated a .het file that looks like this:
INDV O(HOM) E(HOM) N_SITES F
leiden3 15768 43812.0 116271 -0.38703
leiden6 17628 43812.0 116068 -0.36238
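For reference, the F column appears to follow from the other three columns as F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)). The short Python sketch below checks that relation against the two rows above. The function name `inbreeding_f` is only an illustrative choice, and the formula is stated here as an assumption about how the column is derived, not as a quote from the vcftools source.

```python
def inbreeding_f(observed_hom, expected_hom, n_sites):
    """Inbreeding coefficient from observed and expected homozygous counts.

    Assumed relation: F = (O - E) / (N - E), where O is O(HOM),
    E is E(HOM) and N is N_SITES from the .het output.
    """
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# Rows copied from the .het output above: INDV, O(HOM), E(HOM), N_SITES
rows = [
    ("leiden3", 15768, 43812.0, 116271),
    ("leiden6", 17628, 43812.0, 116068),
]

for indv, o_hom, e_hom, n_sites in rows:
    print(indv, round(inbreeding_f(o_hom, e_hom, n_sites), 5))
```

Rounded to five decimals, both values agree with the reported F column, so the negative F reflects fewer observed homozygous sites than expected among the sites that were counted.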
My question is: why does the vcftools --het function ignore such a huge number of SNPs (almost 500000)? I checked the mailing list archive and found that this behaviour could be due to the fact that "the --het function ignores sites that are not biallelic, sites that are not diploid and also ignores any sites where the alternate allele does not appear". I have checked my VCF file and it doesn't contain any of these types of variants. Here is my log file:
VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf leiden_biallelic.recode.vcf
--het
--out leiden_biallelic_het
Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 2 out of 2 Individuals
Outputting Individual Heterozygosity
After filtering, kept 600773 out of a possible 600773 Sites
Run Time = 2.00 seconds
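One way to double-check the VCF against the criteria quoted above is to tally, per individual, how many genotype calls fall into each category. The Python sketch below is a rough, independent check and not vcftools' own logic. The function name `tally_sites` and the category labels are hypothetical, and counting missing genotypes (./.) separately is an added assumption, prompted by the fact that N_SITES differs between the two individuals.

```python
from collections import Counter


def tally_sites(vcf_lines):
    """Tally genotype categories per sample from uncompressed VCF text lines."""
    samples = []
    tallies = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            tallies = {name: Counter() for name in samples}
            continue
        alt = fields[4]
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype field at this record
        gt_index = format_keys.index("GT")
        for name, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            if alt == ".":
                tallies[name]["no_alt_allele"] += 1
            elif "," in alt:
                tallies[name]["multiallelic_site"] += 1
            elif "." in alleles:
                tallies[name]["missing_genotype"] += 1
            elif len(alleles) != 2:
                tallies[name]["not_diploid"] += 1
            else:
                tallies[name]["usable"] += 1
    return tallies


if __name__ == "__main__":
    # File name taken from the log above.
    with open("leiden_biallelic.recode.vcf") as handle:
        for name, counts in tally_sites(handle).items():
            print(name, dict(counts))
```

If the "usable" count for an individual comes out close to its N_SITES value, that would suggest which category accounts for the excluded sites.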
Sorry if this question has already been addressed and I missed it. I searched the forums but had no luck.
Thank you very much for such a nice (and very useful) tool.
Best,
Simo
___________________________________________
Mohammed Errbii "Simo"
Molecular Evolution and Sociology Group (Gadau lab)
Institute for Evolution & Biodiversity
University of Münster
Hüfferstr. 1
DE-48149 Münster
Germany
+49 (0) 251 83 21659
https://www.uni-muenster.de/Evolution/molevolsocbio/