From: Anthony M. <ant...@ei...> - 2014-05-07 13:43:29
Hi Debora,
This discrepancy in site numbers may be because the --het function ignores sites that are not biallelic and sites that are not diploid. Please update vcftools to the latest version (v0.1.12a), which prints a warning message if it ignores any sites for these reasons.
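To get a rough idea of how many sites fall into each category before upgrading, the filtered VCF can be scanned directly. The following is only a sketch, not vcftools' own logic: the function names `classify_site` and `count_sites` are made up for illustration, and it assumes that "biallelic" means exactly one ALT allele and that "diploid" means every sample's GT field has exactly two alleles. The counts it reports may therefore differ from what vcftools actually skips.

```python
import gzip
import sys
from collections import Counter


def classify_site(vcf_line):
    """Classify one VCF data line (assumed rules, not vcftools' code)."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt = fields[4]
    # Assumption: a site is biallelic only if it has exactly one ALT allele.
    if alt == "." or "," in alt:
        return "not_biallelic"
    fmt = fields[8].split(":") if len(fields) > 8 else []
    if "GT" not in fmt:
        return "no_genotypes"
    gt_idx = fmt.index("GT")
    for sample in fields[9:]:
        gt = sample.split(":")[gt_idx]
        # Phased (|) and unphased (/) genotypes are treated the same way.
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2:
            return "not_diploid"
    return "ok"


def count_sites(lines):
    """Count sites per category, skipping header and empty lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[classify_site(line)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_sites.py filtered.vcf.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        for category, n in sorted(count_sites(handle).items()):
            print(category, n)
```

The input should be the VCF after the same site filters have been applied (for example, one written with --recode), since the script does not reproduce --positions or --remove-indels.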
Let me know if you need any further assistance.
--
Anthony Marcketta
Bioinformatics Analyst
Department of Genetics
Albert Einstein College of Medicine
Van Etten B01
Bronx, New York 10461
________________________________________
From: Débora Brandt [deb...@gm...]
Sent: Tuesday, May 06, 2014 5:39 PM
To: vcf...@li...
Subject: [Vcftools-help] --het option excluding sites
Dear all
When using the --het option to calculate per-individual heterozygosity I noticed that the total number of sites shown in the .het file (the N_SITES column) is not equal to the number of sites in my vcf file (see log and het files below).
I wondered if it was because of missing data, but when I ran --missing on my vcf file, I didn't get any missing data at all.
Does anyone know which sites are being filtered out for the heterozygosity calculations?
Thank you very much!
Debora
---
Here's my log file (*11205* sites in vcf file after my filters):
VCFtools - v0.1.11
(C) Adam Auton 2009
Parameters as interpreted:
--gzvcf chr22.vcf.gz
--het
--out chr22_het
--positions exonic_positions_chr22.txt
--remove-indels
Using zlib version: 1.2.7
Reading Index file.
File contains 494328 entries and 1092 individuals.
Applying Required Filters.
Filtering sites by allele type
Filtering sites by include/exclude positions files
After filtering, kept 1092 out of 1092 Individuals
After filtering, kept 11205 out of a possible 494328 Sites
Outputting Individual Heterozygosity
Run Time = 234.00 seconds
---
And the beginning of the .het file (*10668* sites included here; N_SITES is the same for all individuals):
INDV     O(HOM)  E(HOM)   N_SITES  F
HG00096  10404   10296.4  10668    0.28965
HG00097  10359   10296.4  10668    0.16857
HG00099  10345   10296.4  10668    0.13090
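As a sanity check on the numbers above, the F column can be approximately recomputed from the other three columns, and the gap between the log and the .het file can be made explicit. This is a small sketch that assumes F is the usual method-of-moments estimate, F = (O(HOM) - E(HOM)) / (N_SITES - E(HOM)); the function name `inbreeding_f` is hypothetical. Because E(HOM) is printed to only one decimal place, the recomputed values agree with the F column only to about three decimal places.

```python
def inbreeding_f(obs_hom, exp_hom, n_sites):
    """Method-of-moments F from observed/expected homozygous site counts."""
    return (obs_hom - exp_hom) / (n_sites - exp_hom)


# Rows copied from the .het output above: (INDV, O(HOM), E(HOM), N_SITES, F)
rows = [
    ("HG00096", 10404, 10296.4, 10668, 0.28965),
    ("HG00097", 10359, 10296.4, 10668, 0.16857),
    ("HG00099", 10345, 10296.4, 10668, 0.13090),
]

for indv, o_hom, e_hom, n_sites, reported_f in rows:
    f = inbreeding_f(o_hom, e_hom, n_sites)
    print(f"{indv}: recomputed F = {f:.5f}, reported F = {reported_f:.5f}")

# Sites kept by the filters (log file) minus sites used by --het (.het file).
sites_after_filtering = 11205
sites_in_het = 10668
print("sites not used by --het:", sites_after_filtering - sites_in_het)
```

The last line makes the size of the discrepancy concrete: it is the number of filtered sites that did not enter the heterozygosity calculation.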