|
From: Dharmendra G. <dha...@eg...> - 2018-06-06 19:35:03
|
Hello,

I am working on pig whole-genome data from several different species, spanning several generations (G0, G1, G2, G3, G4). We are trying to calculate how inbred these pigs are, which will also tell us how closely related they are genetically.

For this I am trying the VCFtools --het option (listed under "OUTPUT OTHER STATISTICS"). The documentation says it calculates heterozygosity, but the output we get reports observed versus expected homozygous counts and an inbreeding coefficient:

    vcftools --vcf input_file.vcf --het --out output_het

If you don't mind, can you please let us know whether applying --het to a VCF file is the best way to understand how inbred, and how genetically close, particular animals are based on their variants? Or do you have any other suggestions or advice for getting this number?

Another question, on the --het output: the inbreeding coefficients we get range from -0.62854 to 0.67488. As we understand it, a value closer to +1 is more inbred than a value at or below zero. Does a negative number indicate heterogeneity in the samples?

Another question, on N_sites: our VCF file contains approximately 12 million SNPs and indels, but N_sites shows only about one and a half million. Do you have any idea how the N_sites number is derived and how sites are filtered from the input VCF file?

Please also point us to the reference you used for the --het calculation; that would be very helpful.

Thanks,
Dharm

--
Dharmendra Goswami, PhD
Computational Biologist
300 Technology Square, Third Floor, Suite 301
Cambridge, MA 02139
Nearest Parking - Technology Square Garage, 595 Technology Square
O: (617) 941-7591
C: (601) 291-0829
www.egenesisbio.com |
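
For reference, the relationship between the columns of the --het output can be sketched in a few lines of Python. This is only a sketch under two assumptions: that the output file has the tab-separated columns INDV, O(HOM), E(HOM), N_SITES and F, and that F is the method-of-moments estimate F = (O - E) / (N - E). The helper names (`inbreeding_f`, `recompute_f`) and the two sample rows are invented for illustration and are not real data.

```python
import csv
import io


def inbreeding_f(o_hom, e_hom, n_sites):
    """Method-of-moments estimate: F = (O - E) / (N - E).

    o_hom   -- observed number of homozygous sites for the individual
    e_hom   -- expected number of homozygous sites
    n_sites -- number of sites that entered the calculation
    """
    denom = n_sites - e_hom
    if denom == 0:
        raise ValueError("N_SITES equals E(HOM); F is undefined")
    return (o_hom - e_hom) / denom


def recompute_f(het_text):
    """Recompute F per individual from tab-separated .het-style text."""
    reader = csv.DictReader(io.StringIO(het_text), delimiter="\t")
    return {
        row["INDV"]: inbreeding_f(
            float(row["O(HOM)"]), float(row["E(HOM)"]), float(row["N_SITES"])
        )
        for row in reader
    }


# Invented rows (assumption: this mirrors the real column layout).
EXAMPLE = (
    "INDV\tO(HOM)\tE(HOM)\tN_SITES\tF\n"
    "pig_A\t900\t800.0\t1000\t0.5\n"
    "pig_B\t700\t800.0\t1000\t-0.5\n"
)

if __name__ == "__main__":
    for indv, f in recompute_f(EXAMPLE).items():
        print(indv, f)
```

Under this formula, F is positive when an individual has more homozygous sites than expected (pig_A) and negative when it has fewer, i.e. an excess of heterozygous sites (pig_B).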