|
From: Rabail Z. <rab...@gm...> - 2017-08-10 08:38:11
|
Hi,

I've downloaded VCF files from 1000 Genomes with tabix, and the INFO column comes out reordered, as in the attached screenshot. Command:

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 20:6767838-7367838 -c NA18525 -c NA18526 -c NA18528.....

However, when I download the file manually from the browser, the INFO column looks like this:

AA=T|||;AC=78;AF=0.0163738;AFR_AF=0.059;AMR_AF=0.0058;AN=1322;DP=101591;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;SF=0,1,2,3,4,5,6;VT=SNP

When I apply the same minor allele frequency cutoff of 0.05 to these two differently ordered files, the number of resulting SNPs is not the same. Command:

vcftools --vcf input.vcf --recode --recode-INFO-all --maf 0.05 --out output.vcf

I get 2136 SNPs for the file shown in the screenshot and 1757 for the manually downloaded file. 1757 appears to be the correct number, since the 1000 Genomes vcf_to_ped converter script gives the same count for the same frequency cutoff.

What could be wrong? I cannot download all the files manually.
|
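One way to check whether key order in INFO matters at all is to parse the column into a dictionary and compare values rather than strings. The following is a minimal sketch, not part of the original commands: the helper names `parse_info` and `maf_from_counts` are hypothetical, the site is assumed to be biallelic, and the INFO string is the one quoted above.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def maf_from_counts(fields):
    """Minor allele frequency from the AC/AN tags (assumes a biallelic site)."""
    af = int(fields["AC"]) / int(fields["AN"])
    return min(af, 1.0 - af)


# INFO value quoted in the message above.
info = ("AA=T|||;AC=78;AF=0.0163738;AFR_AF=0.059;AMR_AF=0.0058;AN=1322;"
        "DP=101591;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;SF=0,1,2,3,4,5,6;VT=SNP")

# Simulate a reordered INFO column by shuffling the key order.
reordered = ";".join(sorted(info.split(";"), reverse=True))

fields = parse_info(info)
same_content = parse_info(reordered) == fields  # order does not affect the dict

# Compare the frequency implied by AC/AN with the stored AF tag.
print(same_content, maf_from_counts(fields), float(fields["AF"]))
```

For this record the AC/AN ratio is about 0.059 while the AF tag is 0.0163738, so the two tags fall on opposite sides of a 0.05 cutoff. Whether that is related to the 2136 versus 1757 difference is not established here; the sketch only shows how to compare the two files field by field instead of by INFO string.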