Hi there,
I am using vcftools to estimate pi in two different populations of the same species. Both were genotyped with the same SNP dataset, so the SNP list is identical in both. I used a window size of 100 kb with a step of 10 kb. While comparing the pi values of the two populations (just a ratio), I realized that for some chromosomes the windows were not the same in the two populations. In particular, pop 2 is missing some windows (e.g., the windows 9100001-9200000 and 9110001-9210000 are absent in pop 2).
pop 1
CHROM BIN_START BIN_END N_VARIANTS PI
6 9010001 9110000 3 9.27E-01
6 9020001 9120000 3 9.27E-01
6 9030001 9130000 3 9.27E-01
6 9040001 9140000 4 1.25E-01
6 9050001 9150000 4 1.25E-01
6 9060001 9160000 3 9.78E-01
6 9070001 9170000 3 9.78E-01
6 9080001 9180000 3 9.78E-01
6 9090001 9190000 3 9.78E-01
6 9100001 9200000 1 3.18E-01
6 9110001 9210000 1 3.18E-01
6 9120001 9220000 2 7.12E-01
6 9130001 9230000 2 7.12E-01
pop 2
CHROM BIN_START BIN_END N_VARIANTS PI
6 9010001 9110000 3 6.88E-01
6 9020001 9120000 3 6.88E-01
6 9030001 9130000 3 6.88E-01
6 9040001 9140000 3 6.88E-01
6 9050001 9150000 3 6.88E-01
6 9060001 9160000 2 5.39E-01
6 9070001 9170000 2 5.39E-01
6 9080001 9180000 2 5.39E-01
6 9090001 9190000 2 5.39E-01
6 9120001 9220000 1 3.68E-01
6 9130001 9230000 1 3.68E-01
Here is the code I used:
vcftools --vcf pop1.vcf --window-pi 100000 --window-pi-step 10000 --chr 6 --out pop1_pi_chr6
vcftools --vcf pop2.vcf --window-pi 100000 --window-pi-step 10000 --chr 6 --out pop2_pi_chr6
Have I made an error? If not, is there a way to get the same windows in both populations?
Thanks
Annarita
Hi Annarita,
For the window-pi function of vcftools, the results for a bin are only written out if there is at least one polymorphic site within that window. Your two files may differ because, within those specific windows, one VCF file contains only homozygous reference genotypes.
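The reporting rule can be illustrated with a small sketch. This is a hypothetical Python illustration, not vcftools' actual code: it assumes sliding windows start at first_start, first_start + step, and so on (as in the BIN_START values above), and the polymorphic positions are made up to mimic pop 2.

```python
def reported_windows(poly_positions, first_start, last_start, size=100_000, step=10_000):
    """Hypothetical illustration (not vcftools' source): return (BIN_START, BIN_END, n)
    for each sliding window that contains at least one polymorphic site."""
    reported = []
    for start in range(first_start, last_start + 1, step):
        end = start + size - 1  # inclusive bounds, e.g. 9100001-9200000
        n = sum(1 for pos in poly_positions if start <= pos <= end)
        if n > 0:  # a window with no polymorphic site is simply not written out
            reported.append((start, end, n))
    return reported


# Made-up polymorphic positions for pop 2: none between 9100001 and 9210000.
pop2_sites = [9_095_000, 9_215_000]
for start, end, n in reported_windows(pop2_sites, 9_010_001, 9_130_001):
    print(start, end, n)
```

With these made-up positions, the windows starting at 9100001 and 9110001 are skipped, reproducing the gap seen in the pop 2 output even though neither file is malformed.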
Hi,
Yes, I realized yesterday, after checking the VCF files, that the excluded bins contain only monomorphic SNPs. So, to compare the two populations, I simply set pi = 0 for those bins.
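The pi = 0 workaround can be scripted. Below is a minimal Python sketch, assuming the two .windowed.pi files produced by the commands above (pop1_pi_chr6.windowed.pi and pop2_pi_chr6.windowed.pi) with the CHROM BIN_START BIN_END N_VARIANTS PI columns shown earlier; read_windowed_pi and align_pi are hypothetical helper names, not part of vcftools. A window missing from one file gets pi = 0, since a window with no polymorphic site has no pairwise differences.

```python
def read_windowed_pi(lines):
    """Parse .windowed.pi lines into {(chrom, bin_start, bin_end): pi}."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header
        table[(fields[0], int(fields[1]), int(fields[2]))] = float(fields[4])
    return table


def align_pi(pop1, pop2):
    """Return (window, pi1, pi2, ratio) over the union of windows.

    A window absent from one population's file is assigned pi = 0.0.
    The ratio pi1 / pi2 is None when pi2 is 0 (undefined).
    """
    rows = []
    for window in sorted(set(pop1) | set(pop2)):
        pi1 = pop1.get(window, 0.0)
        pi2 = pop2.get(window, 0.0)
        ratio = pi1 / pi2 if pi2 > 0 else None
        rows.append((window, pi1, pi2, ratio))
    return rows


if __name__ == "__main__":
    with open("pop1_pi_chr6.windowed.pi") as f1, open("pop2_pi_chr6.windowed.pi") as f2:
        rows = align_pi(read_windowed_pi(f1), read_windowed_pi(f2))
    for (chrom, start, end), pi1, pi2, ratio in rows:
        print(chrom, start, end, pi1, pi2, ratio, sep="\t")
```

Note that the union only covers windows that are polymorphic in at least one population; windows that are monomorphic in both are absent from both files and would need to be generated separately if required.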
Thanks a lot.
Annarita
2015-06-10 20:06 GMT+02:00 Anthony Marcketta amarcket@users.sf.net:
--
Annarita Marrano, Ph.D. student
Foundation Edmund Mach
Research and Innovation Centre
Genomics and Biology of Fruit Crops
Grapevine Applied Genomics
Via Mach 1,
38010 San Michele all'Adige (Trento)
Italy
Hi there,
I am having a similar issue to the one above with the --window-pi option in vcftools, where bins are being skipped; however, in this case I am sure that there are relevant SNPs in the skipped windows.
My data set consists of individuals that have been sorted into different groups. In one case, all individuals were merged into a file called All.merged.vcf. In the second case, only the VCF files of individuals from a specific geographic site were merged, into a file called AM.merged.vcf. Note that AM.merged.vcf is a subset of All.merged.vcf.
My commands are as follows:
vcftools --vcf All.merged.vcf --window-pi 50000 --out Run1
vcftools --vcf AM.merged.vcf --window-pi 50000 --out Run2
However, when I look at the results, the All.merged.vcf output skips over the first million bases of the chromosome:
cat Run1.windowed.pi | grep "ChrX" | head
Bm_v4_ChrX_scaffold_001 1050001 1100000 3 3.06977e-05
Bm_v4_ChrX_scaffold_001 1700001 1750000 1 8.11839e-06
Bm_v4_ChrX_scaffold_001 1950001 2000000 1 9.19662e-06
Bm_v4_ChrX_scaffold_001 7800001 7850000 1 4.8203e-06
Bm_v4_ChrX_scaffold_001 10550001 10600000 2 2.04651e-05
Bm_v4_ChrX_scaffold_001 10800001 10850000 2 2.04651e-05
These bases are not, however, skipped in the AM.merged.vcf result:
cat Run2.windowed.pi | grep "ChrX" | head
Bm_v4_ChrX_scaffold_001 1 50000 1 5e-06
Bm_v4_ChrX_scaffold_001 50001 100000 1 1.07143e-05
Bm_v4_ChrX_scaffold_001 150001 200000 1 1.14286e-05
Bm_v4_ChrX_scaffold_001 200001 250000 1 1.14286e-05
Bm_v4_ChrX_scaffold_001 400001 450000 1 1.07143e-05
Bm_v4_ChrX_scaffold_001 450001 500000 1 5e-06
I can definitely confirm that All.merged.vcf has valid SNPs in the first million bases (in fact, through other means I can confirm that the SNP density is quite high there). So I am wondering whether I am entering the command incorrectly, and why the tool might skip those bins in one case but not the other, when the second file is a subset of the first.
I can provide the vcf-merge commands used to create these files, if that might be the cause of the issue, but other tools seem to work fine with these VCF files (specifically ldepth.mean).
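One way to narrow this down is to count, directly from the VCF, how many records (and how many polymorphic records) fall in each 50 kb window of the contig, and compare that with Run1.windowed.pi. The Python sketch below is only a diagnostic under stated assumptions: an uncompressed VCF, GT as the first FORMAT field (the VCF specification requires this when GT is present), and non-overlapping windows starting at 1 as in the output above. "Polymorphic" here simply means more than one distinct called allele across samples; vcftools' own criterion (filters, missing data) may differ. count_per_window and is_polymorphic are hypothetical helpers, not part of vcftools.

```python
import re
from collections import Counter


def is_polymorphic(sample_fields):
    """True if the called alleles across all samples include more than one allele."""
    alleles = set()
    for sample in sample_fields:
        gt = sample.split(":", 1)[0]  # GT is the first FORMAT sub-field
        for allele in re.split(r"[/|]", gt):
            if allele != ".":  # ignore missing calls
                alleles.add(allele)
    return len(alleles) > 1


def count_per_window(vcf_lines, chrom, size=50_000):
    """Count records and polymorphic records per window on one contig.

    Windows are keyed by their first position: 1, 50001, 100001, ...
    """
    records, poly = Counter(), Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        start = ((int(fields[1]) - 1) // size) * size + 1
        records[start] += 1
        if is_polymorphic(fields[9:]):
            poly[start] += 1
    return records, poly


if __name__ == "__main__":
    with open("All.merged.vcf") as vcf:
        records, poly = count_per_window(vcf, "Bm_v4_ChrX_scaffold_001")
    # Windows before the first one reported in Run1.windowed.pi
    for start in sorted(records):
        if start >= 1_050_001:
            break
        print(start, start + 49_999, records[start], poly[start])
```

If this reports polymorphic records in windows that Run1.windowed.pi omits, the discrepancy lies in how vcftools reads the merged file rather than in the data itself.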
Thank you for your help!
John