#57 window-pi vcftools

Status: open
Group: v1.0_(example)
Owner: nobody
Created: 2015-06-09 by annarita
Updated: 2017-03-29

Hi there,
I am using vcftools to estimate pi in two different populations of the same species. Both were genotyped with the same SNP dataset, so the SNP list is identical in both. I used a window size of 100 kb with a step of 10 kb. While comparing the pi values of the two populations (just a ratio), I realized that the windows for some chromosomes were not the same in the two populations. In particular, pop 2 is missing some windows (e.g., the windows 9100001-9200000 and 9110001-9210000 are absent in pop 2).

pop 1
CHROM BIN_START BIN_END N_VARIANTS PI
6 9010001 9110000 3 9.27E-01
6 9020001 9120000 3 9.27E-01
6 9030001 9130000 3 9.27E-01
6 9040001 9140000 4 1.25E-01
6 9050001 9150000 4 1.25E-01
6 9060001 9160000 3 9.78E-01
6 9070001 9170000 3 9.78E-01
6 9080001 9180000 3 9.78E-01
6 9090001 9190000 3 9.78E-01
6 9100001 9200000 1 3.18E-01
6 9110001 9210000 1 3.18E-01
6 9120001 9220000 2 7.12E-01
6 9130001 9230000 2 7.12E-01

pop 2
CHROM BIN_START BIN_END N_VARIANTS PI
6 9010001 9110000 3 6.88E-01
6 9020001 9120000 3 6.88E-01
6 9030001 9130000 3 6.88E-01
6 9040001 9140000 3 6.88E-01
6 9050001 9150000 3 6.88E-01
6 9060001 9160000 2 5.39E-01
6 9070001 9170000 2 5.39E-01
6 9080001 9180000 2 5.39E-01
6 9090001 9190000 2 5.39E-01
6 9120001 9220000 1 3.68E-01
6 9130001 9230000 1 3.68E-01

Here are the commands I used:
vcftools --vcf pop1.vcf --window-pi 100000 --window-pi-step 10000 --chr 6 --out pop1_pi_chr6
vcftools --vcf pop2.vcf --window-pi 100000 --window-pi-step 10000 --chr 6 --out pop2_pi_chr6

Have I made a mistake somewhere? If not, is there a way to get the same windows in both populations?

Thanks

Annarita


Discussion

  • Anthony Marcketta

    Hi Annarita,

    For the window-pi function of vcftools, the results from a bin are only written out if there is at least one polymorphic site within that window. The files may be different between your two populations because one vcf file contains only homozygous reference genotypes within that specific window.
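A minimal sketch of one way to put both populations on the same set of windows, assuming the behaviour described above (a window is left out of the output when it contains no polymorphic site, so a window missing from one file can be treated as pi = 0). The helper names `read_windowed_pi` and `align_windows` are hypothetical and not part of vcftools; the input is assumed to be the whitespace-separated `CHROM BIN_START BIN_END N_VARIANTS PI` table shown in the question. Windows absent from both files stay absent.

```python
import io


def read_windowed_pi(handle):
    """Parse a windowed pi table into {(chrom, start, end): (n_variants, pi)}."""
    rows = {}
    for line in handle:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "CHROM":
            continue
        chrom, start, end, n_variants, pi = fields
        rows[(chrom, int(start), int(end))] = (int(n_variants), float(pi))
    return rows


def align_windows(pop1, pop2):
    """Return (window, pi_pop1, pi_pop2) for every window seen in either table.

    Assumption: a window missing from one table had no polymorphic site
    in that population, so its pi is filled in as 0.0.
    """
    missing = (0, 0.0)
    windows = sorted(set(pop1) | set(pop2))
    return [(w, pop1.get(w, missing)[1], pop2.get(w, missing)[1]) for w in windows]


if __name__ == "__main__":
    # A few rows copied from the tables in the question.
    pop1_text = """CHROM BIN_START BIN_END N_VARIANTS PI
6 9090001 9190000 3 9.78E-01
6 9100001 9200000 1 3.18E-01
6 9120001 9220000 2 7.12E-01
"""
    pop2_text = """CHROM BIN_START BIN_END N_VARIANTS PI
6 9090001 9190000 2 5.39E-01
6 9120001 9220000 1 3.68E-01
"""
    pop1 = read_windowed_pi(io.StringIO(pop1_text))
    pop2 = read_windowed_pi(io.StringIO(pop2_text))
    for (chrom, start, end), pi1, pi2 in align_windows(pop1, pop2):
        print(chrom, start, end, pi1, pi2)
```

With real output files, the same functions can be called on `open("pop1_pi_chr6.windowed.pi")` and `open("pop2_pi_chr6.windowed.pi")`. Note that a ratio is undefined wherever the denominator population has pi = 0, so those windows need separate handling.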

     
    • annarita

      annarita - 2015-06-10

      Hi,

      Yes, I realized yesterday, after checking the vcf files, that the
      excluded bins contain only monomorphic SNPs. So, in order to compare
      the two populations, I simply set pi = 0 for those bins.

      Thanks a lot.

      Annarita

      --
      Annarita Marrano, Ph.D. student

      Foundation Edmund Mach
      Research and Innovation Centre
      Genomics and Biology of Fruit Crops
      Grapevine Applied Genomics

      Via Mach 1,
      38010 San Michele all'Adige (Trento)
      Italy

       


  • John Mattick

    John Mattick - 2017-03-29

    Hi there,

    I am having a similar issue to the one above with the --window-pi option in vcftools, where bins are being skipped. In this case, however, I am sure that there are relevant SNPs in the skipped windows.

    My data set consists of individuals that have been sorted into different groups. In the first case, all individuals were merged into a file called All.merged.vcf. In the second case, only the VCF files of individuals from a specific geographic site were merged, into a file called AM.merged.vcf. Note that AM.merged.vcf is a subset of All.merged.vcf.

    My commands are as follows:

    vcftools --vcf All.merged.vcf --window-pi 50000 --out Run1

    vcftools --vcf AM.merged.vcf --window-pi 50000 --out Run2

    However, when I look at the results, the All.merged.vcf result skips over the first million bases in the chromosome:

    cat Run1.windowed.pi | grep "ChrX" | head
    Bm_v4_ChrX_scaffold_001 1050001 1100000 3 3.06977e-05
    Bm_v4_ChrX_scaffold_001 1700001 1750000 1 8.11839e-06
    Bm_v4_ChrX_scaffold_001 1950001 2000000 1 9.19662e-06
    Bm_v4_ChrX_scaffold_001 7800001 7850000 1 4.8203e-06
    Bm_v4_ChrX_scaffold_001 10550001 10600000 2 2.04651e-05
    Bm_v4_ChrX_scaffold_001 10800001 10850000 2 2.04651e-05

    These bases are not, however, skipped in the AM.merged.vcf result:

    cat Run2.windowed.pi | grep "ChrX" | head
    Bm_v4_ChrX_scaffold_001 1 50000 1 5e-06
    Bm_v4_ChrX_scaffold_001 50001 100000 1 1.07143e-05
    Bm_v4_ChrX_scaffold_001 150001 200000 1 1.14286e-05
    Bm_v4_ChrX_scaffold_001 200001 250000 1 1.14286e-05
    Bm_v4_ChrX_scaffold_001 400001 450000 1 1.07143e-05
    Bm_v4_ChrX_scaffold_001 450001 500000 1 5e-06

    I can confirm that the All.merged.vcf file has valid SNPs in the first million bases (in fact, through other means I can confirm that the SNP density is quite high there). So I am wondering whether I am entering the command incorrectly, and why the tool would skip those windows in one case but not the other, when the second file is a subset of the first.

    I can provide the vcf-merge commands used to create those files, in case that is the cause of the issue, but other vcftools outputs seem to work fine with these vcf files (specifically ldepth.mean).

    Thank you for your help!

    John
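One way to narrow down a case like this is to check, independently of vcftools, which windows contain at least one site that is polymorphic among the samples in the file. The sketch below is only a diagnostic under stated assumptions: it does not compute pi and does not reproduce vcftools' internal logic. The function names are hypothetical, the VCF is assumed to be uncompressed and tab-separated, and GT is assumed to be the first FORMAT subfield, as the VCF specification requires when GT is present. Window boundaries follow the 1-50000, 50001-100000 pattern seen in the output above.

```python
import re
from collections import Counter


def is_polymorphic(sample_fields):
    """True if the called genotypes of a record carry more than one allele."""
    alleles = set()
    for sample in sample_fields:
        genotype = sample.split(":", 1)[0]  # assumption: GT is the first subfield
        for allele in re.split(r"[/|]", genotype):
            if allele != ".":  # ignore missing calls
                alleles.add(allele)
    return len(alleles) > 1


def polymorphic_sites_per_window(vcf_lines, chrom, window_size):
    """Count polymorphic records per non-overlapping window on one chromosome."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):  # meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Records without sample columns, or on other chromosomes, are skipped.
        if len(fields) < 10 or fields[0] != chrom:
            continue
        if not is_polymorphic(fields[9:]):
            continue
        pos = int(fields[1])
        start = (pos - 1) // window_size * window_size + 1
        counts[(start, start + window_size - 1)] += 1
    return counts
```

For example, `polymorphic_sites_per_window(open("All.merged.vcf"), "Bm_v4_ChrX_scaffold_001", 50000)` returns the windows that hold at least one polymorphic record; any of those that are missing from Run1.windowed.pi would be worth reporting together with the offending records.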

     
