Hello,
I used the following sync file as input:
1 13961 C 27:0:10:0:0:0 24:0:3:0:0:0 26:0:8:0:0:0 22:0:6:0:0:0 51:0:13:0:0:0 48:0:14:0:0:0 77:0:21:0:0:0
1 13988 C 0:26:4:0:0:0 0:21:0:0:0:0 0:31:1:0:0:0 0:25:0:0:0:0 0:47:4:0:0:0 0:56:1:0:0:0 0:78:5:0:0:0
1 14005 G 0:30:0:3:0:0 0:26:0:0:0:0 0:30:0:1:0:0 0:27:0:1:0:0 0:56:0:3:0:0 0:57:0:2:0:0 0:86:0:4:0:0
1 14215 C 0:34:4:0:0:0 0:28:4:0:0:0 0:26:2:0:0:0 0:28:0:0:0:0 0:62:8:0:0:0 0:54:2:0:0:0 0:88:10:0:0:0
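For reference, here is a minimal Python sketch of how I read the allele counts out of a sync line. It relies on the sync column order A:T:C:G:N:del; the function name `parse_sync_line` and the variable names are my own and are not part of popoolation2.

```python
# Order of the colon-separated counts in each sync population column.
SYNC_ALLELES = ("A", "T", "C", "G", "N", "del")

def parse_sync_line(line):
    """Split one sync line into chromosome, position, reference base
    and one {allele: count} dictionary per pool."""
    fields = line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    pools = [
        dict(zip(SYNC_ALLELES, (int(c) for c in column.split(":"))))
        for column in fields[3:]
    ]
    return chrom, pos, ref, pools

# First SNP of the input file shown above.
line = ("1 13961 C 27:0:10:0:0:0 24:0:3:0:0:0 26:0:8:0:0:0 "
        "22:0:6:0:0:0 51:0:13:0:0:0 48:0:14:0:0:0 77:0:21:0:0:0")
chrom, pos, ref, pools = parse_sync_line(line)

# Pool 1 and pool 4 are the ones used in the manual calculation.
print(ref, pools[0]["A"], pools[0]["C"])
print(ref, pools[3]["A"], pools[3]["C"])
```

So for this SNP the reference allele C has 10 reads in pool 1 and 6 reads in pool 4, and the alternative allele A has 27 and 22 reads.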
I ran PoPoolation2 to calculate Fst per SNP and over a 500 bp window that contains only these 4 SNPs.
I used the following commands:
For per_snp:
perl /popoolation2/fst-sliding.pl --input file_input.sync --output file_out_per_snp.fst --suppress-noninformative --min-count 2 --min-coverage 4 --max-coverage 500 --min-covered-fraction 0.0 --window-size 1 --step-size 1 --pool-size 30:30:30:14:60:44:90
For the 500bp window:
perl /popoolation2/fst-sliding.pl --input file_input.sync --output /file_out_500bp_win.fst --min-count 2 --min-coverage 4 --max-coverage 500 --min-covered-fraction 0.0 --window-size 500 --step-size 250 --pool-size 30:30:30:14:60:44:90
Results:
per_snp: POS=13750-14250:
CHROM POS VarPerWin CovPerWin AvrMinCov Fst_p1_p2 p1_p3 p1_p4
1 13961 1 1.000 27.0 1:2=0.04760931 1:3=0.00289877 1:4=0.03004437 1:5=0.01893165 1:6=0.01287065 1:7=0.02219204 2:3=0.03185784 2:4=0.03466877 2:5=0.03940876 2:6=0.04336600 2:7=0.04979839 3:4=0.02419922 3:5=0.01603647 3:6=0.01205424 3:7=0.02065334 4:5=0.03692451 4:6=0.03520677 4:7=0.04259817 5:6=0.00396872 5:7=0.00582843 6:7=0.00885687
1 13988 1 1.000 21.0 1:2=0.08515130 1:3=0.03492866 1:4=0.11417409 1:5=0.01968370 1:6=0.05136134 1:7=0.02937396 2:3=0.03250189 2:4=0.00000000 2:5=0.08401443 2:6=0.04961828 2:7=0.08693245 3:4=0.06319493 3:5=0.03069365 3:6=0.01089913 3:7=0.03170471 4:5=0.11307328 4:6=0.07976832 4:7=0.11589873 5:6=0.02555926 5:7=0.00708400 6:7=0.02525081
1 14005 1 1.000 26.0 1:2=0.05563187 1:3=0.01632254 1:4=0.04403781 1:5=0.01719663 1:6=0.02068531 1:7=0.02250156 2:3=0.02269861 2:4=0.02097902 2:5=0.06354515 2:6=0.04917957 2:7=0.07111021 3:4=0.02045796 3:5=0.02171465 3:6=0.01346339 3:7=0.02598114 4:5=0.04434173 4:6=0.03315474 4:7=0.04705491 5:6=0.00543518 5:7=0.00560214 6:7=0.01065786
1 14215 1 1.000 28.0 1:2=0.00329836 1:3=0.00919713 1:4=0.10153204 1:5=0.01520005 1:6=0.02354744 1:7=0.01895554 2:3=0.01094626 2:4=0.10757985 2:5=0.01654647 2:6=0.03264043 2:7=0.02100261 3:4=0.07498632 3:5=0.02847563 3:6=0.01608308 3:7=0.03052647 4:5=0.13220530 4:6=0.08407128 4:7=0.13437933 5:6=0.02937016 5:7=0.00500426 6:7=0.03099850
500bp window:
1 14000 4 0.008 25.5 1:2=0.04270060 1:3=0.01210611 1:4=0.05721567 1:5=0.01800513 1:6=0.02300450 1:7=0.02301279 2:3=0.02431261 2:4=0.05363120 2:5=0.03909072 2:6=0.04070549 2:7=0.04520839 3:4=0.03287184 3:5=0.02190920 3:6=0.01283939 3:7=0.02503795 4:5=0.06482645 4:6=0.04029225 4:7=0.06637237 5:6=0.01264480 5:7=0.00580603 6:7=0.01562284
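As a quick check on the numbers above, this small Python sketch compares the plain mean of the four per-SNP Fst values for pool 1 versus pool 4 with the window value for the same pair. The variable names are mine and the values are copied from the output above.

```python
# Per-SNP Fst values for the pair 1:4, copied from the per_snp output.
per_snp_fst_1_4 = [0.03004437, 0.11417409, 0.04403781, 0.10153204]

# Window Fst for the pair 1:4, copied from the 500bp window output.
window_fst_1_4 = 0.05721567

# Unweighted arithmetic mean of the per-SNP values.
mean_per_snp = sum(per_snp_fst_1_4) / len(per_snp_fst_1_4)

print(f"mean of per-SNP Fst: {mean_per_snp:.8f}")
print(f"window Fst:          {window_fst_1_4:.8f}")
```

The two numbers differ (about 0.072 versus 0.057), so the window Fst does not appear to be a simple average of the per-SNP values.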
Manual Fst calculation (pool1 versus pool4) for the first SNP in the window:
REF_Allele_freq_pool1= 10/(10+27)=0.2702703
ALT_Allele_freq_pool1= 27/(10+27)=0.7297297
REF_Allele_freq_pool4= 6/(6+22) = 0.2142857
ALT_Allele_freq_pool4= 22/(6+22) = 0.7857143
Pi_pool1=1-(0.2702703^2 + 0.7297297^2) = 0.3944485
Pi_pool4=1-(0.2142857^2 + 0.7857143^2) = 0.3367347
Pi_within=(0.3944485 + 0.3367347)/2 = 0.3655916
REF_Allele_freq_pool1_and_pool4 = (0.2702703 + 0.2142857)/2 = 0.242278
ALT_Allele_freq_pool1_and_pool4 = (0.7297297 + 0.7857143)/2 = 0.757722
Pi_total = 1-(0.242278^2 + 0.757722^2) = 0.3671587
Fst = (0.3671587 - 0.3655916)/0.3671587 = 0.004268182
The Fst value obtained manually is 0.004268182
The Fst value obtained with popoolation2 is 0.03004437
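To make the manual calculation above easy to reproduce, here it is as a short Python sketch. It only implements my own steps (heterozygosity from raw allele frequencies, no corrections of any kind), not whatever popoolation2 does internally; the function names `heterozygosity` and `manual_fst` are mine.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(f * f for f in freqs)

def manual_fst(counts1, counts2):
    """Fst = (Pi_total - Pi_within) / Pi_total from raw allele counts
    of two pools, following the manual steps above."""
    n1, n2 = sum(counts1), sum(counts2)
    freqs1 = [c / n1 for c in counts1]
    freqs2 = [c / n2 for c in counts2]

    # Average heterozygosity within the two pools.
    pi_within = (heterozygosity(freqs1) + heterozygosity(freqs2)) / 2

    # Heterozygosity of the averaged allele frequencies.
    mean_freqs = [(a + b) / 2 for a, b in zip(freqs1, freqs2)]
    pi_total = heterozygosity(mean_freqs)

    return (pi_total - pi_within) / pi_total

# Counts are [REF, ALT] = [C, A] at position 13961 for pool 1 and pool 4.
fst = manual_fst([10, 27], [6, 22])
print(f"manual Fst (pool 1 vs pool 4): {fst:.6f}")
```

This gives approximately 0.00427, the same as my hand calculation up to rounding, so the discrepancy with popoolation2 is not an arithmetic slip on my side.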
What is the source of the difference?
How should I correct my calculation to obtain the same values as popoolation2?
Could you also describe how you would calculate the Fst for the window that contains all four SNPs?
Thank you and best regards,
Rakan. N.