We tested whether selection on beneficial loci is correctly implemented.
We simulated the trajectory of a locus with s=0.1 for 200 generations. The locus had a starting frequency p=0.1 and the population size was N=10000. The simulations were repeated 50 times.
The graphic above shows the expected (dark line) and the observed trajectories for the selected loci (grey lines). The expected and the observed trajectories are clearly overlapping. Genetic drift is leading to some variation of the observed trajectories around the theoretical expectation.
All files required for the simulations can be found here):
First, 50 independent simulations were performed with MimicrEE2. We recorded the allele frequency of the selected locus in intervals of 10 generations.
:::bash
for i in {1..50}; do java -jar mim2.jar w --haplotypes-g0 10000inds_basepop --recombination-rate one_locus_rr --snapshots 0,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200 --replicate-runs 1 --output-sync sel_validation_rep${i}.sync --fitness fitness_file --threads 6; done
To estimate allele frequencies from a sync file, a script written in Python was used (here)).
To run the script for every sync file generated from MimicrEE2 we followed the next command (use --help to see how the program works):
:::bash
for i in {1..50}; do python freq_validation_more_than_one.py --n $i --input sel_validation_rep${i}.sync --minor-allele 0 --output all_freq_sel_coef_${i}.txt --chosen-snp 100 --generations-number 20; done
next we merged all output files:
:::bash
cat all_freq_sel_coef_{1..50}.txt > all_freq_sel_coef_50repl
Finally we plotted the observed and the expected trajectories with the following R script
:::R
##p0 is the initial allele frequency
##s is the selection coefficient
##t is the number of generations
##repl is the number of replicate runs from mimicrEE2
##h is the dominance (by default no dominance is used)
sel_traj<- function(p0,s,t,repl,h=0.5 ){
setwd("/my/working/dir")
##initial allele frequencies
p<-p0
q<-1-p0
##initialize matrix
traj<- matrix(NA, ncol=3, nrow=max(t)+1)
traj[1,1]<-p
traj[1,2]<-0
g=1
##how to estimate fitness of the three genotypes
wAA=1+s
wAa=1+h*s
waa=1
##get back the estimated values
print(paste("sel_coefficient is:", s))
print(paste("waa:", waa))
print(paste("wAa:", wAa))
print(paste("wAA:", wAA))
##estimate frequencies for t generations and s selection coefficient
while(g<=max(t)){
##mean fitness
w<- (p^2)*wAA + (2*p*q*wAa) + (q^2)*waa
##estimate af
p<-( p*(p*wAA + q*wAa) ) / w
q<-1-p
##save into matrix
traj[g+1,1]<-p
traj[g+1,2]<-g
g<-g+1
}
##make file for plotting
traj<- as.data.frame(traj)
traj[,3]<- "black"
colnames(traj)<- c("freq", "generations", "color")
##LOAD MIMICREE2 SIMULATIONS
####FORMAT OF THE INPUT FILE####
#frequencies (in one column)
# 0.1 (for generation 0 repeat 1)
# 0.15 (for generation 10 repeat 1)
# 0.25 (for generation 20 repeat 1)
# 0.1 (for generation 0 repeat 2)
# 0.15 (for generation 10 repeat 2)
# 0.25 (for generation 20 repeat 2)
# .
# .
####FORMAT OF THE INPUT FILE####
##number of generations, must always be equal to t
generations=t
##load file with allele frequencies for mimicree (format shown above)
mimicree_simulations_sel_coef<- read.table("all_freq_sel_coef_50repl")
##10 means that we get estimates after every ten generations (e.g for generation 0,10,20,30..)
mimicree_simulations_sel_coef[,2]<- rep(seq(0,generations,10), times=repl)
##21 is the count of generations for which estimates are provided
##(e.g if we have values for generations 0,5,10,15,20,25, then this value should be equal to 6)
mimicree_simulations_sel_coef[,3]<- rep(c(1:repl), each=21)
mimicree_simulations_sel_coef[,3]<- as.character(mimicree_simulations_sel_coef[,3])
colnames(mimicree_simulations_sel_coef)<- colnames(traj)
##plot results
library(ggplot2)
ggplot(rbind(mimicree_simulations_sel_coef,traj), aes(x=generations, y=freq, color=color)) +
geom_line() + theme_bw() + labs(x="Generations", y="Allele Frequency(%)") +
scale_color_manual(values=c(rep("grey",times=repl), "black"), guide=FALSE)
}
##how to run the function above
sel_traj(0.1,0.1,200,50)