Menu

SelectionCoefficient

Robert Kofler Christos Vlachos
Attachments
sel_coef_validation.png (18325 bytes)

Validation for selection

We tested whether selection on beneficial loci is correctly implemented.
We simulated the trajectory of a locus with s=0.1 for 200 generations. The locus had a starting frequency p=0.1 and the population size was N=10000. The simulations were repeated 50 times.

The graphic above shows the expected (dark line) and the observed trajectories for the selected loci (grey lines). The expected and the observed trajectories are clearly overlapping. Genetic drift is leading to some variation of the observed trajectories around the theoretical expectation.

Materials and Methods

All files required for the simulations can be found here):
First, 50 independent simulations were performed with MimicrEE2. We recorded the allele frequency of the selected locus in intervals of 10 generations.

:::bash
for i in {1..50}; do java -jar mim2.jar w --haplotypes-g0 10000inds_basepop  --recombination-rate one_locus_rr --snapshots 0,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200 --replicate-runs 1 --output-sync sel_validation_rep${i}.sync --fitness fitness_file  --threads 6; done 

To estimate allele frequencies from a sync file, a script written in Python was used (here)).

To run the script for every sync file generated from MimicrEE2 we followed the next command (use --help to see how the program works):

:::bash
for i in {1..50}; do python freq_validation_more_than_one.py --n $i --input sel_validation_rep${i}.sync --minor-allele 0 --output all_freq_sel_coef_${i}.txt --chosen-snp 100 --generations-number 20; done

next we merged all output files:

:::bash
cat all_freq_sel_coef_{1..50}.txt > all_freq_sel_coef_50repl

Finally we plotted the observed and the expected trajectories with the following R script

:::R

##p0 is the initial allele frequency
##s is the selection coefficient
##t is the number of generations
##repl is the number of replicate runs from mimicrEE2
##h is the dominance (by default no dominance is used)

sel_traj<- function(p0,s,t,repl,h=0.5 ){

    setwd("/my/working/dir")

  ##initial allele frequencies 
  p<-p0
  q<-1-p0

  ##initialize matrix
  traj<- matrix(NA, ncol=3, nrow=max(t)+1)
  traj[1,1]<-p
  traj[1,2]<-0

  g=1

  ##how to estimate fitness of the three genotypes
  wAA=1+s
  wAa=1+h*s
  waa=1

  ##get back the estimated values
  print(paste("sel_coefficient is:", s))
  print(paste("waa:", waa))
  print(paste("wAa:", wAa))
  print(paste("wAA:", wAA))


  ##estimate frequencies for t generations and s selection coefficient
  while(g<=max(t)){

    ##mean fitness
    w<- (p^2)*wAA + (2*p*q*wAa) + (q^2)*waa

    ##estimate af
    p<-( p*(p*wAA + q*wAa) ) / w
    q<-1-p

    ##save into matrix
    traj[g+1,1]<-p
    traj[g+1,2]<-g
    g<-g+1
    }

  ##make file for plotting
  traj<- as.data.frame(traj)
  traj[,3]<- "black"
  colnames(traj)<- c("freq", "generations", "color")





  ##LOAD MIMICREE2 SIMULATIONS


  ####FORMAT OF THE INPUT FILE####

  #frequencies (in one column)
  # 0.1 (for generation 0 repeat 1)
  # 0.15 (for generation 10 repeat 1)  
  # 0.25 (for generation 20 repeat 1)  
  # 0.1 (for generation 0 repeat 2)
  # 0.15 (for generation 10 repeat 2)  
  # 0.25 (for generation 20 repeat 2)  
  #  .
  #  .

  ####FORMAT OF THE INPUT FILE####


   ##number of generations, must always be equal to t
  generations=t

  ##load file with allele frequencies for mimicree (format shown above)
  mimicree_simulations_sel_coef<- read.table("all_freq_sel_coef_50repl")

  ##10 means that we get estimates after every ten generations (e.g for generation 0,10,20,30..)
  mimicree_simulations_sel_coef[,2]<- rep(seq(0,generations,10), times=repl) 

  ##21 is the count of generations for which estimates are provided 
  ##(e.g if we have values for generations 0,5,10,15,20,25, then this value should be equal to 6)

  mimicree_simulations_sel_coef[,3]<- rep(c(1:repl), each=21)
  mimicree_simulations_sel_coef[,3]<- as.character(mimicree_simulations_sel_coef[,3])

  colnames(mimicree_simulations_sel_coef)<- colnames(traj)



  ##plot results
  library(ggplot2)
  ggplot(rbind(mimicree_simulations_sel_coef,traj), aes(x=generations, y=freq, color=color)) + 
    geom_line() + theme_bw() + labs(x="Generations", y="Allele Frequency(%)") + 
    scale_color_manual(values=c(rep("grey",times=repl), "black"), guide=FALSE)
  }

##how to run the function above
sel_traj(0.1,0.1,200,50)

Related

Wiki: Home

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.