MinimalWalkthrough

Robert Kofler, Christos Vlachos

1 Introduction

In this walkthrough we quickly introduce the usage of MimicrEE2 using small toy examples.
We focus on diploids. Haploids will be treated in later walkthroughs.

2 Minimal walkthrough

Minimal simulations with MimicrEE2 require only a haplotype file [HaplotypeFile].

haplotypes

We first generate a minimal haplotype file. We will simulate 10 SNPs and provide 20 haplotypes (thus N=10 diploid individuals).

We store the following toy example of diploid haplotypes in file: mini.mimhap

2L      1       G       A/G     AG AA AA AG AA AA AA AA AA AA
2L      2       G       A/G     AG AA AA AA AA AA AA AA AA AA
2L      3       G       A/G     AG AA AA AA AA AA AA AA AA AA
2L      4       G       A/G     AG AA AA AA AA AA AA AA AA AA
2L      5       G       A/G     AG AA AA AG AA AA AA AA AA AA
2L      6       G       A/G     AG AA AA AA AA AG AA AA AA AA
2L      7       G       A/G     AG AA AA AA AA AA AA AA AA AA
2L      8       G       A/G     AG AA AA AA AA AA AA AA AA AA
2L      9       G       A/G     AG AA AA AG AA AG AA AG AA AA
2L      10      G       A/G     AG AA AA AG AA AA AA AA AA AA

This file may also be obtained here: https://sourceforge.net/projects/mimicree2/files/walkthrough_for_wmode/data/mini.mimhap/download
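
As a quick sanity check of the haplotype file, the following Python sketch counts the alleles in one line of mini.mimhap. It assumes the column layout shown above (chromosome, position, reference character, the two alleles, then one two-character genotype per diploid individual); the function name count_alleles is only illustrative and not part of MimicrEE2.

```python
from collections import Counter

def count_alleles(mimhap_line):
    """Count the alleles over all haplotypes in one line of a haplotype file."""
    fields = mimhap_line.split()
    # Assumed layout: chromosome, position, reference, alleles, genotypes...
    chrom, pos = fields[0], int(fields[1])
    genotypes = fields[4:]
    # Each diploid genotype (e.g. "AG") contributes two haplotypes.
    counts = Counter("".join(genotypes))
    return chrom, pos, counts

line = "2L      9       G       A/G     AG AA AA AG AA AG AA AG AA AA"
chrom, pos, counts = count_alleles(line)
print(chrom, pos, counts["A"], counts["G"])  # 2L 9 16 4
```

For position 9 this gives 16 A and 4 G among the 20 haplotypes.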

minimal simulations

Now we can start the minimal simulations. We use 1 replicate and as output we want to obtain the allele frequencies.
Simulations will be performed for 50 generations.

java -jar mim2.jar w --replicate-runs 1 --haplotypes-g0 mini.mimhap --snapshots 50 --output-sync allele-freqs.txt

output from minimal simulations

We may view the content of the resulting sync file (it basically contains the allele frequencies, stored as allele counts) with the following command:

gzip -cd allele-freqs.txt

and may for example obtain:

2L  1   G   18:0:0:2:0:0    20:0:0:0:0:0
2L  2   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  3   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  4   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  5   G   18:0:0:2:0:0    20:0:0:0:0:0
2L  6   G   18:0:0:2:0:0    20:0:0:0:0:0
2L  7   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  8   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  9   G   16:0:0:4:0:0    6:0:0:14:0:0
2L  10  G   18:0:0:2:0:0    20:0:0:0:0:0

The above sync file contains the allele counts of the start population (column 4) and the evolved population (column 5), in the form A:T:C:G:N:del. For details see [OutputFiles]. This file can be used directly for downstream analysis with tools such as PoPoolation2 (https://sourceforge.net/projects/popoolation2/).
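
To turn one sync entry into an allele frequency, a minimal Python sketch could look as follows. It relies only on the A:T:C:G:N:del order stated above; the helper names parse_sync_entry and allele_frequency are our own and not part of MimicrEE2 or PoPoolation2.

```python
SYNC_ORDER = ("A", "T", "C", "G", "N", "del")

def parse_sync_entry(entry):
    """Split an entry such as '16:0:0:4:0:0' into a dict of counts."""
    return dict(zip(SYNC_ORDER, (int(x) for x in entry.split(":"))))

def allele_frequency(entry, allele):
    """Frequency of one allele among the A, T, C and G counts."""
    counts = parse_sync_entry(entry)
    total = sum(counts[base] for base in "ATCG")
    if total == 0:
        return float("nan")  # no coverage at this site
    return counts[allele] / total

# Position 9 of the example above: base and evolved population
print(allele_frequency("16:0:0:4:0:0", "G"))  # 0.2
print(allele_frequency("6:0:0:14:0:0", "G"))  # 0.7
```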

In this example all SNPs except one (position 9) became fixed for one of their alleles during the neutral simulations.
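
To get a feeling for why so few SNPs remain polymorphic in such a small population, the following Python sketch simulates pure drift with a simplified Wright-Fisher model. This is not the algorithm used by MimicrEE2 (there are no diploid individuals and no recombination here); the function name wright_fisher, the number of repetitions and the seed are purely illustrative assumptions.

```python
import random

def wright_fisher(count, n_haplotypes, generations, rng):
    """Follow the copy number of one allele under neutral drift."""
    for _ in range(generations):
        p = count / n_haplotypes
        # Every haplotype of the next generation samples its allele
        # independently from the current allele frequency.
        count = sum(rng.random() < p for _ in range(n_haplotypes))
    return count

rng = random.Random(1)  # fixed seed, only for reproducibility
# Start with 2 copies among 20 haplotypes, as for several SNPs above
finals = [wright_fisher(2, 20, 50, rng) for _ in range(1000)]
lost = sum(c == 0 for c in finals)
fixed = sum(c == 20 for c in finals)
print("lost:", lost, "fixed:", fixed, "segregating:", 1000 - lost - fixed)
```

Under this simplified model a rare allele is lost far more often than it is fixed, and only few runs are still polymorphic after 50 generations.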

3 Minimal simulations with recombination and selection

In this section we introduce the usage of MimicrEE2 with recombination and selected loci.
We use the haplotype file specified above and additionally provide a recombination map as well as a selected locus.

recombination map

We specify that, in the region between bases 1 and 10 of chromosome 2L, on average 0.1 recombination events should occur. We save the following content in the file recrate.txt:

[lambda]
2L:1..10    0.1

For details on this file format see [RecombinationRate]. The file may also be obtained here: https://sourceforge.net/projects/mimicree2/files/walkthrough_for_wmode/data/recrate.txt/download
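
To illustrate what an average of 0.1 recombination events means, the following Python sketch computes the chance of observing at least one event in the region. It assumes that the number of events follows a Poisson distribution with mean lambda; this is our reading of the [lambda] header and should be checked against [RecombinationRate].

```python
import math

lam = 0.1  # mean number of recombination events in 2L:1..10

# Assumption: the number of events is Poisson distributed with mean lam
p_none = math.exp(-lam)        # probability of zero events
p_at_least_one = 1.0 - p_none  # probability of one or more events

print(f"P(no event)     = {p_none:.4f}")
print(f"P(>= one event) = {p_at_least_one:.4f}")
```

Under this assumption roughly one in ten transmitted chromosomes carries a recombination event in the region.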

selected loci

Finally we specify that the locus at position 5 is positively selected with a selection coefficient of s = 0.15 and a heterozygous effect of h = 0.5 (i.e. the alleles act additively). We store the following content in the file selected.txt:

[s]
2L  5   A/G 0.15    0.5

For details on this file format see [SelectedLoci]. The file may also be obtained here: https://sourceforge.net/projects/mimicree2/files/walkthrough_for_wmode/data/selected.txt/download
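
The effect of s and h on the fitness of the three genotypes can be sketched in Python as follows. We assume the usual parameterisation in which the homozygote for the non-selected allele has fitness 1, the heterozygote 1 + h*s and the homozygote for the selected allele 1 + s; the function genotype_fitness is only an illustration and not MimicrEE2 code.

```python
def genotype_fitness(genotype, s, h, selected="G"):
    """Fitness of a diploid genotype at a single selected locus."""
    copies = genotype.count(selected)
    if copies == 0:
        return 1.0            # e.g. AA
    if copies == 1:
        return 1.0 + h * s    # e.g. AG
    return 1.0 + s            # e.g. GG

for genotype in ("AA", "AG", "GG"):
    print(genotype, genotype_fitness(genotype, s=0.15, h=0.5))
```

With s = 0.15 and h = 0.5 the three genotypes obtain a fitness of 1.0, 1.075 and 1.15, respectively.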

minimal simulations with recombination and selection

With the input files in place we can proceed with the forward simulations. We use 1 replicate and simulate selection for 50 generations. As output we want to obtain the allele frequencies.

java -jar mim2.jar w --replicate-runs 1 --haplotypes-g0 mini.mimhap --recombination-rate recrate.txt --fitness selected.txt --snapshots 50 --output-sync allele-freqs.txt

output: the allele frequencies (--output-sync)

The output is gzip-compressed and may be displayed with the command gzip -cd allele-freqs.txt. We may for example obtain:

2L  1   G   18:0:0:2:0:0    0:0:0:20:0:0
2L  2   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  3   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  4   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  5   G   18:0:0:2:0:0    0:0:0:20:0:0
2L  6   G   18:0:0:2:0:0    20:0:0:0:0:0
2L  7   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  8   G   19:0:0:1:0:0    20:0:0:0:0:0
2L  9   G   16:0:0:4:0:0    0:0:0:20:0:0
2L  10  G   18:0:0:2:0:0    14:0:0:6:0:0

We specified that the G at position 5 is positively selected. This file shows that the G went from a low frequency of 10% in the base population (18:0:0:2:0:0) to fixation in the evolved population (0:0:0:20:0:0).
With the exception of the SNP at position 10, all SNPs were fixed for one of their alleles. Individual number 4 contains a distinct haplotype that is linked to the positively selected variant. This haplotype has a distinct allele at positions 1, 9 and 10. The sync file demonstrates that the alleles at positions 1 and 9 hitchhiked to fixation with the selected locus (at position 5). However, the linked variant at position 10 did not get fixed.

4 Minimal simulations with several replicates and fitness as output

Using the input files generated above we perform forward simulations for 2 replicates and 50 generations. As output we want the allele frequencies and the fitness of the individuals.

java -jar mim2.jar w --replicate-runs 2 --haplotypes-g0 mini.mimhap --recombination-rate recrate.txt --fitness selected.txt --snapshots 50 --output-sync allele-freqs.txt --output-gpf pop-fitness.txt

output: the allele frequencies (--output-sync)

Drift is strong with a population size of N=10, so the results may differ between replicates (they actually differ in the following output):

2L  1   G   18:0:0:2:0:0    0:0:0:20:0:0    18:0:0:2:0:0    0:0:0:20:0:0
2L  2   G   19:0:0:1:0:0    20:0:0:0:0:0    19:0:0:1:0:0    0:0:0:20:0:0
2L  3   G   19:0:0:1:0:0    20:0:0:0:0:0    19:0:0:1:0:0    0:0:0:20:0:0
2L  4   G   19:0:0:1:0:0    20:0:0:0:0:0    19:0:0:1:0:0    20:0:0:0:0:0
2L  5   G   18:0:0:2:0:0    0:0:0:20:0:0    18:0:0:2:0:0    20:0:0:0:0:0
2L  6   G   18:0:0:2:0:0    20:0:0:0:0:0    18:0:0:2:0:0    20:0:0:0:0:0
2L  7   G   19:0:0:1:0:0    20:0:0:0:0:0    19:0:0:1:0:0    20:0:0:0:0:0
2L  8   G   19:0:0:1:0:0    20:0:0:0:0:0    19:0:0:1:0:0    20:0:0:0:0:0
2L  9   G   16:0:0:4:0:0    0:0:0:20:0:0    16:0:0:4:0:0    20:0:0:0:0:0
2L  10  G   18:0:0:2:0:0    0:0:0:20:0:0    18:0:0:2:0:0    20:0:0:0:0:0

  • column 1: chromosome
  • column 2: position
  • column 3: reference character
  • column 4: base population
  • column 5: evolved population of replicate 1 (50 generations)
  • column 6: base population
  • column 7: evolved population of replicate 2 (50 generations)

For details on this output file see [OutputFiles].

Note that the beneficial allele (G) at position 5 got fixed in the first replicate (compare column 4 vs 5) but got lost in the second replicate (compare column 6 vs 7).
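
With many replicates such a comparison is easier to automate. The following Python sketch reports the fate of an allele in every replicate of one sync line. It assumes the column layout of this example (one base and one evolved column per replicate, i.e. a single snapshot); the function names fate and fates_per_replicate are hypothetical helpers.

```python
SYNC_BASES = "ATCG"  # order of the first four counts in a sync entry

def fate(evolved_entry, allele):
    """Classify an allele as lost, fixed or segregating."""
    counts = [int(x) for x in evolved_entry.split(":")]
    total = sum(counts[:4])
    copies = counts[SYNC_BASES.index(allele)]
    if copies == 0:
        return "lost"
    if copies == total:
        return "fixed"
    return "segregating"

def fates_per_replicate(sync_line, allele):
    """Fate of the allele in each replicate of one sync line."""
    fields = sync_line.split()
    # Assumed layout: fields 0-2 describe the site, then the columns
    # alternate between base and evolved population per replicate.
    evolved = fields[4::2]
    return [fate(entry, allele) for entry in evolved]

line = "2L  5   G   18:0:0:2:0:0    0:0:0:20:0:0    18:0:0:2:0:0    20:0:0:0:0:0"
print(fates_per_replicate(line, "G"))  # ['fixed', 'lost']
```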

output: fitness (--output-gpf)

The following file contains the fitness (column 5) of all individuals in the base (column 2 = 0) and evolved (column 2 = 50) populations from both replicates (the replicate is given in column 1). For details on this file format see [OutputFiles].

1   0   1.0 1.0 1.075
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.075
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   0   1.0 1.0 1.0
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
1   50  1.0 1.0 1.15
2   0   1.0 1.0 1.075
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.075
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   0   1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0
2   50  1.0 1.0 1.0

From the fitness file we can also see that the beneficial allele got fixed in the first replicate (all 10 individuals have a fitness of 1.15 at generation 50) while it got lost in the second replicate (all 10 individuals have a fitness of 1.0 at generation 50).
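
The same conclusion can be drawn from the mean fitness per replicate and generation. The following Python sketch computes it from lines in the format shown above, using only the three columns described in the text (replicate, generation and fitness); the function mean_fitness is an illustration, and the rows are rebuilt here from the example output of replicate 1.

```python
from collections import defaultdict

def mean_fitness(lines):
    """Mean fitness for every (replicate, generation) pair."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip empty or incomplete lines
        key = (int(fields[0]), int(fields[1]))
        sums[key][0] += float(fields[4])
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Replicate 1 of the example: two heterozygotes in the base population,
# all individuals homozygous for the selected allele at generation 50
rows = (["1 0 1.0 1.0 1.075"] * 2 + ["1 0 1.0 1.0 1.0"] * 8
        + ["1 50 1.0 1.0 1.15"] * 10)
for (replicate, generation), w in sorted(mean_fitness(rows).items()):
    print(replicate, generation, round(w, 3))
```

For the real output, the decompressed lines of pop-fitness.txt can be passed to mean_fitness instead of the rebuilt rows.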


Related

Wiki: HaplotypeFile
Wiki: Home
Wiki: OutputFiles
Wiki: RecombinationRate
Wiki: SelectedLoci