1 Introduction
MimicrEE2 performs genome-wide forward simulations of evolving populations. It supports haploid as well as diploid organisms.
MimicrEE2 offers three primary simulation modes: i) w-mode: classical simulations with loci having a constant effect size (e.g. selection coefficient), ii) qt-mode: truncating selection on a quantitative trait, and iii) qff-mode: a quantitative trait mapping to fitness using distinct fitness functions (stabilizing selection, diminishing returns epistasis, etc.).
Additionally, MimicrEE2 supports some secondary features, which may, for example, be used to convert haplotypes into fasta entries.
installation
Download the most recent jar file from https://sourceforge.net/projects/mimicree2/files/versions/
and move it into a folder of your choice.
Then open the command line and type the following:
# example usage
java -jar /any/folder/mim2-v161.jar
assign more memory to MimicrEE2
Forward simulations of evolving populations may require considerable memory. If the memory is insufficient, an error message is produced. More memory can be assigned to the Java Virtual Machine using the following command:
# for example, assigning 12GB RAM to Java
java -Xmx12g -jar /any/folder/mim2-v161.jar
Note: make sure that the assigned memory does not exceed the memory available on your computer.
2 Main features
w-mode: adaptation with loci having a given fitness
This mode allows assigning constant selection coefficients to a given set of SNPs. Furthermore, arbitrarily complex patterns of epistasis may be simulated for pairs of loci.
Many options are supported. The following is an example of a minimal call:
# typical usage
# note the following command will perform neutral simulations for 1 replicate and 20 generations
java -jar mim2.jar w --haplotypes-g0 haplotypes.mimhap.gz --recombination-rate rec-rate.txt --snapshots 20 --output-sync results.sync
The following parameters may be provided:
- --haplotypes-g0 a file with the haplotypes of the base population, see [HaplotypeFile]; mandatory
- --recombination-rate a file with the recombination rate for windows, see [RecombinationRate]; optional, per default no recombination is used
- --population-size a file with the population sizes that will be used during the simulations, see [MiscInput]; optional, per default the population size of the base population will be used (--haplotypes-g0, e.g. if 1000 haplotypes are provided an N of 500 will be used).
- --chromosome-definition specify the reference sequences that constitute a chromosome, see [MiscInput]; optional, per default random segregation of all reference chromosomes is assumed
- --sex a file with the ratio of males, females and hermaphrodites; also the selfing-rate may be specified, see [sex]; optional, per default a population of hermaphrodites without selfing is simulated
- --snapshots a comma-separated list of generations for which the requested output will be saved (valid examples are 20 or 10,20,30,100 etc.); use this parameter to set the default for all outputs; these defaults may be overridden (see below); semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-sync use a distinct list of output generations for --output-sync; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-dir use a distinct list of output generations for --output-dir; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-gpf use a distinct list of output generations for --output-gpf; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --replicate-runs number of replicates; optional, default = 1
- --output-sync the output file for the allele frequency counts; will be zipped; see [OutputFiles]; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-dir the output directory for the haplotypes, see [OutputFiles]; directory must exist; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-gpf the output file for the genotypic values / phenotypic values / fitness of the individuals, see [OutputFiles]; optional
- --fitness selection coefficients of SNPs (alternatively, the absolute fitness values of the genotypes may be provided), see [SelectedLoci]; optional
- --epistasis epistasis may be provided as absolute fitness values for all genotypes involving a pair of SNPs (waabb...waAbB...wAABB), see [SelectedLoci]; optional
- --migration-regime specify the migration regime, see [MiscInput]; optional
- --mutation-rate the mutation rate per genomic site and generation; optional, default=0.0
- --haploid perform simulations of haploid organisms
- --clonal perform simulations of clonal evolution
- --detailed-log print a more detailed log message; optional
- --threads the number of threads to use; optional, default=1
- --help print the help; optional
Note: MimicrEE2 simulates until the largest generation listed in any of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf is reached.
Note: when using --snapshots, the output will always also contain the base population (generation 0). This behavior may be overridden by using --snapshots-sync, --snapshots-dir and --snapshots-gpf; use 0 (zero) to request output for the base population.
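The snapshot rules above can be sketched as follows. This is a hypothetical Python illustration, not MimicrEE2's Java implementation, and the function names parse_snapshots and resolve_snapshots are made up. It shows how --snapshots sets a default that always includes generation 0, how an output-specific list overrides that default, and how the last simulated generation is the largest one requested by any output.

```python
# Hypothetical sketch of how the snapshot options interact; the names are
# illustrative and do not correspond to MimicrEE2's internal code.

def parse_snapshots(value):
    """Parse a comma-separated list such as "10,20,30,100" into sorted ints."""
    if value is None:
        return None
    return sorted({int(tok) for tok in value.split(",") if tok.strip()})


def resolve_snapshots(snapshots=None, sync=None, dir_=None, gpf=None):
    """Return (output generations per output type, last generation to simulate)."""
    default = parse_snapshots(snapshots)
    if default is not None:
        # --snapshots always adds the base population (generation 0)
        default = sorted(set(default) | {0})
    outputs = {}
    for name, specific in (("sync", sync), ("dir", dir_), ("gpf", gpf)):
        parsed = parse_snapshots(specific)
        # an output-specific list overrides the --snapshots default;
        # here generation 0 is only included if it is listed explicitly
        outputs[name] = parsed if parsed is not None else default
    requested = [g for gens in outputs.values() if gens for g in gens]
    if not requested:
        raise ValueError("at least one of --snapshots, --snapshots-sync, "
                         "--snapshots-dir or --snapshots-gpf is required")
    # the simulation runs until the largest requested generation
    return outputs, max(requested)
```

For example, resolve_snapshots(snapshots="10,20", gpf="5,50") yields [0, 10, 20] for the sync and haplotype outputs and [5, 50] for the gpf output, and 50 generations are simulated.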
flow diagram
The following flow diagram shows the sequence of events occurring during one generation of the simulations.

The width of the blue arc indicates the population size. The new population size (N_new) will be adjusted after zygote formation. Events in square brackets are optional. In case of migration, the given number of individuals in N_new will be replaced by migrants (indicated by the yellow arc). Genotypic values and phenotypic values are ignored (set to the default value = 1.0). The fitness of an individual is computed as the product of the fitness of all SNPs and epistatic pairs. The mating success of individuals scales linearly with fitness (see also validation [MatingFunctions]). Mutations may be introduced into the gametes.
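The fitness computation and fitness-proportional mating described above can be sketched in Python. This is a minimal illustration under stated assumptions: genotypes are encoded as allele counts (0, 1 or 2), per-SNP fitness is given as absolute genotype fitness values (the alternative form accepted by --fitness), and each epistatic pair is a hypothetical 3x3 table indexed by the allele counts at the two loci. The function names are made up, and the sketch ignores sexes and does not prevent selfing.

```python
import random
from math import prod

# Hypothetical sketch of w-mode fitness and mating; the genotype encoding
# and the lookup tables are assumptions, not MimicrEE2's data structures.

def individual_fitness(genotype, snp_fitness, epistatic_pairs):
    """Product of the fitness of all selected SNPs and all epistatic pairs.

    genotype:        dict SNP-id -> allele count (0, 1, 2)
    snp_fitness:     dict SNP-id -> (w_aa, w_aA, w_AA)
    epistatic_pairs: list of (snp1, snp2, table) with
                     table[count1][count2] = absolute fitness of the pair
    """
    w = prod(snp_fitness[s][genotype[s]] for s in snp_fitness)
    for s1, s2, table in epistatic_pairs:
        w *= table[genotype[s1]][genotype[s2]]
    return w


def sample_parents(population, fitnesses, n_offspring, rng=random):
    """Draw parent pairs with probability proportional to fitness.

    Mating success scales linearly with fitness: an individual with
    fitness 2.0 is expected to be chosen twice as often as one with 1.0.
    Sexes and the prevention of selfing are ignored in this sketch.
    """
    mothers = rng.choices(population, weights=fitnesses, k=n_offspring)
    fathers = rng.choices(population, weights=fitnesses, k=n_offspring)
    return list(zip(mothers, fathers))
```

For an individual homozygous for the selected allele at SNP s1 (fitness 1.1) whose epistatic pair value is 1.2, the sketch returns a fitness of about 1.32.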
qt-mode: truncating selection of quantitative trait
This mode allows performing truncating selection on a quantitative trait. The strength of selection (i.e. the fraction of individuals selected) may vary during the experiment.
# typical usage
java -jar mim2.jar qt --selection-regime truncating.txt --effect-size loci.txt --heritability 0.6 --haplotypes-g0 haplotypes.mimhap.gz --recombination-rate rec-rate.txt --snapshots 20 --output-sync results.sync
- --haplotypes-g0 a file with the haplotypes of the base population, see [HaplotypeFile] ; mandatory
- --recombination-rate a file with the recombination rate for windows, see [RecombinationRate] ; optional, per default no recombination is used
- --population-size a file with the population sizes that will be used during the simulations, see [MiscInput]; optional, per default the population size of the base population will be used (--haplotypes-g0, e.g. if 1000 haplotypes are provided an N of 500 will be used).
- --effect-size the effect sizes of the SNPs contributing to the quantitative trait, see [SelectedLoci]; optional; if not provided a default value (1) will be used as genotypic value of each individual
- --ve environmental variance (NOT std.dev); semi-mandatory; either --ve or --heritability must be provided
- --heritability heritability (h^2); semi-mandatory; either --ve or --heritability must be provided
- --chromosome-definition specify the reference sequences that constitute a chromosome, see [MiscInput]; optional, per default random segregation of all reference chromosomes is assumed
- --sex a file with the ratio of males, females and hermaphrodites; also the selfing-rate may be specified, see [sex]; optional, per default a population of hermaphrodites without selfing is simulated
- --snapshots a comma-separated list of generations for which the requested output will be saved (valid examples are 20 or 10,20,30,100 etc.); use this parameter to set the default for all outputs; these defaults may be overridden (see below); semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-sync use a distinct list of output generations for --output-sync; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-dir use a distinct list of output generations for --output-dir; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-gpf use a distinct list of output generations for --output-gpf; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --replicate-runs number of replicates; optional, default = 1
- --output-sync the output file for the allele frequency counts; will be zipped; see [OutputFiles]; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-dir the output directory for the haplotypes, see [OutputFiles]; directory must exist; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-gpf the output file for the genotypic values / phenotypic values / fitness of the individuals, see [OutputFiles]; optional
- --selection-regime the truncating selection regime, see [SelectionRegimes]; mandatory
- --migration-regime specify the migration regime, see [MiscInput]; optional
- --mutation-rate the mutation rate per genomic site and generation; optional, default=0.0
- --haploid perform simulations of haploid organisms
- --clonal perform simulations of clonal evolution
- --detailed-log print a more detailed log message; optional
- --threads the number of threads to use; optional, default=1
- --help print the help; optional
Note: MimicrEE2 simulates until the largest generation listed in any of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf is reached.
Note: when using --snapshots, the output will always also contain the base population (generation 0). This behavior may be overridden by using --snapshots-sync, --snapshots-dir and --snapshots-gpf; use 0 (zero) to request output for the base population.
flow diagram
The following flow diagram shows the sequence of events occurring during one generation of the simulations.

The width of the blue arc indicates the population size. A given fraction of the individuals in N_old (those with the highest phenotypic values) survives truncating selection. Survivors mate randomly. Mutations may be introduced into the gametes. The new population size (N_new) will be adjusted after zygote formation. Events in square brackets are optional. In case of migration, the given number of individuals in N_new will be replaced by migrants (indicated by the yellow arc).
The genotypic value is computed as the sum of the effect sizes of the specified SNPs. Based on the specified environmental variance (or h^2), a Gaussian random number is added to the genotypic value, yielding the phenotypic value.
The fitness value will be ignored (set to the default value = 1.0).
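A minimal Python sketch of these qt-mode steps follows, under stated assumptions: effects are additive per allele copy (the genotype is an allele count), and when --heritability is given, V_E is derived from the variance of the genotypic values via h^2 = V_G / (V_G + V_E). How MimicrEE2 computes V_G internally is not specified here, and the rounding of the number of survivors is also an assumption. Note that --ve is a variance, whereas Python's random.gauss expects a standard deviation.

```python
import random
from statistics import pvariance

# Hypothetical sketch of the qt-mode steps; function names, the additive
# allele-count model and the rounding of survivors are assumptions.

def genotypic_value(genotype, effect_sizes):
    """Sum of effect sizes; genotype maps SNP-id -> allele count (0, 1, 2)."""
    return sum(effect_sizes[s] * genotype.get(s, 0) for s in effect_sizes)


def environmental_variance(genotypic_values, h2):
    """V_E such that h^2 = V_G / (V_G + V_E); requires 0 < h2 <= 1."""
    vg = pvariance(genotypic_values)
    return vg * (1.0 - h2) / h2


def phenotypes(genotypic_values, ve, rng=random):
    """Add Gaussian noise with variance V_E (random.gauss takes a std. dev.)."""
    sd = ve ** 0.5
    return [g + rng.gauss(0.0, sd) for g in genotypic_values]


def truncating_selection(individuals, phenos, fraction):
    """Keep the top `fraction` of individuals ranked by phenotype."""
    n_keep = max(1, round(len(individuals) * fraction))
    ranked = sorted(zip(phenos, individuals), key=lambda pair: pair[0], reverse=True)
    return [ind for _, ind in ranked[:n_keep]]
```

With h^2 = 0.5, for example, V_E equals V_G, so half of the phenotypic variance in the starting population is environmental.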
qff-mode: a quantitative trait mapping to fitness using a fitness function (qff)
This mode simulates a quantitative trait that maps to fitness via an arbitrary fitness function. This allows simulating stabilizing selection, disruptive selection, diminishing returns epistasis, directional selection, etc.
Additionally, the fitness functions may change during the experiment, which allows simulating adaptation to a moving optimum.
# typical usage
java -jar mim2.jar qff --fitness-function stabilizing.txt --effect-size loci.txt --heritability 0.6 --haplotypes-g0 haplotypes.mimhap.gz --recombination-rate rec-rate.txt --snapshots 20 --output-sync results.sync
- --haplotypes-g0 a file with the haplotypes of the base population, see [HaplotypeFile]; mandatory
- --recombination-rate a file with the recombination rate for windows, see [RecombinationRate]; optional, per default no recombination is used
- --population-size a file with the population sizes that will be used during the simulations, see [MiscInput]; optional, per default the population size of the base population will be used (--haplotypes-g0, e.g. if 1000 haplotypes are provided an N of 500 will be used).
- --effect-size the effect sizes of the SNPs contributing to the quantitative trait, see [SelectedLoci]; optional; if not provided a default value (1) will be used as genotypic value of each individual
- --ve environmental variance (NOT std.dev); semi-mandatory; either --ve or --heritability must be provided
- --heritability heritability (h^2); semi-mandatory; either --ve or --heritability must be provided
- --chromosome-definition specify the reference sequences that constitute a chromosome, see [MiscInput]; optional, per default random segregation of all reference chromosomes is assumed
- --sex a file with the ratio of males, females and hermaphrodites; also the selfing-rate may be specified, see [sex]; optional, per default a population of hermaphrodites without selfing is simulated
- --snapshots a comma-separated list of generations for which the requested output will be saved (valid examples are 20 or 10,20,30,100 etc.); use this parameter to set the default for all outputs; these defaults may be overridden (see below); semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-sync use a distinct list of output generations for --output-sync; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-dir use a distinct list of output generations for --output-dir; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --snapshots-gpf use a distinct list of output generations for --output-gpf; overrides any default set by --snapshots; semi-mandatory; at least one of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf must be provided
- --replicate-runs number of replicates; optional, default = 1
- --output-sync the output file for the allele frequency counts; will be zipped; see [OutputFiles]; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-dir the output directory for the haplotypes, see [OutputFiles]; directory must exist; semi-mandatory, either --output-dir or --output-sync (or both) must be provided
- --output-gpf the output file for the genotypic values / phenotypic values / fitness of the individuals, see [OutputFiles]; optional
- --fitness-function a function used for mapping phenotypic values to fitness. It is feasible to specify i) Gaussian stabilizing selection, ii) adaptation to a moving optimum, iii) directional selection, iv) disruptive selection, v) diminishing returns epistasis, vi) an arbitrary fitness function with arbitrary precision; see [SelectionRegimes]; mandatory
- --migration-regime specify the migration regime, see [MiscInput]; optional
- --mutation-rate the mutation rate per genomic site and generation; optional, default=0.0
- --haploid perform simulations of haploid organisms
- --clonal perform simulations of clonal evolution
- --detailed-log print a more detailed log message; optional
- --threads the number of threads to use; optional, default=1
- --help print the help; optional
Note: MimicrEE2 simulates until the largest generation listed in any of --snapshots, --snapshots-sync, --snapshots-dir and --snapshots-gpf is reached.
Note: when using --snapshots, the output will always also contain the base population (generation 0). This behaviour may be overridden by using --snapshots-sync, --snapshots-dir and --snapshots-gpf; use 0 (zero) to request output for the base population.
flow diagram
The following flow diagram shows the sequence of events occurring during one generation of the simulations.

The width of the blue arc indicates the population size. The new population size (N_new) will be adjusted after zygote formation. Events in square brackets are optional. In case of migration, the given number of individuals in N_new will be replaced by migrants (indicated by the yellow arc).
The genotypic value is computed as the sum of the effect sizes of the specified SNPs. Based on the specified environmental variance (or h^2), a Gaussian random number is added to the genotypic value, yielding the phenotypic value. The fitness value is computed using the specified fitness function. The mating success of individuals scales linearly with fitness (see also validation [MatingFunctions]). Mutations may be introduced into the gametes.
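As an illustration of a fitness function, the following hypothetical sketch maps phenotypes to fitness with a Gaussian stabilizing function whose optimum moves linearly over the generations. The functional form, the parameter names (w_min, w_max, width) and the linear shift are assumptions chosen for illustration; the fitness functions MimicrEE2 actually accepts, and their file format, are described in [SelectionRegimes].

```python
import math

# Hypothetical qff-style fitness function; the Gaussian form and all
# parameter names are illustrative assumptions.

def gaussian_fitness(z, optimum, width, w_min=0.5, w_max=1.0):
    """Fitness is w_max at the optimum and decays towards w_min away from it."""
    return w_min + (w_max - w_min) * math.exp(-((z - optimum) ** 2) / (2 * width ** 2))


def moving_optimum(generation, start, shift_per_generation, stop_generation):
    """The optimum shifts linearly until stop_generation, then stays constant."""
    g = min(generation, stop_generation)
    return start + shift_per_generation * g
```

As in w-mode, the resulting fitness values would then determine mating success linearly. Widening `width` weakens stabilizing selection, because phenotypes farther from the optimum lose less fitness.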
3 Secondary features
mimhap2fasta
In real E&R studies, allele frequencies or haplotypes are usually estimated from mapped sequencing data (e.g. Illumina or PacBio). MimicrEE2 allows circumventing this laborious step by directly providing the allele frequencies of SNPs or the haplotypes.
However, for some applications it may be necessary to follow the entire workflow of an E&R study, including the mapping of short reads. To enable this, MimicrEE2 supports conversion of the haplotypes into fasta sequences, which may be used with any of the available tools for simulating short reads (e.g. ART https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3278762/).
# Minimal usage
java -jar mim2.jar mimhap2fasta --mimhap haps.mimhap --reference refgenome.fasta --output-fasta output.fasta
The following parameters may be provided:
- --mimhap a MimicrEE2 haplotype file; see [HaplotypeFile]; mandatory
- --reference a reference genome in fasta format; reference IDs must match those in the haplotype file; mandatory
- --output-fasta write all resulting fasta entries into a single file; semi-mandatory; either --output-fasta or --output-dir must be provided
- --output-dir write the resulting fasta entries into separate files; per default a fasta file is generated for each haploid genome in the population, i.e. a fasta file containing an entry for each chromosome (naming scheme of files: "mimhap_[hapNr].fasta"); semi-mandatory; either --output-fasta or --output-dir must be provided
- --split-chromosome only in combination with --output-dir; generate a separate file for each chromosome of each haploid genome, so that every fasta entry is in a separate file (naming scheme of files: "[chromosome]_mimhap_[hapNr].fasta")
- --stringent check if the character in the reference genome matches the reference character in the haplotype file; optional
For a quick walkthrough see [Mimhap2Fasta]
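The core of the conversion can be sketched as substituting a haplotype's alleles into a copy of the reference sequence. The Python below is a hypothetical illustration: its input is a list of (1-based position, reference character, allele) tuples rather than a parsed mimhap file, the FASTA header and the 60-character line width are arbitrary choices, and the --stringent check is modelled, as an assumption, as raising an error on a reference mismatch. Only the two file-naming schemes are taken from the parameter descriptions above.

```python
# Hypothetical sketch of mimhap2fasta for one chromosome of one haploid
# genome; input structures and error handling are assumptions.

def haplotype_to_sequence(reference, snps, stringent=False):
    """reference: chromosome sequence (str);
    snps: list of (pos, ref_char, allele) with 1-based positions."""
    seq = list(reference)
    for pos, ref_char, allele in snps:
        # --stringent: genome character must match the haplotype file's reference
        if stringent and reference[pos - 1].upper() != ref_char.upper():
            raise ValueError(
                f"reference mismatch at position {pos}: "
                f"genome has {reference[pos - 1]}, haplotype file has {ref_char}")
        seq[pos - 1] = allele
    return "".join(seq)


def to_fasta(header, sequence, width=60):
    """Format one fasta entry with fixed-width sequence lines."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def output_filename(hap_nr, chromosome=None):
    """Naming scheme of --output-dir, without and with --split-chromosome."""
    if chromosome is None:
        return f"mimhap_{hap_nr}.fasta"
    return f"{chromosome}_mimhap_{hap_nr}.fasta"
```

For example, applying the SNPs (2, "C", "T") and (8, "T", "G") to the reference ACGTACGT yields ATGTACGG.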
arp2mimhap
Convert a fastsimcoal output file (Arlequin format) into a MimicrEE2 haplotype file.
# minimal usage
java -jar mim2.jar arp2mimhap --chrname 2L --input 20kb-chr1_1_1.arp --output chr2L.mimhap
- --chrname name of the generated chromosome, e.g. 2L, 4, 3L
- --input the fastsimcoal output (or Arlequin output) that should be converted into the MimicrEE2 format
- --output the resulting MimicrEE2 haplotype file
- --haploid if this flag is provided, the generated haplotypes may be used for simulations of a haploid organism; per default haplotypes for diploid organisms are generated
For a quick walkthrough see [GeneratingHaplotypesWithFastsimcoal]
unit-tests
Unit tests are an important aspect of software engineering. They help ensure that all components of a software (e.g. classes) behave as expected. Each component is treated as a black box: some input is provided, and the observed output is compared with the expected output. For example, imagine a component computing the area of rectangles. For a given input of the sides (a=2 and b=5), the expected result (10) is compared to the observed output. If all unit tests pass, we can be confident that the tested components show the expected behaviour. To ensure proper functionality of MimicrEE2, we implemented a large number of unit tests (almost 200) for most of the components.
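The rectangle example can be written as a unit test. The Python version below is purely illustrative: MimicrEE2's own tests are part of its Java code base (note the junit_ package names in the output below), and rectangle_area is a made-up function.

```python
import unittest

# Illustrative only: the rectangle example as a unit test.

def rectangle_area(a, b):
    """Component under test: area of a rectangle with sides a and b."""
    return a * b


class TestRectangleArea(unittest.TestCase):
    def test_area_2_by_5(self):
        # the expected result (10) is compared with the observed output
        self.assertEqual(rectangle_area(2, 5), 10)

    def test_zero_side(self):
        self.assertEqual(rectangle_area(0, 5), 0)
```

Saved as test_rectangle.py, it can be run with `python -m unittest test_rectangle`.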
It is possible to run the unit tests implemented in MimicrEE2 using the following command:
java -jar ../out/artifacts/mim2_jar/mim2-v162.jar unit-tests
The output should look similar to:
OK: junit_mimcore.data.basic.Test_BitArray - set_seventh_bit
OK: junit_mimcore.data.basic.Test_BitArray - immutable_finished_bitarray
OK: junit_mimcore.data.basic.Test_BitArray - set_fourth_bit
OK: junit_mimcore.data.basic.Test_BitArray - set_fifth_bit
OK: junit_mimcore.data.basic.Test_BitArray - first_bit_set
...
...
...
OK: junit_mimcore.data.Test_SurvivalRegimeTruncatingSelection - generation_1_survivors_05
OK: junit_mimcore.data.Test_SurvivalRegimeTruncatingSelection - generation_1_survivors_08
OK: junit_mimcore.data.Test_SurvivalRegimeTruncatingSelection - correct_survivors_three_survivors
OK: junit_mimcore.data.Test_SurvivalRegimeTruncatingSelection - generation_10_survivors_01
Finished testing
0 tests failed
190 tests were OK
If one or more of the unit tests fail, please contact the author immediately.