The following section introduces the core functionality of MimcrEE and the used file formats
The haplotypes at generation zero of the simulations, for example:
2L 686891 G G/A AG GG GG GG GG 2L 936681 A A/G GG AA AA AA AA 2L 861026 T A/T TT AT AA AA TT 2L 966618 C T/C TC TC TT TT TT 2R 134298 A A/C AC AC CC CC CC
The file consists of exactly five columns which are separated by a 'tab'!
Note: the popualtion size will be infered from col5, i.e: the population size equals the number of provided genotypes
Note: haplotypes will be inferred from the order of the alleles in the genotpes, i.e.: the first allele always referes to the first haplotype and consequently the second to the second haplotype. For example the first specimen at chromosome 2L has the two haplotypes {{AGTT}} & {{GGTC}}
The inversion frequencies at generation zero of the simulations. The file consists of two parts, the first part is the definition of the inversions and the second part indicate the absence/presence of the inversions for each specimen. For example:
> Inversion definition In2 = 3R:1023334-11454555 InP = 3R:14000232-18000455 In3 = 2L:1200000-1500000 > Population description 1 In2,InP:In2 2 3 In3:InP,In3 4 In2:InP 5 In2,InP:In2,InP 6 In3,In2,InP:In3,In2,InP 8 -:- 9 InP:-
The two parts of the inversion file are initiated by the symbol {{>}}. The first section will always be treated as {{Inversion definition}} and the second as definition of the Population
Definition of the inversions must be in the format: {{name=chr:start-end}} where space will be ignored.
For every specimen the presence of the previously defined inversions need to be provided. If nothing is provided the default of absence of inversions will be assumed. This section consists of two tab-separted columns:
*Note Every inversion used in the {{population description}} must have been previously defined in {{Inversion definition}}. *Note Inversions within inversions are not allowed. Thus overlapping inversions will result in an error. *Note* The ordering of the haplotypes is not considered. For every specimen a single inversion genotype should be specified. However in case multiple definitions are present for the same specimen (eg number 8), only the last definition will be used. A inversion genotype may also be entirely missing, in which case MimicrEE assumes no inversions.
A file of the following format can be obtained from: http://petrov.stanford.edu/cgi-bin/recombination-rates_updateR5.pl
2L:0..100000 0.00 0.00 0.00 2L:100000..200000 0.00 0.00 0.00 2L:200000..300000 0.00 0.00 1.89
Note: Mimicree will only read the recombination rate in the middle of the window and than assume an uniform recombination rate for the window. However, to increase the resolution you may decrease the window size. Note: MimicrEE operates with 1/2 the recombination rate provided in the above file, as males do not recombine. (MimicrEE just operates with hermaphrodites)
This parameter is necessary to accurately simulate the recombination landscape. In nature all chromosomes show random assortment, however to simplify analysis of the data some chromosomes have been artificially split. For example chromosome {{2}} of //D. melanogaster// has been split into {{2L}} and {{2R}}
Chromosome definitions have to be provided in the following format: {{new_chr=chr_part_1+chr_part_2,newchr=chr_part1+chr_part_2,...}} For example: {{2=2L+2R,3=3L+3R}}.
Note: It is essential that you provide the chromosome parts in the correct order. For example {{2=2R+2L}} would be wrong, this would concatenate the telomeric region of {{2R}} with the centromeric region of {{2L}}.
Note: It is possible to provide more than two parts of chromosomes eg: {{3=3L+3M+3R}}
Additive fitness effects of the genotypes of SNPs will be modeled as relative viabilities ({{w}}). Viability can be imagined as the probability of survival. The additive effects are based on the following equations
w_{11} = 1 w_{12} = 1 + hs w_{22} = 1 + s
Where {{s}} is the selection coefficient, {{h}} the heterozygous effect and {{w}} the relative viabilities (of homozygous or heterozygous individuals). This information needs to be provided for non-neutral SNPs in the following format
2L 12 A 0.5 0.5 2L 17 C 1.0 0.0 2L 22 C -0.5 0.5 2R 55 T -0.2 -1.2
Note The selection coeffiecient must be between -1 and +infinite, whereas the heterozygous effect may be larger or smaller than one.
Note The nucleotide provided in {{col3}} must be either the major or the minor allele provided in {{--haplotypes-g0}}. This will be strictly enforced.
Note Per default a selection coefficient of s=0 will be assumed for all SNPs not present in this file
~ h
~ effect
0
A1 is dominant
1
A2 is dominant
-1
underdominance
0<h<1
intermediate effect
2
overdominance
Similar to the additive effects a selection coefficient ({{s}}) needs to be specified for epistatic effects. The following definition only allow to specify synergistic and antagonistic epistasis. In a two-locus interaction this type of epistasis would result in a phenotypic ratio of 9:7, other phenotypic ratios than this can not be modeled.
w_e=1+s
Following an example of an input file for epistatic effects
>e1:-0.2 2L 102 T 2L 117 C >epistatic_effect2:0.9 2R 1024 T 2R 8988 G 3L 7811 A
For every epistatic effect the file has two parts, similar to a fasta sequence. First the definition of the epsistatic effect starting with a {{>}} and second the individual SNPs contributing to the epistatic effect.
The definition line of an epistatic effect has the following syntax {{>name:s}}
The definition of a SNP contributing to an epistatic effect has three columns:
An epistatic effect will be assumed to be present when the alleles of the epistatic SNPs are present at least once in a diploid genome. This epistatic effects are thus assumed to be dominant. Using this definition of epistasis it is possible to model synergistic and antagonistic epistasis.
Number of replicate runs that should be processed in parallele; Memory consumption increases linearly with increasing number of replicate runs. Furthermore setting threads >1 when only a single replicate run has been chosen will not result in an increased processing speed.
May either be:
Fitness is calculated as the result of additive effects and epistatic effects
The additive fitness effect is
w_a=\prod_{i}w_{ai}
The epistatic fitness effect is treated as an additional layer on top of the additive effects
w_e=\prod_{i}w_{ei}
the fitness in total is calculated as
w=w_{a}*w_{e}
The fitness will directly reflect the probability of mating. There is a linear relationship between the probability of mating and the fitness. Self-matings are excluded and all individuals are hermaphrodites, i.e.: they may mate with any other individual.
Note: MimicrEE operates with 1/2 the recombination rate provided in the file. The recombination rate in D.mel is for females. Males are not recombining. In MimicrEE we are just simulating hermaphrodites. The issue of no recombination in males is thus addressed by multiplying the recombination rate by 1/2.
The Kosambi function will be used to calculate 'r' (recombinaton fraction) for a given window. Online resources usually provide the recombination rate in centi-Morgen per million base pairs (cM/Mb). It is thus necessary to translate this recombination fraction per million base pairs into the recombination fraction per window (of the given size). Note that simply dividing the recombination rate per Mb by the number of windows per Mb will not work as this approach does not consider double cross overs. The conversion from 'rMb' to 'rwin' is done in three steps.
the recombination fraction per window is calculated with the mapping function (Kosambi function)
d_{Mb}=\frac{1}{4}*ln( \frac{1+2r_{Mb}}{1-2r_{Mb}})
d_{win}=\frac{d_{Mb}}{c_{win}}
r_{win}=\frac{e^{4d_{win}}-1}{2*(e^{4d_{win}}+1)}
dMb: map distance per mega base pair (Mb)
We tested this approach with simulations. The following table lists the number of recombination events for a window of the given size with the given recombination rate. 10,000 simulations have been performed. Note that 1cM corresponds to 1% of recombinants, which is 100 for 10,000 simulations.
~ win.size
~ rec.rate (cM/Mbp)
~ recombination events
1000k
0
0
1000k
4
403
100k
4
43
1000k
20
2018
100k
20
217
10k
20
19
1000k
40
3968
1000k
50
0! error
Note: max. recombinaton rate is 45 cM/Mbp!