In this part of the documentation we demonstrate how to use real genomic data from bacterial organisms to simulate haploid populations with MimicrEE2. First, we will download the genomic sequence of Escherichia coli (in FASTA format) and produce random haplotypes compatible with MimicrEE2. Then we will mimic the typical evolutionary trajectory of microbial organisms by allowing changes in the population size and the occurrence of new beneficial or detrimental mutations.
Before we start, download the complete E. coli genome (in FASTA format) following this link. Save the sequence in a file named Ecoli.fasta.
Next we are going to use a Python 2.7 script in order to generate haplotypes based on this reference sequence.
The Python script can be found here.
You can simply type:
python make_haps_from_fasta.py -h
to list the input files and parameters that have to be specified to generate the haplotypes.
The output from the command above should be:
Usage: make_mim_from_fast.py [options]
Options:
-h, --help show this help message and exit
--fasta-file=FASTA Input file in fasta format
--haplotypes=HAPS number of haplotypes
We only have to set two parameters: the first is the fasta file with the reference genome (Ecoli.fasta) and the second is the number of haplotypes to be produced. We are going to produce one haploid clone.
To run the script you can simply type:
python make_haps_from_fasta.py --fasta-file Ecoli.fasta --haplotypes 1 > Ecoli.haps
If everything worked, the output should look like the listing below. The ancestral allele (e.g. the A for the first SNP) is always identical to the reference genome. The derived allele (e.g. the G for the first SNP) is picked randomly.
head Ecoli.haps
chromosome 1 A A/G A
chromosome 2 G G/C G
chromosome 3 C C/G C
chromosome 4 T T/C T
chromosome 5 T T/C T
chromosome 6 T T/A T
chromosome 7 T T/G G
chromosome 8 C C/G C
chromosome 9 A A/C A
chromosome 10 T T/G G
...
...
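The haplotype format shown above can be illustrated with a minimal Python sketch. This is not the make_haps_from_fasta.py script itself. The function name make_haplotype_lines, the tab-separated columns, and the rule that each clone carries the ancestral or the derived allele with equal probability are all assumptions made for illustration.

```python
import random


def make_haplotype_lines(sequence, n_haplotypes, chrom="chromosome", seed=None):
    """Yield one haplotype line per A/C/G/T base of `sequence`.

    Hypothetical helper: the column separator (tab) and the 50/50 choice
    between ancestral and derived allele are assumptions, not taken from
    the original script.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    for pos, ref in enumerate(sequence.upper(), start=1):
        if ref not in bases:
            continue  # skip N and other ambiguity codes
        # The ancestral allele is the reference base; the derived allele
        # is drawn at random from the three remaining bases.
        derived = rng.choice([b for b in bases if b != ref])
        # Each haploid clone carries a single allele at this position.
        haps = [rng.choice((ref, derived)) for _ in range(n_haplotypes)]
        yield "\t".join([chrom, str(pos), ref, ref + "/" + derived, " ".join(haps)])


if __name__ == "__main__":
    for line in make_haplotype_lines("ACGTTTTCAT", n_haplotypes=1, seed=42):
        print(line)
```

Every position of the sequence becomes a SNP in this sketch, which matches the consecutive positions 1 to 10 in the listing above.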
Next, we are going to generate a file containing SNPs with positive or negative selection coefficients. The Python script can be found here.
As before, use the -h option to see the input requirements:
python make_random_selected.py -h
The output should look like this:
Usage: make_random_selected.py [options]
Options:
-h, --help show this help message and exit
--input=INPUT A file containing haplotypes (MimicrEE input)
--selected-snps=NUM_SEL
Number of SNPs with selection coefficient other than 0
--positive-mutations=POS_PROP
Probability that a mutation will get a positive
selection coefficient
We will use the haplotype file we just generated to pick the selected loci at random. The user can specify the probability that a selected locus will have a positive selection coefficient.
To run the script you can simply type:
python make_random_selected.py --input Ecoli.haps --selected-snps 5000 --positive-mutations 0.8 > selected_snps
In this case we pick 5000 loci and set the probability of obtaining a positive selection coefficient to 80% (in the script provided, the absolute value of the selection coefficient ranges from 0.001 to 0.1).
The file containing the selected SNPs should look like this:
head selected_snps
[s]
chromosome 1159426 T/G 0.0185 0.5
chromosome 5153351 T/C 0.0531 0.5
chromosome 2053035 A/G 0.0994 0.5
chromosome 1637589 T/A -0.0502 0.5
chromosome 3395795 G/C 0.0383 0.5
chromosome 227257 C/T 0.0914 0.5
chromosome 1315503 T/G 0.0215 0.5
chromosome 2988907 G/A 0.0249 0.5
chromosome 3305269 G/A 0.0102 0.5
Notice that the last column (dominance effect) is ignored as we are simulating haploid individuals.
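To check that the proportion of positive selection coefficients is close to the requested 80%, the file can be summarised with a few lines of Python. This is a minimal sketch; the function names read_selected_snps and fraction_positive are hypothetical, and the parser assumes whitespace-separated columns in the order shown above.

```python
def read_selected_snps(lines):
    """Parse selected-SNP lines into (chrom, pos, alleles, s) tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or line.startswith("["):
            continue  # skip blank lines and the "[s]" section header
        # The fifth column (dominance effect) is not needed for haploids.
        records.append((fields[0], int(fields[1]), fields[2], float(fields[3])))
    return records


def fraction_positive(records):
    """Return the share of records with a positive selection coefficient."""
    positive = sum(1 for record in records if record[3] > 0)
    return positive / float(len(records))


if __name__ == "__main__":
    with open("selected_snps") as handle:
        snps = read_selected_snps(handle)
    print("%d selected SNPs, %.1f%% positive" % (len(snps), 100 * fraction_positive(snps)))
```

With 5000 loci the reported percentage should fluctuate only slightly around 80, since each locus is assigned its sign independently.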
The last file we will provide for our simulations determines the population size during the course of the experiment. Let's call the file population-size; for our example it looks like this:
cat population-size
1 1
2 5
3 10
4 20
5 28
10 50
30 250
60 400
80 500
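Before starting a long run it can help to sanity-check this file. The sketch below assumes that the first column is the generation and the second column is the population size, which is how the example above reads but is not stated explicitly here. The function name read_population_sizes is hypothetical.

```python
def read_population_sizes(lines):
    """Parse 'generation size' pairs and check that they are usable.

    Assumption: column 1 is the generation, column 2 the population size.
    """
    schedule = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        generation, size = int(fields[0]), int(fields[1])
        if size < 1:
            raise ValueError("population size must be positive: %r" % line)
        if schedule and generation <= schedule[-1][0]:
            raise ValueError("generations must be increasing: %r" % line)
        schedule.append((generation, size))
    return schedule


if __name__ == "__main__":
    with open("population-size") as handle:
        for generation, size in read_population_sizes(handle):
            print("generation %d: %d individuals" % (generation, size))
```

The check only catches formatting mistakes such as a non-numeric entry, a size below one, or generations listed out of order.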
Finally, we run the simulations for 100 generations with 2 biological replicates. Because we request haplotypes as output (--output-dir), we first have to create a new folder to store the generated data:
mkdir haplotypes_ecoli
java -jar mim2-v204.jar w --haplotypes-g0 Ecoli.haps --population-size population-size --snapshots 100 --replicate-runs 2 --output-sync ecoli.sync --output-dir haplotypes_ecoli --fitness selected_snps --mutation-rate 0.0000025 --haploid --clonal --threads 6
You can easily access the output haplotypes and have a look at the evolved individuals:
gzip -dc haplotypes_ecoli/haplotypes.r1.g100.mimhap.gz| head -5
#sex H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
chromosome 1 A A/G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G
chromosome 2 G G/A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A
chromosome 3 C C/G C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C
chromosome 4 T T/G T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T
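Lines this long are hard to read by eye, so a short script can condense each SNP into the frequency of its derived allele. This is a minimal Python 3 sketch under stated assumptions: the function names are hypothetical, columns are split on whitespace, the fourth column lists the ancestral allele before the derived one, and every haplotype is a single character because the population is haploid.

```python
import gzip


def derived_allele_frequencies(lines):
    """Map (chrom, pos) to the derived-allele frequency among the haplotypes."""
    freqs = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the "#sex" header and empty lines
        fields = line.split()
        chrom, pos, alleles, haps = fields[0], int(fields[1]), fields[3], fields[4:]
        derived = alleles.split("/")[1]  # "A/G" means ancestral A, derived G
        freqs[(chrom, pos)] = haps.count(derived) / float(len(haps))
    return freqs


def read_mimhap(path):
    """Read a gzip-compressed haplotype file and return the frequencies."""
    with gzip.open(path, "rt") as handle:
        return derived_allele_frequencies(handle)


if __name__ == "__main__":
    freqs = read_mimhap("haplotypes_ecoli/haplotypes.r1.g100.mimhap.gz")
    for (chrom, pos), freq in sorted(freqs.items())[:5]:
        print(chrom, pos, freq)
```

In the listing above, positions 1 and 2 show only the derived allele while positions 3 and 4 show only the ancestral allele, so such a summary would report frequencies of 1 and 0 for them respectively.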