Example of microbial evolution

Robert Kofler Christos Vlachos

Introduction

In this part of the documentation we demonstrate how to use real genomic data from bacteria to simulate haploid populations with MimicrEE2. First, we download the genomic sequence of Escherichia coli (in FASTA format) and produce random haplotypes compatible with MimicrEE2. Then we mimic the typical evolutionary trajectory of microbial organisms by allowing changes in population size and the occurrence of new beneficial or detrimental mutations.

Requirements

Before we start, download the complete E. coli genome (in FASTA format) following this link. Save the sequence in a file named Ecoli.fasta.
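Before generating haplotypes it can be worth checking that the download is a well-formed FASTA file. The following is a minimal sketch, not part of MimicrEE2 or of the scripts provided here; `read_fasta` is a hypothetical helper name, and the tiny in-memory record only stands in for the real genome.

```python
import io

def read_fasta(handle):
    """Return a list of (header, sequence) pairs read from a FASTA handle."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Toy input standing in for Ecoli.fasta (hypothetical content).
demo = io.StringIO(">toy_genome\nACGT\nTTGA\n")
for name, seq in read_fasta(demo):
    print(name, len(seq))
```

To check the real file, pass an opened handle instead of the toy input, for example `read_fasta(open("Ecoli.fasta"))`, and confirm that the number of records and the sequence length are what you expect.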

Next we are going to use a Python 2.7 script in order to generate haplotypes based on this reference sequence.
The Python script can be found here.

Materials and Methods

Running the script

You can simply type:

python make_haps_from_fasta.py -h

in order to obtain information regarding the input files and necessary parameters that have to be specified in order to generate the haplotypes.
The output from the command above should be:

Usage: make_mim_from_fast.py [options]

Options:
  -h, --help          show this help message and exit
  --fasta-file=FASTA  Input file in fasta format
  --haplotypes=HAPS   number of haplotypes

We only have to set two parameters: the first is the fasta file with the reference genome (Ecoli.fasta) and the second is the number of haplotypes to be produced. We are going to produce one haploid clone.

To run the script you can simply type:

python make_haps_from_fasta.py --fasta-file Ecoli.fasta --haplotypes 1 > Ecoli.haps

If everything worked, the output should look like the example below. The ancestral allele (e.g. A for the first SNP) is always identical to the reference genome. The derived allele (e.g. G for the first SNP) is picked randomly.

head Ecoli.haps
chromosome  1   A   A/G A 
chromosome  2   G   G/C G 
chromosome  3   C   C/G C 
chromosome  4   T   T/C T 
chromosome  5   T   T/C T 
chromosome  6   T   T/A T 
chromosome  7   T   T/G G 
chromosome  8   C   C/G C 
chromosome  9   A   A/C A 
chromosome  10  T   T/G G 
...
...
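To make the layout of the haplotype file concrete, here is a minimal sketch of how such lines could be produced from a reference sequence. It is not the make_haps_from_fasta.py script: the function name, the `derived_freq` parameter, the skipping of non-ACGT bases, and the tab and space separators are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def make_haplotype_lines(sequence, n_haplotypes, chrom="chromosome",
                         derived_freq=0.5, rng=None):
    """Yield one haplotype-file line per reference position.

    Columns: chromosome, position, reference base, ancestral/derived
    alleles, and one allele per haploid individual.
    """
    rng = rng or random.Random()
    for pos, ref in enumerate(sequence.upper(), start=1):
        if ref not in BASES:
            continue  # assumption: ambiguous bases such as N are skipped
        # The ancestral allele is the reference base; the derived allele
        # is drawn at random from the three remaining bases.
        derived = rng.choice([b for b in BASES if b != ref])
        # Assumption: each haplotype carries the derived allele with
        # probability derived_freq.
        haps = [derived if rng.random() < derived_freq else ref
                for _ in range(n_haplotypes)]
        yield "\t".join([chrom, str(pos), ref, ref + "/" + derived,
                         " ".join(haps)])

for line in make_haplotype_lines("AGCTT", 1, rng=random.Random(42)):
    print(line)
```

Passing a seeded `random.Random` instance makes the output reproducible, which helps when comparing simulation runs.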

Next, we are going to generate a file containing SNPs with positive or negative selection coefficients. The Python script can be found here.

As before, use the -h option to list the input requirements:

python make_random_selected.py -h

The output should look like this:

Usage: make_random_selected.py [options]

Options:
  -h, --help            show this help message and exit
  --input=INPUT         A file containing haplotypes (MimicrEE input)
  --selected-snps=NUM_SEL
                        Number of SNPs with selection coefficient other than 0
  --positive-mutations=POS_PROP
                        Probability that a mutation will get a positive
                        selection coefficient

We will use the haplotype file we just generated to pick the selected loci at random. The user can specify the probability that a selected locus will have a positive selection coefficient.

To run the script you can simply type:

python make_random_selected.py --input Ecoli.haps --selected-snps 5000 --positive-mutations 0.8 > selected_snps

In this case we pick 5000 loci and set the probability of obtaining a positive selection coefficient to 80% (in the script provided, the maximum selection coefficient that can be returned is 0.1 and the minimum is 0.001).

The file containing the selected SNPs should look like this:

head selected_snps
[s]
chromosome  1159426 T/G 0.0185  0.5
chromosome  5153351 T/C 0.0531  0.5
chromosome  2053035 A/G 0.0994  0.5
chromosome  1637589 T/A -0.0502 0.5
chromosome  3395795 G/C 0.0383  0.5
chromosome  227257  C/T 0.0914  0.5
chromosome  1315503 T/G 0.0215  0.5
chromosome  2988907 G/A 0.0249  0.5
chromosome  3305269 G/A 0.0102  0.5

Notice that the last column (dominance effect) is ignored as we are simulating haploid individuals.
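The selection step can be sketched as follows. This is an illustration only and not the make_random_selected.py script: the function name is hypothetical, the magnitude of each coefficient is assumed to be drawn uniformly between 0.001 and 0.1, and the sign is assumed to be positive with the given probability.

```python
import random

def draw_selected_snps(snps, n_selected, p_positive,
                       s_min=0.001, s_max=0.1, rng=None):
    """Return the lines of a selected-SNP file.

    snps is a list of (chromosome, position, alleles) tuples, for
    example ("chromosome", 1159426, "T/G").
    """
    rng = rng or random.Random()
    # Sample loci without replacement so no SNP is picked twice.
    chosen = rng.sample(snps, n_selected)
    lines = ["[s]"]
    for chrom, pos, alleles in chosen:
        # Assumption: uniform magnitude in [s_min, s_max].
        s = rng.uniform(s_min, s_max)
        if rng.random() >= p_positive:
            s = -s  # detrimental with probability 1 - p_positive
        # The final column is the dominance effect, fixed at 0.5 as in
        # the example file; it is ignored for haploid simulations.
        lines.append("{}\t{}\t{}\t{:.4f}\t0.5".format(chrom, pos, alleles, s))
    return lines

toy_snps = [("chromosome", i, "A/G") for i in range(1, 101)]
for line in draw_selected_snps(toy_snps, 5, 0.8, rng=random.Random(7)):
    print(line)
```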

The last file we provide for our simulations determines the population size during the course of the experiment. Let's call the file population-size; for our example it looks like this:

cat population-size
1   1
2   5
3   10
4   20
5   28
10  50  
30  250
60  400
80  500
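A malformed population-size file is easy to produce by hand, so a quick check before starting a long simulation may save time. The sketch below assumes that the first column is the generation and the second the population size, and that generations must be strictly increasing; these are assumptions based on the example above, and `parse_population_size` is a hypothetical helper, not part of MimicrEE2.

```python
def parse_population_size(text):
    """Parse 'generation size' lines into a list of (generation, size)."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # tolerates tabs, runs of spaces, trailing blanks
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError("line %d: expected 2 columns, got %d"
                             % (lineno, len(fields)))
        generation, size = int(fields[0]), int(fields[1])
        if size < 1:
            raise ValueError("line %d: population size must be positive"
                             % lineno)
        # Assumption: entries are listed in strictly increasing generation order.
        if entries and generation <= entries[-1][0]:
            raise ValueError("line %d: generations must increase" % lineno)
        entries.append((generation, size))
    return entries

example = """1   1
2   5
3   10
4   20
5   28
10  50
30  250
60  400
80  500
"""
for generation, size in parse_population_size(example):
    print(generation, size)
```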

Finally, we run our simulations for 100 generations with 2 biological replicates. As we request haplotypes as output (--output-dir), we first have to create a new folder to store the generated data:

mkdir haplotypes_ecoli
java -jar mim2-v204.jar w --haplotypes-g0 Ecoli.haps --population-size population-size --snapshots 100 --replicate-runs 2 --output-sync ecoli.sync --output-dir haplotypes_ecoli --fitness selected_snps --mutation-rate 0.0000025 --haploid --clonal --threads 6

Results

You can easily access the output haplotypes and have a look at the evolved individuals:

gzip -dc haplotypes_ecoli/haplotypes.r1.g100.mimhap.gz | head -5
#sex
chromosome
chromosome
chromosome
chromosome
