We provide some Python scripts that allow generating very simple haplotypes that could be used as input for MimicrEE2.
To simulate more realistic haplotypes that take the actual recombination rate and complex demographic scenarios into account, see the next tutorial [GeneratingHaplotypesWithFastsimcoal].
Alternatively, it is also possible to use real haplotypes that have been inferred from sequencing multiple individuals of a population [UsingExistingHaplotypesEgDGRP].
First download the script: https://sourceforge.net/projects/mimicree2/files/scripts/simple-haplotype.py/download
This script simulates the genomes of a population consisting of two haplotypes. All SNPs are completely linked.
The script accepts the following parameters
Diploid genomes may be obtained with the following command:
python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.3 > haplotypes.mimhap
We obtain the following haplotypes:
2L 218 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 286 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 343 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 368 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 375 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 612 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 619 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 847 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 888 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L 954 G G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
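The allele frequencies in such a file can be checked with a few lines of Python. The following is a minimal sketch, assuming whitespace-separated fields with the genotypes starting at the fifth field, as in the output above; the function name `allele_frequencies` is hypothetical and not part of MimicrEE2.

```python
from collections import Counter


def allele_frequencies(line):
    """Return (chromosome, position, {allele: frequency}) for one haplotype line.

    Assumption: fields are whitespace-separated and the genotypes
    start at the fifth field (after chromosome, position, reference
    character and the allele pair such as G/A).
    """
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    genotypes = fields[4:]
    # Joining the genotypes yields one character per haplotype,
    # for diploid ("GG") as well as haploid ("G") entries.
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return chrom, pos, {allele: n / total for allele, n in counts.items()}


# First SNP of the diploid example: 14 GG and 6 AA individuals.
line = "2L 218 G G/A " + " ".join(["GG"] * 14 + ["AA"] * 6)
chrom, pos, freqs = allele_frequencies(line)
print(chrom, pos, freqs)
```

For this SNP the A allele is carried by 12 of the 40 haplotypes, which matches the requested --freq 0.3.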
Haploid genomes can be simulated with the following command:
python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.3 --haploid > haplotypes.mimhap
An example output:
2L 13 G G/A G G G G G G G G G G G G G G A A A A A A
2L 98 G G/A G G G G G G G G G G G G G G A A A A A A
2L 135 G G/A G G G G G G G G G G G G G G A A A A A A
2L 154 G G/A G G G G G G G G G G G G G G A A A A A A
2L 354 G G/A G G G G G G G G G G G G G G A A A A A A
2L 490 G G/A G G G G G G G G G G G G G G A A A A A A
2L 498 G G/A G G G G G G G G G G G G G G A A A A A A
2L 548 G G/A G G G G G G G G G G G G G G A A A A A A
2L 587 G G/A G G G G G G G G G G G G G G A A A A A A
2L 695 G G/A G G G G G G G G G G G G G G A A A A A A
Finally, it is possible to simulate diploid genomes in Hardy-Weinberg equilibrium with the following command:
python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.5 --hardy-weinberg > haplotypes.mimhap
which for example yields:
2L 79 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 190 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 215 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 239 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 342 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 455 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 610 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 677 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 874 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L 953 G G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
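The genotype counts in this output agree with the Hardy-Weinberg proportions p^2, 2pq and q^2. The sketch below computes the expected counts; it assumes that --freq is the frequency of the second allele (A in G/A), which is consistent with the examples above but has not been verified against the script's source. The function name `hardy_weinberg_counts` is hypothetical, and how the script rounds non-integer expectations is not covered here.

```python
def hardy_weinberg_counts(n_diploids, freq):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    Assumption: freq is the frequency q of the second allele (A),
    so the first allele (G) has frequency p = 1 - q.
    """
    q = freq
    p = 1.0 - q
    return {
        "GG": p * p * n_diploids,      # homozygous for the first allele
        "GA": 2 * p * q * n_diploids,  # heterozygous
        "AA": q * q * n_diploids,      # homozygous for the second allele
    }


# Parameters of the command above: --N 20 --freq 0.5
expected = hardy_weinberg_counts(20, 0.5)
print(expected)
```

With --N 20 and --freq 0.5 the expectation is 5 GG, 10 GA and 5 AA individuals, which is the composition of every line in the example output.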
We also provide a script that generates haplotypes based on a site frequency spectrum (sfs).
The alleles are distributed randomly among haplotypes, thus the SNPs are mostly in linkage equilibrium (except for random LD). For more realistic haplotypes that take population history, the recombination rate and many other parameters into account, we refer to the next tutorial [GeneratingHaplotypesWithFastsimcoal].
The script accepts the following parameters
Download the script here: https://sourceforge.net/projects/mimicree2/files/scripts/sfs-haplotype.py/download
The --sfs parameter is a comma-separated list. Each value (integer or float) represents the relative abundance of one population frequency class of sites.
This is best explained using an example:
Assuming --sfs "10,1,1,1" the population frequency of the SNPs is provided for 4 different frequency classes:
When only a single frequency class is provided, each population frequency has an equal probability (uniform distribution of population frequencies).
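One way to read an --sfs value is sketched below. It assumes that k values split the range 0.0 - 1.0 into k equal-width frequency classes, that a class is picked with probability proportional to its value, and that the frequency is then drawn uniformly within the class. This is an assumption consistent with the single-class behaviour described above, not a description of the script's actual implementation; the names `parse_sfs` and `draw_frequency` are hypothetical.

```python
import random


def parse_sfs(sfs):
    """Turn an --sfs string into (lower, upper, probability) tuples.

    Assumption: the classes are equal-width intervals covering 0.0 - 1.0.
    """
    weights = [float(x) for x in sfs.split(",")]
    k = len(weights)
    total = sum(weights)
    return [(i / k, (i + 1) / k, w / total) for i, w in enumerate(weights)]


def draw_frequency(classes, rng):
    """Pick a class by its probability, then a frequency uniformly within it."""
    probabilities = [c[2] for c in classes]
    lower, upper, _ = rng.choices(classes, weights=probabilities)[0]
    return rng.uniform(lower, upper)


classes = parse_sfs("10,1,1,1")
for lower, upper, prob in classes:
    print(f"{lower:.2f} - {upper:.2f}: relative abundance {prob:.3f}")

rng = random.Random(1)  # fixed seed so that repeated runs give the same draws
frequencies = [draw_frequency(classes, rng) for _ in range(5)]
print(frequencies)
```

Under these assumptions, --sfs "10,1,1,1" would place 10 out of 13 SNPs in the lowest frequency class and 1 out of 13 in each of the remaining three.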
For example, given the command:
python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 1
Note that --sfs 1 specifies a single frequency class (0.0 - 1.0); within this class, each population frequency has an equal probability of being randomly picked.
We obtain the following output:
2L 62 G G/A AG GG GA GA GA GA GG GG GG AA
2L 118 G G/A AG AA AA AA GG AG AG GA GA GA
2L 126 G G/A AA GA AG AA GA AA AA AA AA AA
2L 224 G G/A GA GG GG GG GG GG GA GG GG AG
2L 256 G G/A AA GA AA GG GA GA GA AG GA GA
2L 508 G G/A AA AA GG AA GA AA AA AA AG AA
2L 693 G G/A GG GG GG GG GG GG GG GG GA GG
2L 758 G G/A GG AG GG GA GA GG GG GA GG GG
2L 780 G G/A GA GA AG GG GG AA AA AG AG GG
2L 862 G G/A GG GA GG GG GG GG GG GG GG GG
An sfs that follows the neutral expectation (relative abundance proportional to 1/x) may be obtained with a command like the following:
python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 10,5,3.3,2.5,2,1.67,1.43,1.25,1.1,1
which for example gives:
2L 31 G G/A AA AG GA GA GA GG GA GG GG GG
2L 274 G G/A GG GG GG GG GA GG GA AG GG GA
2L 310 G G/A AG GG GG GA AG AG GG GG GG AG
2L 517 G G/A GG GG GG GG GG GG AG GG GG GA
2L 631 G G/A GG GG GG GG AG GG AG GG AA GG
2L 752 G G/A AG GG GA AG AG GG GA GG AA GA
2L 764 G G/A GG GG AG GG GG GG AG AG GG GG
2L 770 G G/A GG GG GG GG AA GG GG GG GG GA
2L 914 G G/A GA AG AA AG AA AG GG AG AA GA
2L 955 G G/A GG GG GG GG GG GG GG GG GG GA
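The --sfs value used in the command above is 10/x for x = 1 to 10, rounded. Such a list does not have to be typed by hand; a minimal sketch for generating it follows. The function name `neutral_sfs` and its `scale` argument are hypothetical, and since the values are relative abundances the choice of scale should not matter.

```python
def neutral_sfs(n_classes, scale=10.0):
    """Build a comma-separated list with values proportional to 1/x.

    Each value is scale / x for x = 1 .. n_classes, written with
    three significant digits.
    """
    return ",".join(f"{scale / x:.3g}" for x in range(1, n_classes + 1))


sfs = neutral_sfs(10)
print(sfs)
# A command line using this value could then be assembled as:
print(f"python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs {sfs}")
```

The generated list differs from the one in the command above only in rounding (3.33 instead of 3.3, 1.11 instead of 1.1).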
This script also works for haploids.
Given the command:
python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 10,3,2,1,1,1 --haploid
we obtain, for example:
2L 146 G G/A A A A G G G A G G A
2L 156 G G/A G G G G A G G G G G
2L 177 G G/A G G G G A G G G G A
2L 305 G G/A G A G G G G G G G G
2L 320 G G/A G G G G G G G G A G
2L 755 G G/A G G G G G A G G G G
2L 816 G G/A G G G G G A A G G G
2L 867 G G/A A G G G G G G G G G
2L 894 G G/A G G G G G G G G A A
2L 901 G G/A G G G G G A G G G G
Wiki: GeneratingHaplotypesWithFastsimcoal
Wiki: Home
Wiki: UsingExistingHaplotypesEgDGRP