
GeneratingSimpleHaplotypes

Robert Kofler Christos Vlachos

1 Introduction

We provide Python scripts for generating very simple haplotypes that can be used as input for MimicrEE2.
To simulate more realistic haplotypes that take the actual recombination rate and complex demographic scenarios into account, see the next tutorial [GeneratingHaplotypesWithFastsimcoal].
Alternatively, it is possible to use real haplotypes that have been inferred from sequencing multiple individuals of a population [UsingExistingHaplotypesEgDGRP].

2 Very simple haplotypes

First download the script: https://sourceforge.net/projects/mimicree2/files/scripts/simple-haplotype.py/download
This script simulates the genomes of a population consisting of two haplotypes. All SNPs are completely linked.

The script accepts the following parameters:

  • --N population size
  • --S sites (number of SNPs)
  • --chr-name the name of the chromosome
  • --chr-len the length of the chromosome
  • --freq the frequency of the haplotype carrying the derived allele; the ancestral haplotype has the frequency 1.0 - freq
  • --haploid simulate haploids
  • --hardy-weinberg simulate the diploids in Hardy-Weinberg equilibrium

diploids

Diploid genomes may be obtained with the following command

python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.3 > haplotypes.mimhap

We obtain the following haplotypes:

2L  218 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  286 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  343 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  368 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  375 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  612 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  619 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  847 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  888 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
2L  954 G   G/A GG GG GG GG GG GG GG GG GG GG GG GG GG GG AA AA AA AA AA AA
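
To check that a generated file matches the requested --freq, the allele frequency can be recomputed from each line. The following is a minimal sketch, not part of the MimicrEE2 scripts. It assumes whitespace-separated columns (chromosome, position, reference, allele pair, then one genotype per individual) and that the second allele of the pair (here A) is the derived one; the function name is hypothetical.

```python
def derived_allele_frequency(line):
    """Return (chromosome, position, frequency of the second listed allele)."""
    fields = line.split()
    chrom, pos, alleles = fields[0], int(fields[1]), fields[3]
    # Assumption: the allele after the slash (e.g. A in G/A) is the derived one.
    derived = alleles.split("/")[1]
    # Join all genotypes so the same code handles diploid (GG) and haploid (G) files.
    genotypes = "".join(fields[4:])
    return chrom, pos, genotypes.count(derived) / len(genotypes)


# First line of the diploid example: 14 GG individuals followed by 6 AA individuals.
example = "2L  218 G   G/A " + " ".join(["GG"] * 14 + ["AA"] * 6)
print(derived_allele_frequency(example))
```

For the example line, 12 of the 40 allele copies are A, which corresponds to the requested frequency of 0.3.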

haploids

Haploid genomes can be simulated with the following command

python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.3 --haploid > haplotypes.mimhap

An example output:

2L  13  G   G/A G G G G G G G G G G G G G G A A A A A A
2L  98  G   G/A G G G G G G G G G G G G G G A A A A A A
2L  135 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  154 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  354 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  490 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  498 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  548 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  587 G   G/A G G G G G G G G G G G G G G A A A A A A
2L  695 G   G/A G G G G G G G G G G G G G G A A A A A A

diploids in Hardy-Weinberg equilibrium

Finally, it is possible to simulate diploids in Hardy-Weinberg equilibrium with the following command:

python simple-haplotype.py --N 20 --S 10 --chr-name 2L --chr-len 1000 --freq 0.5 --hardy-weinberg > haplotypes.mimhap

which for example yields:

2L  79  G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  190 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  215 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  239 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  342 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  455 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  610 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  677 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  874 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
2L  953 G   G/A GG GG GG GG GG GA GA GA GA GA GA GA GA GA GA AA AA AA AA AA
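
The genotype counts in this output agree with the Hardy-Weinberg proportions p², 2pq and q². The sketch below computes these expected counts; it is an illustration only, and how simple-haplotype.py rounds non-integer counts is not covered here. The function name is hypothetical.

```python
def hardy_weinberg_counts(n, freq):
    """Expected genotype counts for n diploids and derived allele frequency freq."""
    q = freq        # frequency of the derived allele (A in the example)
    p = 1.0 - freq  # frequency of the ancestral allele (G in the example)
    return {
        "GG": n * p * p,      # homozygous ancestral
        "GA": 2 * n * p * q,  # heterozygous
        "AA": n * q * q,      # homozygous derived
    }


print(hardy_weinberg_counts(20, 0.5))
```

With --N 20 and --freq 0.5 this gives 5 GG, 10 GA and 5 AA individuals, as in the example output.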

3 Haplotypes based on a given SFS

We provide a script that generates haplotypes based on a site frequency spectrum (SFS).
The alleles are distributed randomly among haplotypes, so the SNPs are mostly in linkage equilibrium (except for random LD). For more realistic haplotypes that take population history, the recombination rate and many other parameters into account, see the next tutorial [GeneratingHaplotypesWithFastsimcoal].

The script accepts the following parameters:

  • --N population size
  • --S sites (number of SNPs)
  • --chr-name the name of the chromosome
  • --chr-len the length of the chromosome
  • --haploid simulate haploids
  • --sfs the site frequency spectrum; for example possible arguments are "10,8,7,5,1,1,1,4" or "1,1,1,1,10" or "1,10"

Download the script here: https://sourceforge.net/projects/mimicree2/files/scripts/sfs-haplotype.py/download

sfs

The --sfs argument is a comma-separated list. Each value (integer or float) gives the relative abundance of SNPs in one population frequency class.

This is best explained using an example:
With --sfs "10,1,1,1" the population frequencies of the SNPs are specified for 4 frequency classes of equal width:

  • 0.00 - 0.25: with a relative abundance of 10 most SNPs (10 out of 13; where 13=10+1+1+1) will be in this frequency class
  • 0.25 - 0.50: with a relative abundance of 1 only 1/13 of the SNPs will be in this class
  • 0.50 - 0.75: with a relative abundance of 1 only 1/13 of the SNPs will be in this class
  • 0.75 - 1.0: with a relative abundance of 1 only 1/13 of the SNPs will be in this class
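
The mapping from an --sfs string to frequency classes can be sketched as follows. This is an illustration of the description above, not the code of sfs-haplotype.py; the function names are hypothetical, and drawing the frequency uniformly within the chosen class is an assumption.

```python
import random


def sfs_classes(sfs):
    """Turn an --sfs string into (lower bound, upper bound, probability) tuples."""
    weights = [float(w) for w in sfs.split(",")]
    total = sum(weights)
    k = len(weights)  # number of equally wide frequency classes
    return [(i / k, (i + 1) / k, w / total) for i, w in enumerate(weights)]


def draw_frequency(classes, rng):
    """Pick a class by its probability, then a frequency uniformly within it."""
    lower, upper, _ = rng.choices(classes, weights=[c[2] for c in classes])[0]
    return rng.uniform(lower, upper)


classes = sfs_classes("10,1,1,1")
for lower, upper, prob in classes:
    print(f"{lower:.2f} - {upper:.2f}: {prob:.3f}")

rng = random.Random(1)  # fixed seed so the sketch is reproducible
print(draw_frequency(classes, rng))
```

The first class receives the probability 10/13 and each of the other three classes 1/13.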

example equal distribution

When only a single frequency class is provided, each population frequency has an equal probability (uniform distribution of population frequencies).
For example, given the command:

python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 1

Note that --sfs 1 specifies a single frequency class, 0.0 - 1.0; within this class each population frequency has an equal probability of being picked.

We obtain the following output:

2L  62  G   G/A AG GG GA GA GA GA GG GG GG AA
2L  118 G   G/A AG AA AA AA GG AG AG GA GA GA
2L  126 G   G/A AA GA AG AA GA AA AA AA AA AA
2L  224 G   G/A GA GG GG GG GG GG GA GG GG AG
2L  256 G   G/A AA GA AA GG GA GA GA AG GA GA
2L  508 G   G/A AA AA GG AA GA AA AA AA AG AA
2L  693 G   G/A GG GG GG GG GG GG GG GG GA GG
2L  758 G   G/A GG AG GG GA GA GG GG GA GG GG
2L  780 G   G/A GA GA AG GG GG AA AA AG AG GG
2L  862 G   G/A GG GA GG GG GG GG GG GG GG GG

example skewed sfs

We may obtain an SFS that follows the neutral expectation (1/x)
using a command like the following:

python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 10,5,3.3,2.5,2,1.67,1.43,1.25,1.1,1

which for example gives:

2L  31  G   G/A AA AG GA GA GA GG GA GG GG GG
2L  274 G   G/A GG GG GG GG GA GG GA AG GG GA
2L  310 G   G/A AG GG GG GA AG AG GG GG GG AG
2L  517 G   G/A GG GG GG GG GG GG AG GG GG GA
2L  631 G   G/A GG GG GG GG AG GG AG GG AA GG
2L  752 G   G/A AG GG GA AG AG GG GA GG AA GA
2L  764 G   G/A GG GG AG GG GG GG AG AG GG GG
2L  770 G   G/A GG GG GG GG AA GG GG GG GG GA
2L  914 G   G/A GA AG AA AG AA AG GG AG AA GA
2L  955 G   G/A GG GG GG GG GG GG GG GG GG GA
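
The --sfs values used in this command are the weights 10/x for x = 1 to 10, rounded. A small helper such as the following sketch can generate the list for any number of classes; the function name is hypothetical and the rounding to three significant digits is an arbitrary choice.

```python
def neutral_sfs(classes):
    """Build an --sfs string with weights proportional to 1/x for x = 1..classes."""
    # Weights are scaled so that the last class has a weight of 1.
    return ",".join(f"{classes / x:.3g}" for x in range(1, classes + 1))


print(neutral_sfs(10))
```

The resulting string can be passed directly to --sfs.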

example skewed sfs for haploids

This script also works for haploids.
Given the command:

python sfs-haplotype.py --chr-name 2L --chr-len 1000 --S 10 --N 10 --sfs 10,3,2,1,1,1 --haploid

we obtain, for example:

2L  146 G   G/A A A A G G G A G G A
2L  156 G   G/A G G G G A G G G G G
2L  177 G   G/A G G G G A G G G G A
2L  305 G   G/A G A G G G G G G G G
2L  320 G   G/A G G G G G G G G A G
2L  755 G   G/A G G G G G A G G G G
2L  816 G   G/A G G G G G A A G G G
2L  867 G   G/A A G G G G G G G G G
2L  894 G   G/A G G G G G G G G A A
2L  901 G   G/A G G G G G A G G G G

Related

Wiki: GeneratingHaplotypesWithFastsimcoal
Wiki: Home
Wiki: UsingExistingHaplotypesEgDGRP
