
Sim-Ped 1.03

Software for estimating the expected proportions of homozygous and heterozygous mutations in pedigrees.

Why use sim-ped

In populations of diploid species, mutations that emerge de novo can be randomly lost before becoming fixed. In mutation accumulation (MA) experimental designs aimed at estimating the rate of new mutations, this poses a problem: a fraction of the mutations that arose in the ancestors of the sequenced samples may have been lost, leading to an underestimation of mutation rates.

Given a pedigree, Sim-Ped estimates the expected proportion of mutations that are inherited by a sample of sequenced individuals, with separate estimates for homozygous and heterozygous mutations.

How it works

At the core of sim-ped, a gene-dropping approach is used. This involves the random appearance of a mutation in one individual of the pedigree and its segregation following the rules of Mendelian inheritance. As the mutation segregates, it may become lost, become fixed, or continue segregating. At the end of the simulation process, the state of the mutation is evaluated in a census of individuals of interest. These individuals represent a sample of genomes that have been sequenced and in which actual mutations may be detected.
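
The gene-dropping step described above can be sketched in Python as follows. This is a minimal illustration, not the code of sim-ped itself. The function name `gene_drop`, the toy pedigree, the use of "0" for unknown parents, and the rule that the mutation arises as a single copy in one individual chosen uniformly at random are all assumptions made for the example.

```python
import random

def gene_drop(pedigree, census, rng):
    """Drop one new mutation through a pedigree and classify it in the census.

    pedigree: list of (mother, father, individual) tuples, with parents
              listed before their offspring; "0" marks an unknown parent
              (assumed coding, not taken from sim-ped).
    census:   IDs of the sequenced individuals.
    Returns "lost", "single_het", "single_hom" or "multiple".
    """
    # Assumption: the mutation arises as one copy in a random individual.
    origin = rng.choice(pedigree)[2]
    genotype = {}  # individual -> (maternal allele, paternal allele); 1 = mutant
    for mother, father, ind in pedigree:
        # Mendelian segregation: each parent passes on one of its two
        # alleles at random. Unknown parents pass on the wild type (0).
        maternal = rng.choice(genotype.get(mother, (0, 0)))
        paternal = rng.choice(genotype.get(father, (0, 0)))
        if ind == origin:
            maternal = 1  # the new mutation appears here, heterozygous
        genotype[ind] = (maternal, paternal)

    # Evaluate the state of the mutation in the censused individuals.
    carriers = [sum(genotype[i]) for i in census if sum(genotype[i]) > 0]
    if not carriers:
        return "lost"
    if len(carriers) > 1:
        return "multiple"
    return "single_hom" if carriers[0] == 2 else "single_het"

# Hypothetical pedigree: two generations of full-sib mating.
PEDIGREE = [
    ("0", "0", "F0"), ("0", "0", "M0"),
    ("F0", "M0", "F1"), ("F0", "M0", "M1"),
    ("F1", "M1", "F2"), ("F1", "M1", "M2"),
]
CENSUS = ["F2", "M2"]

rng = random.Random(1)
outcomes = [gene_drop(PEDIGREE, CENSUS, rng) for _ in range(10)]
print(outcomes)
```

Each call simulates a single mutation, so the outcome differs from call to call.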

Since the simulation process is stochastic, it is repeated over many iterations. The mean proportions of homozygous and heterozygous mutations across iterations then provide good estimates of their expected values, conditional on the pedigree used.
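
As a rough guide to how many iterations are needed, the precision of an estimated proportion can be approximated with the usual binomial standard error. The sketch below assumes that iterations are independent and that each proportion is a count divided by the number of simulated mutations. The function name `proportion_with_se` is hypothetical and the counts are illustrative, not real sim-ped results.

```python
import math

def proportion_with_se(n_events, n_mut):
    """Return the estimated proportion and its binomial standard error."""
    p = n_events / n_mut
    se = math.sqrt(p * (1.0 - p) / n_mut)
    return p, se

# Illustrative counts only: the same proportion estimated from more iterations.
for n_mut in (1_000, 100_000, 10_000_000):
    p, se = proportion_with_se(n_mut // 10, n_mut)
    print(f"n_mut={n_mut:>10d}  p={p:.3f}  se={se:.6f}")
```

The standard error shrinks with the square root of the number of iterations, so a tenfold gain in precision costs roughly a hundredfold more iterations.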

How to execute sim-ped

Run the script from the GNU/Linux command line terminal using perl (v5.30.0):

perl sim-ped103.pl -pedfile <pedigree file> -cfile <census file> -reps <n_mut> [ -allow_clusters ]

Mandatory arguments:

  • <pedigree file>: Tab-separated values (TSV) pedigree file with columns for mother ID, father ID and individual ID in that order, followed by three columns of zeros. All six columns must include a header.

  • <census file>: Text file with a space-separated list of focal individuals that are used to detect mutations.

  • <n_mut>: Number of mutations to simulate (i.e. number of gene-dropping iterations to run).

Optional argument:

  • -allow_clusters: allows a new mutation that arises in a parent's germline to be inherited by more than one offspring.
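
The two input formats can be illustrated with a small Python sketch that builds example files in memory and reads them back. The header names, the individual IDs, and the use of "0" for the unknown parents of founders are assumptions made for this example. The README fixes only the column order, the three trailing columns of zeros, and the need for a header.

```python
import csv
import io

# Hypothetical pedigree file: mother ID, father ID, individual ID,
# then three columns of zeros. Header names are placeholders.
PEDIGREE_TSV = (
    "mother\tfather\tind\tx1\tx2\tx3\n"
    "0\t0\tF0\t0\t0\t0\n"
    "0\t0\tM0\t0\t0\t0\n"
    "F0\tM0\tF1\t0\t0\t0\n"
    "F0\tM0\tM1\t0\t0\t0\n"
    "F1\tM1\tF2\t0\t0\t0\n"
    "F1\tM1\tM2\t0\t0\t0\n"
)

# Hypothetical census file: space-separated list of focal individuals.
CENSUS_TXT = "F2 M2\n"

def read_pedigree(handle):
    """Read a six-column TSV pedigree; return (mother, father, individual) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) != 6:
        raise ValueError("expected a header naming six columns")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(row)}")
        rows.append(tuple(row[:3]))
    return rows

def read_census(handle):
    """Read a whitespace-separated list of focal individual IDs."""
    return handle.read().split()

def missing_from_pedigree(pedigree, census):
    """Return censused IDs that do not appear as individuals in the pedigree."""
    known = {ind for _, _, ind in pedigree}
    return [ind for ind in census if ind not in known]

pedigree = read_pedigree(io.StringIO(PEDIGREE_TSV))
census = read_census(io.StringIO(CENSUS_TXT))
print(len(pedigree), "individuals;", "census:", census)
print("missing from pedigree:", missing_from_pedigree(pedigree, census))
```

A check like `missing_from_pedigree` is a convenient way to catch typos in the census file before starting a long run.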

Understanding the output

The output from sim-ped is written directly to the standard output of the command line terminal. There are two main lines of output:

mutations <n_mut> n_single_het <n_het> n_single_hom <n_hom> n_multiple <n_mul>
p(single_het) <p_het> p(single_hom) <p_hom> p(multiple) <p_mul>

Where:

  • <n_mut>: the number of mutations simulated (the value passed to -reps).
  • <n_het> and <p_het>: the number and proportion of heterozygous mutations recovered in exactly one of the censused individuals.
  • <n_hom> and <p_hom>: the number and proportion of homozygous mutations recovered in exactly one of the censused individuals.
  • <n_mul> and <p_mul>: the number and proportion of mutations recovered in more than one of the censused individuals.
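
For downstream analysis, the two output lines can be turned into a dictionary with a few lines of Python. The sketch below assumes that each line is a whitespace-separated sequence of label and value pairs, exactly as in the template above. The numbers in `EXAMPLE_OUTPUT` are invented for illustration and are not real sim-ped results, and the function name `parse_simped_output` is hypothetical.

```python
# Invented example text following the two-line template; not real output.
EXAMPLE_OUTPUT = (
    "mutations 1000 n_single_het 120 n_single_hom 30 n_multiple 50\n"
    "p(single_het) 0.12 p(single_hom) 0.03 p(multiple) 0.05\n"
)

def parse_simped_output(text):
    """Parse the counts line and the proportions line into one dictionary."""
    result = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "mutations":
            convert = int      # counts line
        elif tokens[0].startswith("p("):
            convert = float    # proportions line
        else:
            continue           # ignore any other line
        # Tokens alternate between a label and its value.
        for label, value in zip(tokens[0::2], tokens[1::2]):
            result[label] = convert(value)
    return result

parsed = parse_simped_output(EXAMPLE_OUTPUT)
print(parsed)
```

In practice the text would come from redirecting the standard output of the script to a file, or from capturing it with `subprocess.run`.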

Citation

If you use sim-ped.pl in your research, please cite:

  • REFERENCE

License

Copyright (C) 2023 Peter Keightley (University of Edinburgh, UK).

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Source: README.md, updated 2024-05-22