Help on Python script fitGCP.py

Fits mixtures of probability distributions to genome coverage profiles using an
EM-like iterative algorithm.

The script takes a SAM file as input, parses the mapping information, and
creates a Genome Coverage Profile (GCP). The GCP is written to a file so that
this step can be skipped on subsequent runs.
The user provides a mixture model that is fitted to the GCP and may also
specify initial parameters for each distribution in the model.
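
To illustrate what a coverage profile contains, here is a minimal sketch that
counts how many genome positions have each coverage value. It is not the code
used by fitGCP.py: it assumes the alignments are already available as 0-based
(start, length) pairs, ignores CIGAR details, and the function name and toy
data are hypothetical.

```python
from collections import Counter


def coverage_profile(genome_length, alignments):
    """Return {coverage: number of positions} for (start, length) alignments."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in alignments:
        begin = max(start, 0)
        end = min(start + length, genome_length)
        if begin < end:
            diff[begin] += 1
            diff[end] -= 1
    # A running sum over the difference array gives the per-base coverage.
    profile = Counter()
    depth = 0
    for pos in range(genome_length):
        depth += diff[pos]
        profile[depth] += 1
    return dict(profile)


# Hypothetical toy data: a 10 bp genome with three mapped reads.
reads = [(0, 4), (2, 4), (3, 5)]
gcp = coverage_profile(10, reads)
print(gcp)
```

In this toy case two positions are uncovered, four have coverage 1, three have
coverage 2, and one has coverage 3.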

As output, the script generates a text file containing the final set of fit
parameters and additional information about the fitting process. If logging is
enabled (-l), a log file records the current set of parameters at each step of
the iteration. If requested (-p), a plot of the GCP and the fitted
distributions is created.
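
The iteration described above can be sketched for the simplest case, a mixture
of a zero component and a single Poisson component. This is a textbook EM
scheme under stated assumptions, not the update rules implemented in
fitGCP.py: the function name, the starting values, and the convergence measure
(largest absolute parameter change) are all assumptions made for illustration.
The defaults for the step limit and threshold mirror the documented defaults
of -i and -t.

```python
import math


def fit_zero_poisson(gcp, max_steps=50, threshold=0.01):
    """Fit alpha_zero * [k == 0] + (1 - alpha_zero) * Poisson(k; lam) to a GCP."""
    total = float(sum(gcp.values()))
    n_zero = gcp.get(0, 0)
    cov_sum = float(sum(k * n for k, n in gcp.items()))
    # Assumed starting point: equal proportions, mean taken from the data.
    alpha_zero, lam = 0.5, cov_sum / total
    history = [(alpha_zero, lam)]
    for _ in range(max_steps):
        # E-step: share of zero-coverage positions assigned to the zero component.
        pois_zero = (1.0 - alpha_zero) * math.exp(-lam)
        resp_zero = alpha_zero / (alpha_zero + pois_zero)
        # M-step: re-estimate the proportion and the Poisson mean.
        new_alpha = n_zero * resp_zero / total
        new_lam = cov_sum / (total - n_zero * resp_zero)
        change = max(abs(new_alpha - alpha_zero), abs(new_lam - lam))
        alpha_zero, lam = new_alpha, new_lam
        history.append((alpha_zero, lam))  # one entry per step, like a log
        if change < threshold:
            break
    return alpha_zero, lam, history


# Hypothetical GCP: 30% uncovered positions, the rest centred on coverage 10.
toy_gcp = {0: 300, 8: 100, 9: 150, 10: 200, 11: 150, 12: 100}
alpha_zero, lam, history = fit_zero_poisson(toy_gcp)
print(alpha_zero, lam, len(history) - 1)
```

On this toy profile the estimates settle close to a zero proportion of 0.3 and
a Poisson mean of 10 within a few steps.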

REQUIREMENTS:
-------------------------------------------------------------------------------
Python 2.7
Python packages numpy, scipy, pysam


USAGE:
-------------------------------------------------------------------------------
fitGCP runs on the command line. The following command describes the general
structure:

python fitGCP.py [options] NAME
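
When fitGCP is called from another Python script, the command line can be
assembled programmatically. The sketch below uses only options documented in
this file; the helper name and the file name mapping.sam are hypothetical.

```python
def fitgcp_command(sam_file, distributions=None, means=(), alphas=(),
                   plot=False, log=False):
    """Build an argument list following 'python fitGCP.py [options] NAME'."""
    cmd = ["python", "fitGCP.py"]
    if distributions:
        cmd += ["-d", distributions]
    # -m and -a are given once per value, in the order of the distributions.
    for mean in means:
        cmd += ["-m", str(mean)]
    for alpha in alphas:
        cmd += ["-a", str(alpha)]
    if plot:
        cmd.append("-p")
    if log:
        cmd.append("-l")
    cmd.append(sam_file)  # NAME comes last
    return cmd


cmd = fitgcp_command("mapping.sam", distributions="zn", means=[12.4],
                     plot=True, log=True)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.call(cmd)
```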

Parameters:
NAME: Name of SAM file to analyze.


Options:
  -h, --help            show this help message and exit
  -d DIST, --distributions=DIST
                        Distributions to fit. z: zero; n: nbinom (MOM); N:
                        nbinom (MLE); p: Poisson; t: tail. Default: zn
  -i STEPS, --iterations=STEPS
                        Maximum number of iterations. Default: 50
  -t THR, --threshold=THR
                        Set the convergence threshold for the iteration. Stop
                        if the change between two iterations is less than THR.
                        Default: 0.01
  -c CUTOFF, --cutoff=CUTOFF
                        Specifies a coverage cutoff quantile such that only
                        coverage values below this quantile are considered.
                        Default: 0.95
  -p, --plot            Create a plot of the fitted mixture model. Default:
                        False
  -m MEAN, --means=MEAN
                        Specifies the initial values for the mean of each
                        Poisson or Negative Binomial distribution. Usage: -m
                        12.4 -m 16.1 will specify the means for the first two
                        non-zero/tail distributions. The default is calculated
                        from the data.
  -a ALPHA, --alpha=ALPHA
                        Specifies the initial values for the proportion alpha
                        of each distribution. Usage: For three distributions
                        -a 0.3 -a 0.3 specifies the proportions 0.3, 0.3 and
                        0.4. The default is equal proportions for all
                        distributions.
  -l, --log             Enable logging. Default: False
  --view                Only view the GCP. Do not fit any distribution.
                        Respects cutoff (-c). Default: False
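
The effect of the coverage cutoff (-c) can be illustrated with a small sketch
that truncates a GCP at a given quantile. This is one possible reading of the
option, not the code used by fitGCP.py: the function name is hypothetical, and
keeping the coverage value at which the quantile is reached (rather than
excluding it) is an assumption.

```python
def apply_cutoff(gcp, quantile=0.95):
    """Keep the lowest coverage values that together reach the given quantile."""
    total = sum(gcp.values())
    cumulative = 0
    kept = {}
    for cov in sorted(gcp):
        kept[cov] = gcp[cov]
        cumulative += gcp[cov]
        # Stop once the requested fraction of positions has been included.
        if cumulative >= quantile * total:
            break
    return kept


# Hypothetical GCP with a single extreme coverage value of 40.
toy_gcp = {0: 50, 1: 30, 2: 16, 3: 3, 40: 1}
kept = apply_cutoff(toy_gcp)
print(kept)
```

With the default quantile of 0.95, the rare high-coverage positions in this toy
profile are dropped and only coverage values 0 to 2 remain.
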
Source: README.txt, updated 2013-03-26