dspchip command line options
dspchip comes with a number of options to perform signal analysis of ChIP-seq experiments. These values apply to dspchip 0.8.2.
The first signal (A) to analyze, typically the ChIP file
The second signal (B) to analyze, typically a control
The file format for signal A. Accepted formats are bam, bw (for bigwig), bed, bar or wig. Default is bam.
The file format for signal B. If not specified, it is set to the format of signal A.
The name of the experiment. This will be the prefix for all the output files. Default is default
The analysis pipeline. The pipeline is a string made of the following letters: F W G A E N C L S R X V T Z. Each letter stands for one step.
Note that L, S, X, V and R reduce the number of signals to one. Default is NFTZ.
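A minimal sketch of how a pipeline string could be checked before running. It only validates the letters listed above and applies the note that L, S, X, V and R reduce the signals to one; what each individual step does is not described on this page, and `check_pipeline` is a hypothetical helper, not part of dspchip.

```python
# Letters accepted in the pipeline string, as listed in the option description.
VALID_STEPS = frozenset("FWGAENCLSRXVTZ")
# Steps that, per the documentation, reduce the number of signals to one.
REDUCING_STEPS = frozenset("LSXVR")


def check_pipeline(pipeline: str = "NFTZ") -> int:
    """Validate a pipeline string and return how many signals remain at the end.

    Hypothetical helper: assumes two input signals (A and B) are supplied.
    """
    unknown = set(pipeline) - VALID_STEPS
    if unknown:
        raise ValueError(f"unknown pipeline step(s): {''.join(sorted(unknown))}")
    return 1 if any(step in REDUCING_STEPS for step in pipeline) else 2


print(check_pipeline())        # default NFTZ has no reducing step: 2
print(check_pipeline("NFLZ"))  # L reduces A and B to a single signal: 1
```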
The windowing function for the output profile (bedgraph or bigwig). This can be mean, med or max (for average, median or maximum). Default is mean.
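To illustrate what the three windowing functions compute, here is a sketch that summarizes a per-base signal into consecutive, non-overlapping bins. It assumes each output value covers one bin whose width equals the output step; how dspchip actually aligns its output windows is not stated here, and `bin_profile` is a hypothetical name.

```python
import numpy as np

# Reducers named after the accepted values of the windowing function.
REDUCERS = {"mean": np.mean, "med": np.median, "max": np.max}


def bin_profile(signal, step=100, func="mean"):
    """Summarize a per-base signal into consecutive bins of `step` positions.

    `step` is a hypothetical bin width; the last bin may be shorter.
    """
    reduce = REDUCERS[func]
    values = np.asarray(signal, dtype=float)
    n_bins = int(np.ceil(len(values) / step))
    return np.array([reduce(values[i * step:(i + 1) * step]) for i in range(n_bins)])


signal = np.arange(10)
print(bin_profile(signal, step=4, func="mean"))  # [1.5 5.5 8.5]
print(bin_profile(signal, step=4, func="max"))   # [3. 7. 9.]
```

max keeps the strongest value in each bin, while med is less sensitive to isolated spikes than mean.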
The FIR window name. This can be flat, hanning, hamming, bartlett, blackman or gauss. Default is flat.
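FIR smoothing with a named window usually means convolving the signal with normalized window coefficients. The sketch below maps the listed names onto numpy's window functions; the window length and the Gaussian width (sigma = size / 6) are hypothetical choices, since this page does not say how dspchip sizes its windows.

```python
import numpy as np


def fir_window(name, size):
    """Return normalized FIR coefficients for one of the documented window names."""
    if name == "flat":
        w = np.ones(size)  # moving average
    elif name in ("hanning", "hamming", "bartlett", "blackman"):
        w = getattr(np, name)(size)  # numpy provides these four directly
    elif name == "gauss":
        # Assumed shape: Gaussian with sigma = size / 6 (hypothetical choice).
        x = np.arange(size) - (size - 1) / 2
        w = np.exp(-0.5 * (x / (size / 6)) ** 2)
    else:
        raise ValueError(f"unknown window: {name}")
    return w / w.sum()  # normalize so smoothing preserves the signal level


def fir_smooth(signal, name="flat", size=11):
    """Convolve the signal with the window; output has the input length."""
    return np.convolve(np.asarray(signal, dtype=float), fir_window(name, size), mode="same")
```

Normalizing the coefficients to sum to one keeps the smoothed values on the same scale as the input; only the first and last `size // 2` positions are pulled down by the implicit zero padding.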
The thresholding approach. Thresholding happens at peak finding time; when thresholding is enabled, peaks are not searched with the Canny-like edge detector. Accepted values are otsu and mad. The value can be prefixed with the keyword global or local, separated by a comma (e.g. local,otsu), to threshold the whole chromosome or local windows, respectively. Default is global,otsu.
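A sketch of the two thresholding methods and the global/local switch. Otsu's method picks the histogram cut that maximizes between-class variance; the MAD variant shown is median + k × 1.4826 × MAD with a hypothetical k = 3. The local window size, the histogram bin count and the fallback to global when no keyword is given are assumptions, not documented dspchip behavior.

```python
import numpy as np


def otsu_threshold(values, nbins=256):
    """Otsu's method: pick the cut that maximizes between-class variance."""
    counts, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts).astype(float)   # size of the lower class
    w1 = w0[-1] - w0                       # size of the upper class
    s0 = np.cumsum(counts * centers)       # sum of values in the lower class
    mu0 = np.divide(s0, w0, out=np.zeros_like(s0), where=w0 > 0)
    mu1 = np.divide(s0[-1] - s0, w1, out=np.zeros_like(s0), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return edges[k + 1]                    # values above this are foreground


def mad_threshold(values, k=3.0):
    """Median + k * scaled MAD; k = 3 is a hypothetical choice."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return med + k * 1.4826 * mad          # 1.4826 scales MAD to a normal sigma


def parse_threshold_spec(spec):
    """Split 'local,otsu' into (scope, method); assume global if no scope."""
    parts = spec.split(",")
    return (parts[0], parts[1]) if len(parts) == 2 else ("global", parts[0])


def threshold_mask(signal, method="otsu", scope="global", window=1000):
    """Boolean mask of positions above the threshold, global or per window."""
    func = {"otsu": otsu_threshold, "mad": mad_threshold}[method]
    signal = np.asarray(signal, dtype=float)
    if scope == "global":
        return signal > func(signal)
    mask = np.zeros(len(signal), dtype=bool)
    for start in range(0, len(signal), window):  # hypothetical window size
        chunk = signal[start:start + window]
        mask[start:start + window] = chunk > func(chunk)
    return mask
```

The local variant adapts to regions with different background levels, at the cost of estimating the threshold from fewer values.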
The file format for profile output. Can be bdg (for gzipped bedgraph) or bw (for bigwig). Default is bw
The name of the wavelet to use. It must be one of the wavelets defined in the pywt (PyWavelets) package; pywt.wavelist() lists them. Default is db5.
Whether to perform correlation analysis. This plots a chart of the correlation between the two signals.
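The page does not say which correlation measure dspchip plots. As an assumption, the sketch below computes the Pearson correlation between two equal-length binned signals at a range of relative shifts, which is the kind of series one would chart; lag 0 is the plain Pearson correlation. `lagged_correlation` is a hypothetical helper.

```python
import numpy as np


def lagged_correlation(a, b, max_lag=10):
    """Pearson correlation of a[i + lag] with b[i], for lag in [-max_lag, max_lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("signals must have the same length")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]   # pair a[i + lag] with b[i]
        else:
            x, y = a[:lag], b[-lag:]           # same pairing for negative lags
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result
```

If signal B is a copy of A shifted three positions to the right, the correlation peaks at lag -3.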
Records the range (min, max) of the original data and rescales the processed data to a similar interval. This is useful to avoid meaningless values such as 0.00001231.
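A minimal sketch of this rescaling as a linear min-max mapping of the processed values onto the original range. That the mapping is linear is an assumption; `rescale_to_original` is a hypothetical name.

```python
import numpy as np


def rescale_to_original(processed, original):
    """Linearly map processed values onto the [min, max] range of the original data."""
    processed = np.asarray(processed, dtype=float)
    lo, hi = float(np.min(original)), float(np.max(original))
    p_lo, p_hi = processed.min(), processed.max()
    if p_hi == p_lo:
        # A flat processed signal has no range to stretch; pin it to the original minimum.
        return np.full_like(processed, lo)
    return lo + (processed - p_lo) * (hi - lo) / (p_hi - p_lo)


print(rescale_to_original([0.00001, 0.00002, 0.00003], [0, 50]))
```

Here tiny processed values such as 0.00001 to 0.00003 become 0, 25 and 50, which are comparable with the original read counts.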
The expected size of the feature under investigation. Default 1000
The step size for windowing functions. Default 100
The step size for output profile. Default 100
The ratio between the minimum value and the maximum value in a peak, used to decide whether it is a single peak or two separate peaks. This essentially checks the depth of the "valley" separating the peaks. Default 0.4
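One way to read this rule, sketched under assumptions: find the two highest local maxima in a candidate region, take the minimum between them as the valley, and split when valley / maximum falls below the ratio (a deep valley means two peaks). The choice of maxima and the direction of the comparison are inferred from the description, not taken from dspchip; `splits_into_two` is hypothetical.

```python
import numpy as np


def splits_into_two(values, ratio=0.4):
    """Return True if the valley between the two highest maxima is deep enough."""
    v = np.asarray(values, dtype=float)
    if v.max() <= 0:
        return False
    # Interior local maxima: higher than the left neighbour, not lower than the right.
    idx = [i for i in range(1, len(v) - 1) if v[i] > v[i - 1] and v[i] >= v[i + 1]]
    if len(idx) < 2:
        return False
    left, right = sorted(sorted(idx, key=lambda i: v[i])[-2:])
    valley = v[left:right + 1].min()
    return valley / v.max() < ratio


print(splits_into_two([0, 10, 2, 9, 0]))  # valley 2 / max 10 = 0.2: True
print(splits_into_two([0, 10, 6, 9, 0]))  # valley 6 / max 10 = 0.6: False
```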
A comma separated list of chromosomes to be included in the analysis
A comma separated list of chromosomes to be excluded from the analysis
Whether to consider all the chromosomes in the file. This includes the _random chromosomes, which are usually excluded.
When a bed, bar or wig file is specified, a chromosome size table is needed to allocate space for each chromosome; bigwig and bam files already contain a chromosome size index. The file has to be in the format
chrN N
where chrN is a chromosome name and N is its size. If you have .2bit files from the UCSC Genome Browser, you can get the sizes simply by issuing
$ twoBitInfo genomefile.2bit chrlen.csv
where genomefile.2bit is specific to each genome build (e.g. hg18.2bit, mm9.2bit).
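A sketch for loading such a size table in Python. twoBitInfo writes one name/size pair per line separated by a tab, so the parser splits on any whitespace to accept both tabs and spaces. Dropping names ending in `_random` by default mirrors the usual exclusion mentioned above; the helper names and the `keep_random` flag are hypothetical.

```python
def parse_chrom_sizes(lines, keep_random=False):
    """Parse 'chrN N' lines into {chrom: size}; blank lines are skipped."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, size = fields[0], int(fields[1])
        # By default, skip chromosomes such as chr1_random (assumed naming).
        if keep_random or not chrom.endswith("_random"):
            sizes[chrom] = size
    return sizes


def read_chrom_sizes(path, keep_random=False):
    """Read a chromosome size table such as the chrlen.csv produced above."""
    with open(path) as handle:
        return parse_chrom_sizes(handle, keep_random)
```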
Whether to call peaks. This is not set by default.
The peak feature to model for p-value estimation. Can be area, height or length. Peak finding uses that feature to build the distribution of values. Default is area.
The p-value cutoff for filtering peaks. Default 1.0.
Tries to infer the distribution that best fits the data. The distributions evaluated are Normal, LogNormal, Gamma and PowerLaw; the best fit will most likely be a Gamma distribution. Default hist.
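To show how a peak feature (area, height or length) can be turned into p-values and filtered with a cutoff, here is a sketch that uses the empirical distribution of the feature itself: each peak's p-value is the fraction of peaks whose feature is at least as large. Reading the hist default as an empirical-histogram model is an assumption; fitting Normal, LogNormal, Gamma or PowerLaw would replace the empirical tail with the fitted one. Function names are hypothetical.

```python
import numpy as np


def empirical_pvalues(features):
    """Right-tail empirical p-value: fraction of peaks with feature >= each value."""
    f = np.asarray(features, dtype=float)
    ordered = np.sort(f)
    n_at_least = len(f) - np.searchsorted(ordered, f, side="left")
    return n_at_least / len(f)


def filter_peaks(peaks, features, cutoff=1.0):
    """Keep peaks whose empirical p-value is at or below the cutoff."""
    pvals = empirical_pvalues(features)
    return [peak for peak, p in zip(peaks, pvals) if p <= cutoff]


print(empirical_pvalues([1, 2, 3, 4]))  # [1.   0.75 0.5  0.25]
```

With a cutoff of 1.0 every peak passes, because no empirical p-value can exceed 1.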
This parameter sets the maximum distance (window) used to define separate peaks, computed as K ** expWindow. Default is 0.5.
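A sketch of how a maximum distance of K ** expWindow might be applied: peaks whose gap is within that distance are merged, and farther ones stay separate. This page does not define K, so it is passed in as an unexplained parameter here; the merge rule and the name `merge_close_peaks` are assumptions.

```python
def merge_close_peaks(peaks, k, exp_window=0.5):
    """Merge (start, end) peaks whose gap is at most k ** exp_window."""
    max_dist = k ** exp_window  # e.g. k = 1000 gives about 31.6
    merged = []
    for start, end in sorted(peaks):
        if merged and start - merged[-1][1] <= max_dist:
            # Close enough to the previous peak: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_close_peaks([(0, 100), (120, 200), (300, 400)], k=1000))
```

With k = 1000 the 20-position gap is merged and the 100-position gap is kept, giving [(0, 200), (300, 400)].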
When the input data is BAM, filter out alignments with a quality lower than the specified threshold. Default is 0.
The downsampling ratio for signal A. This is useful when the data are not balanced. Downsampling is performed using a uniform distribution. Works only with interval files (bam, bed). Default is 1.0.
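Uniform downsampling can be sketched as keeping each read independently with probability equal to the ratio, using a uniform random draw per read. Whether dspchip samples per read in exactly this way is an assumption; `downsample` and its `seed` argument are hypothetical.

```python
import random


def downsample(reads, ratio, seed=None):
    """Keep each read with probability `ratio`, drawn from a uniform distribution."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible subsets
    # random() is in [0, 1), so ratio 1.0 keeps every read and 0.0 keeps none.
    return [read for read in reads if rng.random() < ratio]
```

For example, if signal A has about twice as many reads as signal B, a ratio of 0.5 for A brings the two to a similar depth.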
The downsampling ratio for signal B.
A naive deduplication filter for sorted interval files (bam, bed).
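A sketch of a naive deduplication pass over a sorted interval stream: at each (chromosome, start) position, only the first read on each strand is kept. The page does not state which fields dspchip compares, so keying on start and strand is an assumption, and the tuple layout and `dedup_sorted` name are hypothetical.

```python
def dedup_sorted(reads):
    """Yield reads from a (chrom, start)-sorted stream, dropping duplicates.

    A read is a duplicate if an earlier read had the same chrom, start and strand.
    """
    current_pos = None
    seen_strands = set()
    for chrom, start, end, strand in reads:
        if (chrom, start) != current_pos:
            # New position: forget strands seen at the previous one.
            current_pos = (chrom, start)
            seen_strands = set()
        if strand in seen_strands:
            continue
        seen_strands.add(strand)
        yield chrom, start, end, strand
```

Because the input is sorted, only the reads at the current position need to be remembered, so memory use stays constant.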
Do not perform signal analysis.
Do not write the profile output. Useful when looking for peaks only.
Save numpy arrays for each step. For debugging purposes.
Save data statistics. For debugging purposes. This is likely to be removed in the future.
Originally posted by: Vladimir...@gmail.com
My code to get the chromosome sizes:
-Tao