ChIP-BIT2
===========
Unlike traditional peak callers, which predict only strong peaks, ChIP-BIT2 detects both strong and weak binding sites at gene promoters, at enhancers, or across the whole genome, using a pair of sample and input ChIP-seq profiles.
ChIP-BIT2 is an extended version of the ChIP-BIT method, which was designed mainly for detecting binding events in promoter regions, as described in the following paper:
Xi Chen, Jin-Gyoung Jung, Ayesha N. Shajahan-Haq, Robert Clarke, Ie-Ming Shih, Yue Wang, Luca Magnani, Tian-Li Wang, Jianhua Xuan. "ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles", Nucleic Acids Res (2016) 44 (7): e65.
The ChIP-BIT2 package is implemented in C/C++ and extends ChIP-BIT to identify ChIP-seq peaks of any size at any genomic location, for either transcription factors or histone proteins. Data preprocessing in ChIP-BIT2 is based on PeakSeq V1.31, which is described in the following reference:
Rozowsky J, Euskirchen G, Auerbach R, Zhang Z, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein M. "PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls", Nature Biotechnology 27, 66 - 75 (2009).
We thank the developers of PeakSeq for their excellent work on ChIP-seq data preprocessing.
Installation
=============
Type 'make' to build the ChIP-BIT2 executable. The executable is built in the current directory.
Data preprocessing
=============
Sample and input ChIP-seq data can be downloaded from the ENCODE data portal or the GEO database. ChIP-BIT2 extracts mapped read locations from each ChIP-seq profile for fast read access during peak calling. Run this preprocessing step with the following commands:
>> mkdir Sample_reads
>> ChIP-BIT2 -dumpreads SAM Sample.sam Sample_reads
>> mkdir Input_reads
>> ChIP-BIT2 -dumpreads SAM Input.sam Input_reads
Note: if only BAM-format ChIP-seq data are available, convert the BAM files to SAM with SAMtools and pipe the SAM-format reads into ChIP-BIT2 for read location extraction. SAMtools can be downloaded from http://www.htslib.org/ and a recent release is provided together with ChIP-BIT2.
>> samtools view Sample.bam | ChIP-BIT2 -dumpreads SAM stdin Sample_reads
>> samtools view Input.bam | ChIP-BIT2 -dumpreads SAM stdin Input_reads
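The SAM and BAM cases above can be folded into one helper. The following is a minimal sketch and not part of the ChIP-BIT2 package: the function name `dumpreads_cmd` is hypothetical, and it chooses between the two invocations shown above purely from the file extension. It only prints the command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ChIP-BIT2): print the -dumpreads
# command for a SAM or BAM file, using only the invocations shown above.
dumpreads_cmd() {
    aln="$1"      # alignment file, e.g. Sample.sam or Sample.bam
    outdir="$2"   # directory that will hold the extracted read locations
    case "$aln" in
        *.sam) echo "ChIP-BIT2 -dumpreads SAM $aln $outdir" ;;
        *.bam) echo "samtools view $aln | ChIP-BIT2 -dumpreads SAM stdin $outdir" ;;
        *)     echo "unsupported alignment format: $aln" >&2; return 1 ;;
    esac
}

# Dry run: show the commands for one BAM file and one SAM file.
dumpreads_cmd Sample.bam Sample_reads
dumpreads_cmd Input.sam Input_reads
```

To execute a printed command, create the output directory first (for example with `mkdir -p Sample_reads`) and then paste the command into the shell. The sketch assumes file names without spaces.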
Peak Calling
=============
ChIP-BIT2 has three running modes: '-promoter', '-enhancer' and '-WG'. Users must specify the running mode when calling peaks. Because some DNA-binding proteins bind preferentially to proximal promoters or to distal enhancers, we recommend checking the known biology of each protein before choosing a mode. For general applications, ChIP-BIT2 can detect peaks across the whole genome directly. ChIP-BIT2 uses a sliding window and can detect peaks of any size.
./ChIP-BIT2 -callpeaks -promoter | -enhancer | -WG (the mode must be specified right after the -callpeaks option)
-n Experiment name
-t Sample ChIP-seq reads after preprocessing
-c Input ChIP-seq reads after preprocessing
-m Mappability file
-a Promoter or enhancer annotation file (for gene annotation, ChIP-BIT2 also accepts a transcript annotation file with duplicated gene symbols)
-s (optional) Search range around the TSS or the enhancer center, default +-10k around the TSS and +-1k around the enhancer center
-w (optional) Partition window size, default 200 bps
-p (optional) Probability threshold, default 0.9
-EM (optional) Number of EM iterations, default 100
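The options above can be assembled programmatically when several experiments are processed in a row. The following is a minimal sketch and not part of ChIP-BIT2: the function name `callpeaks_cmd` and its argument order are hypothetical. It assumes that '-a' is needed in promoter and enhancer modes but not in WG mode, and it omits the optional flags so that the documented defaults apply. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ChIP-BIT2): assemble a -callpeaks command
# from the options documented above and print it for inspection.
# Usage: callpeaks_cmd MODE NAME SAMPLE_READS INPUT_READS MAPPABILITY [ANNOTATION]
callpeaks_cmd() {
    mode="$1"; name="$2"; sample="$3"; input="$4"; map="$5"; annot="$6"
    case "$mode" in
        promoter|enhancer)
            # Assumption: these two modes need an annotation file (-a).
            if [ -z "$annot" ]; then
                echo "mode '$mode' requires an annotation file" >&2
                return 1
            fi
            # The mode flag is placed right after -callpeaks.
            echo "./ChIP-BIT2 -callpeaks -$mode -n $name -t $sample -c $input -m $map -a $annot"
            ;;
        WG)
            echo "./ChIP-BIT2 -callpeaks -WG -n $name -t $sample -c $input -m $map"
            ;;
        *)
            echo "unknown mode: $mode (expected promoter, enhancer or WG)" >&2
            return 1
            ;;
    esac
}

# Dry run for a promoter-mode and a whole-genome call.
callpeaks_cmd promoter Peaks Sample_reads Input_reads hg19_mappability_20k.txt hg19_RefSeq.txt
callpeaks_cmd WG Peaks Sample_reads Input_reads hg19_mappability_20k.txt
```

Printing the command first makes it easy to confirm that the mode flag directly follows `-callpeaks`. Optional flags such as `-w`, `-p` and `-EM` can be appended by hand to the printed line.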
'Promoter' mode command example
=============
./ChIP-BIT2 -callpeaks -promoter -n Peaks -t Sample_reads -c Input_reads -m hg19_mappability_20k.txt -a hg19_RefSeq.txt -s 10000 -w 200 -p 0.9 -EM 100
'Enhancer' mode command example
=============
./ChIP-BIT2 -callpeaks -enhancer -n Peaks -t Sample_reads -c Input_reads -m hg19_mappability_20k.txt -a MCF7_enhancer_like_regions.txt -s 1000 -w 200 -p 0.9 -EM 100
'WG' mode command example
=============
./ChIP-BIT2 -callpeaks -WG -n Peaks -t Sample_reads -c Input_reads -m hg19_mappability_20k.txt -w 200 -p 0.9 -EM 100
Output
=============
ChIP-BIT2 writes all peaks to "ChIP-BIT_peaks.txt". In '-promoter' mode, the most likely target gene of each peak is also reported.
Running ChIP-BIT2 on the preprocessed demo sample and input ChIP-seq files under a Linux environment
=============
>> ./ChIP-BIT2 -callpeaks -WG -n Peaks -t Demo_sample_reads -c Demo_input_reads -m hg19_mappability_20k.txt -w 200 -p 0.9 -EM 100
Experimental ID: Peaks
Sample ChIP-seq read path: Sample_reads
Input ChIP-seq read path: Input_reads
Mappability map file: hg19_mappability_20k.txt
Detect peaks from whole genome
Window size: 200.0 bps
Posterior probability threshold: 0.90
EM iterations: 100 rounds
processing Chr1...
ChIP-BIT process starts!
......
Number of candidate ChIP-seq regions: 24451
......
Detect peaks from whole genome!
......
Calculate read intensity of each window!
......
Number of candidate windows: 26580
......
......
EM iteraction to estimate parameters and posterior probability of each window!
Prior probability of 'foreground' TFBS: 0.538691
Prior probability of 'background' events: 0.461309
TFBS_mean: 2.83549
TFBS_variance: 0.217784
Background_mean: 2.12993
Background_variance: 0.425577
Probability_threshold: 0.9
......
Write ChIP-BIT results to file!
......
Merge 2311 significant ChIP-BIT windows to peaks!
......
1760 significant peaks!
......
Done!
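To give a feel for how the estimated parameters in the log relate to the probability threshold, the sketch below computes the posterior probability that a window is 'foreground'. It is an illustration under a stated assumption, not the ChIP-BIT2 implementation: it assumes a plain two-component Gaussian mixture over window read intensity, with the prior, means and variances copied from the log above. The function name `posterior_fg` and the intensity value 3.0 are made up for the example.

```shell
#!/bin/sh
# Illustrative only: posterior probability that a window is 'foreground'
# under an ASSUMED two-component Gaussian mixture over window read intensity.
# Arguments: intensity x, foreground prior, foreground mean, foreground
# variance, background mean, background variance.
posterior_fg() {
    awk -v x="$1" -v pf="$2" -v mf="$3" -v vf="$4" -v mb="$5" -v vb="$6" '
    function gauss(v, mean, var,    pi, d) {
        pi = atan2(0, -1)
        d = v - mean
        return exp(-(d * d) / (2 * var)) / sqrt(2 * pi * var)
    }
    BEGIN {
        f = pf * gauss(x, mf, vf)          # weighted foreground density
        b = (1 - pf) * gauss(x, mb, vb)    # weighted background density
        printf "%.4f\n", f / (f + b)       # Bayes rule
    }'
}

# Parameters copied from the demo log; x = 3.0 is a hypothetical intensity.
posterior_fg 3.0 0.538691 2.83549 0.217784 2.12993 0.425577
```

Under this assumed model, a window would count as significant when the printed value reaches the threshold set with `-p` (0.9 in the demo run). The model ChIP-BIT2 actually uses may include further terms, so treat the number as a rough guide only.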
Any feedback, comments and questions are always welcome. Please address them to Xi Chen (xichen86@vt.edu).