ChIP-Seq Code

Brought to you by: gio31415

Tree [b96f3e] master /

File	Date	Author	Commit
data	2018-12-03	Giovanna Ambrosini	[6ac7a9] deleted: .README.swp
doc	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
share	2019-02-15	Giovanna Ambrosini	[305e20] modified: Makefile
tools	2019-02-15	Giovanna Ambrosini	[305e20] modified: Makefile
COPYING	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
ChIP-Seq-logo.png	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
INSTALL	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
Makefile	2019-02-21	Giovanna Ambrosini	[b96f3e] ChIP-seq version 1.5.5
README	2019-02-15	Giovanna Ambrosini	[305e20] modified: Makefile
README_man	2019-02-15	Giovanna Ambrosini	[305e20] modified: Makefile
chipcenter.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipcenter.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipcor.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipcor.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipextract.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipextract.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chippart.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chippart.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chippeak.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chippeak.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipscore.1.gz	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipscore.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
chipseq.spec	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
debug.h	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
hashtable.c	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
hashtable.h	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
header_bkgd.png	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
index.html	2018-10-08	Giovanna Ambrosini	[c905f6] modified: .README.swp
sflogo.png	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
sib_body_bkgd.png	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
style.css	2018-10-08	Giovanna Ambrosini	[5262d5] Initial commit
version.h	2019-02-15	Giovanna Ambrosini	[305e20] modified: Makefile

Read Me

ChIP-Seq
============================================================================
The ChIP-seq software provides methods for the analysis of ChIP-seq data and
other types of mass genome annotation data.


Description
============================================================================

Massively parallel sequencing technologies have pushed DNA sequencing into a new era. Chromatin immunoprecipitation (ChIP) enriches genomic DNA fragments based on their interaction with specific proteins. Combined with high-throughput sequencing of these fragments (ChIP-seq), the technique generates millions of short sequence reads (generally 30 to 50 bp in length) that are subsequently mapped back to the reference genome.
The ChIP-seq protocol thereby yields a comprehensive map of the genomic loci that share a common binding site or a particular epigenetic modification.
Exploiting such high-throughput experiments consequently calls for new computational tools for handling ChIP-Seq data as well as other types of next-generation sequencing (NGS) data.

We provide a set of tools that perform common ChIP-Seq data analysis tasks, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions.

These tools exist as stand-alone programs and perform the following tasks: 

	1. Positional correlation and generation of an aggregation plot for two genomic features (chipcor); 
	2. Extraction of specific genome annotation features around reference genomic anchor points (chipextract); 
	3. Read shifting (chipcenter);
	4. Narrow peak caller that uses a fixed width peak size (chippeak); 
	5. Broad peak caller for broad regions of enrichment (e.g. histone marks) (chippart);
	6. Feature selection tool based on a read count threshold (chipscore).

The C programs are primarily optimized for speed. 
For this reason, they use their own compact format for ChIP-Seq data representation called SGA (Simplified Genome Annotation). 
SGA is a line-oriented, tab-delimited format, very similar to BED, with the following five mandatory fields:

	1. Sequence name/ID (Char String), 
	2. Feature (Char String),
	3. Sequence Position (Integer), 
	4. Strand (+/- or 0),
	5. Read Counts (Integer).

Additional fields may be added containing application-specific information used by other programs.
In the case of ChIP-seq data, SGA files represent genome-wide read count distributions from one or several experiments.
The 'feature' field (field 2) contains a short code which identifies an experiment.
It often corresponds to the name of the molecular target of a ChIP-seq experiment.
Sequences are identified by NCBI/RefSeq chromosome IDs, which are assembly specific in order to prevent mixing of different assemblies.
The position field (field 3) represents the start position of the sequence read. 
The strand field indicates the strand to which the feature has been mapped.
Read counts represent the number of sequence reads that have been mapped to a specific position in the genome.
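
As an illustration of the field layout described above, here is a minimal Python sketch that parses one SGA line into a record. It is not part of the ChIP-Seq distribution: the names SgaRecord and parse_sga_line are hypothetical, and the sketch assumes tab-separated input with the five mandatory fields in the order listed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SgaRecord:
    """One SGA line (hypothetical helper type, not part of the ChIP-Seq tools)."""
    seq_id: str                  # field 1: sequence name/ID (e.g. NCBI/RefSeq chromosome ID)
    feature: str                 # field 2: short code identifying the experiment
    position: int                # field 3: sequence position
    strand: str                  # field 4: '+', '-' or '0' (orientation-less)
    count: int                   # field 5: read count
    extra: Tuple[str, ...] = ()  # optional application-specific fields


def parse_sga_line(line: str) -> SgaRecord:
    """Parse a single tab-delimited SGA line into an SgaRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 tab-separated fields, got {len(fields)}")
    seq_id, feature, position, strand, count = fields[:5]
    if strand not in ("+", "-", "0"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return SgaRecord(seq_id, feature, int(position), strand, int(count),
                     tuple(fields[5:]))
```

Any fields beyond the fifth are kept verbatim in 'extra', since their meaning is application-specific.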

Input features may be ChIP-seq read positions, peaks identified by ChIP-peak, or any type of genome annotation that can be mapped to a single base on a chromosome.

An example of an SGA-formatted file is shown below:

NC_000001.9     H3K4me3 4794    +       1
NC_000001.9     H3K4me3 6090    +       1
NC_000001.9     H3K4me3 6099    +       1
NC_000001.9     H3K4me3 6655    +       1
NC_000001.9     H3K4me3 18453   -       1
NC_000001.9     H3K4me3 19285   +       1
NC_000001.9     H3K4me3 44529   +       1
NC_000001.9     H3K4me3 46333   +       1
NC_000001.9     H3K4me3 46349   -       1
NC_000001.9     H3K4me3 52929   +       1
NC_000001.9     H3K4me3 59412   +       1

...
	
ChIP-Seq programs require SGA input files to be sorted by sequence name, position, and strand.
In the UNIX environment, the command to properly sort SGA files is the following:

    sort -s -k1,1 -k3,3n -k4,4 <SGA file>
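
For cases where the sort order needs to be checked or reproduced outside the shell, the following Python sketch applies the same three keys (sequence name, numeric position, strand). The function names are hypothetical. It assumes tab-delimited lines and compares strings by code point, which should match 'sort' under LC_ALL=C but may differ under other locales.

```python
def sga_sort_key(line):
    """Sort key mirroring -k1,1 -k3,3n -k4,4: name, numeric position, strand."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], int(fields[2]), fields[3])


def sort_sga_lines(lines):
    """Return the lines in SGA order; sorted() is stable, like 'sort -s'."""
    return sorted(lines, key=sga_sort_key)


def is_sga_sorted(lines):
    """Return True if the lines are already in SGA order."""
    keys = [sga_sort_key(line) for line in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

Checking a file with is_sga_sorted before running the tools is cheaper than re-sorting it, which can matter for genome-wide read sets.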
 
SGA is a generic format that can be used to represent a large variety of genome annotations, e.g. the location of transcription start sites (TSS), matches to consensus sequences, or sequence conservation scores.
Orientation-less features will be associated with a strand value of 0.

The ChIP-Seq correlation tool (chipcor) can be used, for example, as follows:

    chipcor -A "H3K4me3 +" -B "H3K4me3 -" -b -1000 -e 1000 -w 1 -c 20 -n 1 H3K4me3.sga > H3K4me3_fc_n1.out

Here, 'H3K4me3.sga' is the file containing the ChIP-Seq sequence read distribution for the H3K4me3 histone modification.
The '-c' option specifies the cut-off in input counts.

Reads corresponding to histone modifications along the positive strand (option '-A "H3K4me3 +"') are correlated with reads corresponding to the same histone modification pattern on the opposite strand (option '-B "H3K4me3 -"'), and their relative distances are distributed in a histogram within the range [-1000, +1000] (options '-b -1000' and '-e 1000').

The output file (H3K4me3_fc_n1.out) contains all histogram entries in simple text format.
Histogram entries show count density values (option '-n 1') of the target feature at relative distances to the reference features; that is, all bin entries are normalized by the total number of reference read counts and the histogram window width.

Such types of histograms are also called aggregation plots (APs).
An aggregation plot shows the distribution of a particular genomic feature (e.g. a ChIP-seq signal) relative to a specified anchor point (e.g. a transcription start site) within a set of genomic regions.
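
The count density normalization described above can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the chipcor implementation: the function name aggregation_histogram is hypothetical, both inputs are assumed to lie on the same sequence, each reference/target pair is assumed to contribute the product of its read counts, strand orientation is ignored, and bins are labelled by their lower edge.

```python
from collections import defaultdict


def aggregation_histogram(reference, target, begin, end, width):
    """Count density of target positions relative to reference positions.

    reference, target: lists of (position, count) pairs on one sequence.
    Returns a dict mapping each bin start to its normalized density.
    """
    raw = defaultdict(int)
    for ref_pos, ref_count in reference:
        for tgt_pos, tgt_count in target:
            distance = tgt_pos - ref_pos
            if begin <= distance <= end:
                # Lower edge of the bin that contains this distance.
                bin_start = begin + ((distance - begin) // width) * width
                raw[bin_start] += ref_count * tgt_count

    total_ref = sum(count for _, count in reference)
    if total_ref == 0:
        raise ValueError("no reference counts to normalize by")

    # Normalize by total reference counts and by the window width.
    return {b: raw[b] / (total_ref * width)
            for b in range(begin, end + 1, width)}
```

The double loop is quadratic in the number of positions; a practical implementation would exploit the sorted input to scan only targets near each reference position.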

----------------------------------------------------------------------------

ChIP-Seq has a web interface which is freely available at:

   http://ccg.vital-it.ch/chipseq/


Program Installation
============================================================================

A makefile is provided for compiling the code and for installing the programs and data.

- To create the executable files, type:

make

- To install the man pages you should have root permissions and type:

make man

- To install the executable files (default $binDir is ./bin), type:

make install

- To install the executable files system-wide (e.g. in /usr/local/bin), type:

sudo make prefix=/usr/local install

- To delete the executable files and all the object files from the compilation directory, type:

make clean

- To delete the executable files and all the object files from the $binDir directory, type:

make uninstall

# Man Pages

- To install man pages system-wide, type:

sudo make prefix=/usr/local install-man

This command installs the chip-seq man pages in /usr/local/share/man/chip-seq/man1.

# Data files needed for format-conversion tasks

- To install data files needed for some format conversion programs, type:

sudo make prefix=/usr/local install-dat

This command installs the chr_NC_gi, chro_idx.nstorage, and chr_size files in /usr/local/share/chip-seq/.

# ChIP-Seq User's Manual

- To install the User's manual system-wide, please type:

sudo make prefix=/usr/local install-doc

This command will install the ChIP-Seq_Tools-UsersGuide.pdf file in /usr/local/share/chip-seq/doc/.


The DATA Sub-directory
============================================================================

This directory contains a few data sets that can be used to run some tests.
Examples on how to use the ChIP-Seq tools with these data are described in the user's guide (doc/ChipSeq_Tools-UsersGuide.pdf).

As of release 1.5.5, the data sets referred to in the user's guide are available from our FTP site at:

ftp://ccg.epfl.ch/chip-seq/data
