Name                                     Modified    Size
tableOfFeaturesWithExplanatoryText.xlsx  2014-01-21  89.8 kB
focisDatabase_2013_05_10.txt.gz          2014-01-20  2.8 GB
featureList_2013_05_10.txt.gz            2014-01-20  13.3 kB
generateFeatureList.pl                   2014-01-20  2.5 kB
README                                   2014-01-20  5.7 kB
FOCIS_v1.2.pl                            2013-06-25  16.8 kB
Totals: 6 items, 2.8 GB

FOCIS: Feature Overlapper for Chromosome Interval Subsets
Written by Dan E. Webster, Stanford University, Khavari Lab

INSTALLATION AND PRE-REQUISITES:

FOCIS consists of a single Perl script, so it can be run anywhere that Perl is installed (Perl comes pre-installed on all Mac machines; ActivePerl is a useful tool for other platforms). However, FOCIS requires that another software tool, BedTools, be installed and executable from the directory in which you run FOCIS.

BedTools is an excellent genomics toolset developed and actively maintained by Aaron Quinlan and his lab. Information on BedTools installation, how overlaps are calculated within FOCIS, etc. can be found at http://bedtools.readthedocs.org/en/latest/

DESCRIPTION:

FOCIS proposes candidate functional regulators that are present at a given subset of genomic intervals and that may contribute to a unique function or behavior at those intervals relative to a user-defined background within the same dataset.

FOCIS takes two sets of genomic intervals: a defined subset (SUBSET) and the rest of the intervals from the same dataset that are not in this subset (BACKGROUND). The script overlaps both sets with the intervals of each feature in the genome (e.g. ENCODE ChIP-seq data, reverse-searched motifs, etc.) and compares the fraction of subset intervals overlapping that feature with the fraction of background intervals overlapping it.
  
The enrichment score for each feature is calculated as follows:
log2( (subset intervals overlapping feature / subset total) / (background intervals overlapping feature / background total) )
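
As a worked illustration of the formula above, here is a minimal sketch using awk. The function name enrichment_score and the counts are hypothetical; FOCIS computes its score internally, and how it treats zero counts is not documented here, so the "NA" branch is an assumption.

```shell
# Hypothetical helper, not part of FOCIS: log2 enrichment from four counts.
# Args: subset_overlaps subset_total background_overlaps background_total
enrichment_score() {
  awk -v so="$1" -v st="$2" -v bo="$3" -v bt="$4" 'BEGIN {
    # Assumption: report NA when any count is zero (log2 would be undefined).
    if (so == 0 || st == 0 || bo == 0 || bt == 0) { print "NA"; exit }
    printf "%.4f\n", log((so / st) / (bo / bt)) / log(2)
  }'
}

# 50 of 1000 subset intervals (5%) versus 100 of 4000 background intervals (2.5%)
enrichment_score 50 1000 100 4000   # prints 1.0000 (a twofold enrichment)
```

A score of 1 means the feature overlaps the subset twice as often as the background; a score of -1 means half as often.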

As an example, say you are comparing ChIP-seq peaks that are lost during a given stimulus. You would input the interval coordinates of all the lost peaks as the subset, and the background would be all of the peaks for that ChIP experiment that are NOT lost (rather than all of the peaks, which would include your subset).
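
One way to build such a background is to remove the subset intervals from the full peak list. The sketch below assumes both files are tab-delimited and that a subset interval appears in the full list with identical chromosome, start and stop; the function name make_background and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper, not part of FOCIS.
# Usage: make_background SUBSET_FILE ALL_PEAKS_FILE > BACKGROUND_FILE
make_background() {
  awk -F'\t' '
    # First file (the subset): remember each chrom/start/stop key.
    FILENAME == ARGV[1] { in_subset[$1 FS $2 FS $3] = 1; next }
    # Second file (all peaks): print only intervals not seen in the subset.
    !(($1 FS $2 FS $3) in in_subset)
  ' "$1" "$2"
}
```

For example, `make_background lost_peaks.bed all_peaks.bed > background.bed` would write every peak that is not in lost_peaks.bed.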

What this boils down to biologically is that a large positive number means you have many more overlaps in your subset than you would expect by chance (this candidate factor is highly correlated with your lost peaks), and a large negative number means many fewer overlaps than you would expect by chance for this factor (this candidate factor is anticorrelated with your change, or in other words, is responsible for maintaining the factor, not losing it).

NOTE: This script requires that BedTools be executable from the directory in which this script is called. The same is true for the directories containing the feature database. To put it another way, don't move things around too much, or the script won't know where to find them!

FOCIS can take up to 45 minutes to run depending on the input data size, so it may be advisable to run with nohup or screen if you are working over a network.
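
A minimal sketch of running a long job under nohup so that it survives a dropped network session. The wrapper name run_detached and the log file name are hypothetical; the FOCIS command in the comment simply repeats the usage line from this README.

```shell
# Hypothetical wrapper, not part of FOCIS: run a command under nohup in the
# background, send stdout and stderr to LOGFILE, and print the background PID.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
  logfile=$1
  shift
  nohup "$@" > "$logfile" 2>&1 &
  echo $!
}

# Example:
# run_detached focis.log perl FOCIS_v1.2.pl -s SUBSET_FILE.bed4 -b BACKGROUND_FILE.bed4 \
#   -o OUTPUT_FILE.txt -f focisDatabase_2013_05_10.bed4 -l featureList_2013_05_10.txt
```

Progress or error messages can then be followed with `tail -f focis.log` from any later session.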

For help, run FOCIS without any parameters.

TROUBLESHOOTING:

**Please note: input files must have properly formatted newlines (\n); do not just copy and paste from Excel.
**Please note: BedTools must be executable in the directory in which FOCIS is running.
**Please note: it is VITAL that the feature file is sorted properly beforehand (sort -k 1,1 -k2,2n). This should already be true if you are using the default database file.
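
The first and third notes can be addressed together. The sketch below converts carriage returns (as left by Windows or older Mac exports) into \n, drops the resulting blank lines, and applies the sort command given above. The function name clean_and_sort_bed is hypothetical.

```shell
# Hypothetical helper, not part of FOCIS.
# Usage: clean_and_sort_bed INPUT_FILE > CLEANED_FILE
clean_and_sort_bed() {
  tr '\r' '\n' < "$1" |      # turn every carriage return into a newline
    awk 'NF' |               # drop the empty lines this leaves behind
    sort -k 1,1 -k2,2n       # chromosome, then numeric start coordinate
}
```

Write the result to a new file rather than over the input, since the shell truncates a redirected output file before the input is read.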

USAGE AND PARAMETERS:

perl FOCIS_v1.2.pl -s SUBSET_FILE.bed4 -b BACKGROUND_FILE.bed4 -o OUTPUT_FILE.txt -f focisDatabase_2013_05_10.bed4 -l featureList_2013_05_10.txt

-b BACKGROUND_FILE  This needs to be a bed-like file (needs to be recognized by BedTools) with chromosome,start,stop (tab delimited).
Can have extra fields on the end, but these will be ignored for the overlap. MUST NOT HAVE HEADER. Stop must be greater than or equal to start.
Does not need to be sorted; the script will do that for you.
	Example: 
	chr6	132088403	132088887
	chr11	18333006	18333481
	chr3	149308016	149309221

-s SUBSET_FILE  This needs to be a bed-like file (needs to be recognized by BedTools) with chromosome,start,stop (tab delimited).
Can have extra fields on the end, but these will be ignored for the overlap. MUST NOT HAVE HEADER. Stop must be greater than or equal to start.
Does not need to be sorted; the script will do that for you.
	Example: 
	chr6	132088403	132088887
	chr11	18333006	18333481
	chr3	149308016	149309221 
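
The requirements for the subset and background files can be checked before a long run. This is a minimal sketch, assuming only the rules stated above (at least three tab-delimited fields, integer start and stop, stop not less than start, no header); the function name check_bed is hypothetical and BedTools may apply further checks of its own.

```shell
# Hypothetical pre-flight check, not part of FOCIS.
# Prints each offending line and returns non-zero if any line fails.
check_bed() {
  awk -F'\t' '
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $3 + 0 < $2 + 0 {
      printf "line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

A header row fails the integer test, and so does a line that still ends in a carriage return, so both common formatting problems are reported.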

-f FEATURES_FILE This is a bed4-like file in chrom,start,stop,name format with optional fields afterwards (must be recognized by BedTools).
	The name field identifies the feature and is used to store information about each overlap.
	Example:
	chr1	38809	38829	YY1_encodeChIPseq
	chr1	1102976	1102996	YY1_encodeChIPseq
	chr1	1209201	1209221	YY1_encodeChIPseq
	
-l FEATURELIST_FILE This is a unique list of all of the different features present in the features file. Please pre-compute this if you have not already done so.
	Example:
	YY1_encodeChIPseq
	NHEK_H3K27ac_encodeChIPseq
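
If you need to pre-compute a feature list for your own features file, a minimal sketch is to take the unique values of the name column. This assumes the name is the fourth tab-delimited field, as in the examples above; the function name make_feature_list is hypothetical.

```shell
# Hypothetical helper, not part of FOCIS.
# Usage: make_feature_list FEATURES_FILE > FEATURELIST_FILE
make_feature_list() {
  cut -f 4 "$1" | sort -u   # name column, one line per distinct feature
}
```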
	
-o OUTPUT_FILENAME

-flank FLANK_SIZE (optional; default = 0) Extends each subset and background interval by this many bases both upstream and downstream of the called peak or interval, broadening the region that can overlap a feature. This is particularly useful for SNP regions with a length of 0 or 1.
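
To illustrate the effect of a flank on an interval, the sketch below widens each interval by a fixed number of bases. It is not the FOCIS implementation: the function name add_flank is hypothetical, and clamping the start at 0 is an assumption about how the chromosome edge should be handled.

```shell
# Hypothetical illustration, not part of FOCIS.
# Usage: add_flank BED_FILE FLANK_SIZE
add_flank() {
  awk -F'\t' -v OFS='\t' -v flank="$2" '{
    $2 = ($2 - flank < 0) ? 0 : $2 - flank   # assumption: clamp at position 0
    $3 = $3 + flank
    print                                    # extra fields are kept unchanged
  }' "$1"
}
```

With a flank of 10, a SNP at chr1 100-101 becomes chr1 90-111, which gives nearby features a chance to overlap it.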
	

-c OVERLAP_CUTOFF (optional; default = 1) For positive overlaps, the number of overlapping intervals must account for at least N% of the total number of subset intervals.
	For example, with the default cutoff of 1 and 1000 subset intervals, a candidate overlapping feature must have at least 10 overlaps to be considered significant.
	This measure prevents extreme enrichment scores at the low end and maintains a distribution of enrichment scores that approximates a normal distribution.
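
The arithmetic behind the cutoff can be sketched as follows. The function name min_overlaps is hypothetical, and rounding fractional thresholds up is an assumption; this README does not say how FOCIS rounds.

```shell
# Hypothetical helper, not part of FOCIS.
# Usage: min_overlaps SUBSET_TOTAL CUTOFF_PERCENT
min_overlaps() {
  awk -v total="$1" -v pct="$2" 'BEGIN {
    need = total * pct / 100
    n = int(need)
    if (n < need) n++        # assumption: round a fractional threshold up
    print n
  }'
}

min_overlaps 1000 1   # prints 10, matching the example above
```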
Source: README, updated 2014-01-20