File Date Author Commit
 binding_event_pvalue 2014-08-01 Antonio Gomes Antonio Gomes [b6f694] Changes to be committed:
 cooperative_interaction 2014-08-01 Antonio Gomes Antonio Gomes [b6f694] Changes to be committed:
 general_function 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 .DS_Store 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_pipeline_rescue.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_pipeline_rescue_motif_input.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_pipeline_rescue_nth_round.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_pipeline_rescue_trf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_post_motif_and_post_training_pipeline.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 BRACIL_post_motif_and_post_training_pipeline_pvalue_Shuffling.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 README.txt 2014-07-10 Antonio Gomes Antonio Gomes [8c02ad] Changes to be committed:
 all_region_obj_fun_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 all_regions_obj_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 bsbdf_region_obj_pointfit.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 cauchy.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 chi2testv2.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 choose_genome_gt.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 choosefunction.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 chr_struct2csdeconv_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 closestinterval.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 closestinterval2.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 count_npeaks.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 create_regionsfasta_eukarya_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 createfastasegments_generalized_covc_gt.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 createfastasegments_regions_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 db_bsbdf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 db_bsbdf_posterior.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 db_bsbdf_posterior_rescue_pointfit.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 db_bsbdf_posterior_rescue_pointfit_faster.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 double_binding_2_single_binding.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 dsdeconv_bs_center_constraint.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 dsdeconv_estimate_all_regions.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 dsdeconv_estimate_all_regions_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 dsdeconv_estimate_regions.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 dsdeconv_plot_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 expected_coverage_region.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_module.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_module_load.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_module_motif_input.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_parse_eukarya_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_set_create.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fimo_type.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 final_impulses.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 final_impulses_complement.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 final_impulses_complement_minf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fit_function_to_all_regions_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fit_region_cauchy.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fit_region_gaussian.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 fit_region_to_function_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 gaussian.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 general_deconvolution.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 get_fit_value_from_parameters.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 getpeaks_posterior_compute_sigma.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 getpeaks_posterior_eukarya.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 gumbel.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 impulse_classification.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 impulse_classification_rescue.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 initial_x_guess.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 intervalcluster2.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 keys_function.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 license.txt 2014-07-02 Antonio Gomes Antonio Gomes [e37aa8] license.txt
 list_main_m_files 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 load_covc_set.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 load_data_covc_set.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 load_regions_eukarya_set.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 load_regions_eukarya_set_list.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 load_tagalign_count_set.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 map_impulse_locus.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 meme_module.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 meme_type.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 motif_input_fasta_gt.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 motif_pipeline_covc_gt.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 motif_pipeline_covc_gt_motif_input.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 move_meme_figs.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 non_curated_package_files_list.txt 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 penalty_rescue.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_dsdeconv.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_feature.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_feature_motifbox.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_fit_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_function_fit_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_function_save_picture.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 plot_save_picture.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_a.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_b.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_b_height_rescaled.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_b_rescue.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_b_rescueV2.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_height_rescaled.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_rescue.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_rescueV2_trf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_rescue_bracil_input.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 post_motif_pipeline_covc_rescue_trf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 pre_deconvh_simple.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 pre_motif_pipeline_covc_a.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 pre_motif_pipeline_covc_b.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 pre_motif_pipeline_covc_b_height_rescaled.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 prepare_to_motif_pipeline.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 prepare_to_motif_pipeline_height_rescaled.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 read_covonly_file.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 read_fimoset.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 read_prefinalpeaks_file.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 readgff2.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 region_boundary.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 region_height_associated_to_impulse.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 region_obj.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 region_obj_fun_covpos.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 rescue_analysis_stats.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 sb_bsbdf.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 sb_bsbdf_posterior.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 site_switch.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit
 top_regions_adaptor_covc.m 2014-07-01 Antonio Gomes Antonio Gomes [dc0c73] Initial commit

Read Me

This README explains how to use BRACIL, developed by Gomes et al. 
It covers the usage of `BRACIL_pipeline_rescue` and its variations: `BRACIL_pipeline_rescue_trf.m`, `BRACIL_pipeline_rescue_nth_round.m`, `BRACIL_pipeline_rescue_motif_input.m`, and `BRACIL_post_motif_and_post_training_pipeline`.


You need git to download the code. 
Type: 
git clone git://git.code.sf.net/p/bracil/code 

Sections:
0. Data set
I. Requirements
II. Background overview
III. Using the code
IV. Variations in the code
V. Usage Instance
VI. Computing binding event p-value 
VII. Cooperative interaction test

%%0. Data set

The folder (…)/testing/data contains the test dataset for BRACIL.
27_08_covset.txt : `covset` file. Two-column file. The first column gives the coverage file and the second column the corresponding chromosome. Make sure to update the paths in this file. 
27_8_sample.cov : Text version of the 27_8 coverage.
27_8_sample.cov.mat : Binary version of the 27_8 coverage. It is faster to read and takes less disk space. 
DosR_motif.meme : Instance of a DosR motif predicted by MEME.
h37rv_4thMarkovOrder.bfile : 4th-order Markov background file of the h37rv genome. It is required by MEME and FIMO.
h37rv_annot.gt : Annotation genome table. Two-column file. The first column is the path to the annotation file and the second column is the corresponding chromosome. This file is not essential.
h37rv.fa : Fasta file with the genome sequence of h37rv.
h37rv.gff : Annotation file for h37rv.
h37rv.gt : Genome table for h37rv. Two-column file. The first column gives the path to the fasta file and the second column the corresponding chromosome.
reference_sites_chauhan.txt : Reference set of DosR binding sites described by Chauhan et al.
regions_chauhan.txt : Regions that contain the reference sites `reference_sites_chauhan`.
regions_set.txt : Set of enriched regions for the DosR dataset

%%I. Requirements

Our package requires MEME and FIMO to be installed on your system (http://meme.nbcr.net/meme/ ).


%%II. Background overview

BRACIL refines enriched ChIP-seq regions into high-resolution binding site predictions.
It does so by integrating ChIP-seq coverage with the genome sequence and using
a blind-deconvolution algorithm that identifies the binding sites. The enriched regions to be refined can be obtained with any peak caller, according to user preference.

The advantage of this integrated model is that it identifies binding sites at high resolution more robustly, with better sensitivity and specificity than methods based only on peak callers or on motif sequence.
It predicts binding sites that are consistent with both sequence conservation and the ChIP-seq coverage.
Most peak callers cannot predict multiple binding sites inside one binding region. As a consequence, they have limited sensitivity and lower resolution.
BRACIL uses blind deconvolution to address this.
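
As an illustration only (written in Python, not taken from the MATLAB package), the model behind deconvolution can be sketched as follows: the coverage of a region is treated as a sum of impulses, one per binding site, each spread out by a common impulse-response shape. The Gaussian shape, the function names and all numeric values below are assumptions chosen for the sketch.

```python
import math

def impulse_response(offset, width):
    """Hypothetical bell-shaped response (Gaussian), equal to 1 at offset 0."""
    return math.exp(-0.5 * (offset / width) ** 2)

def expected_coverage(positions, sites, magnitudes, width):
    """Sum the contribution of every binding site at every genome position."""
    return [
        sum(m * impulse_response(x - s, width) for s, m in zip(sites, magnitudes))
        for x in positions
    ]

# Two nearby sites produce one broad peak that a plain peak caller may
# report as a single binding region.
positions = list(range(0, 401))
coverage = expected_coverage(positions, sites=[150, 200],
                             magnitudes=[10.0, 6.0], width=30.0)
print(round(max(coverage), 2))
```

Deconvolution goes in the opposite direction: given the observed coverage, it estimates the site positions and magnitudes (and, in the blind case, the response shape as well).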

The details about the method are in the manuscript Gomes et al., Decoding ChIP-Seq peaks with a double-binding signal refines binding peaks to single-nucleotide and predicts cooperative interaction. (in press at the time this README file was written).


%%III. Using the code

The main function to run the blind-deconvolution algorithm is `BRACIL_pipeline_rescue`.
It will output the files `BRACIL_cov_only.out` and `BRACIL_finalpeaks_mline%d.out` for binding site predictions. 
`BRACIL_cov_only.out` predicts binding site locations based only on the ChIP-seq coverage.
`BRACIL_finalpeaks_mline%d.out` predicts binding site locations refined by motif discovery. 

The function and its inputs are shown below:

 BRACIL_pipeline_rescue(coverage_file_set, coverage_file_type, regions_set_file, output_path, output_tag, genome_table_file, mset_1, mset_2, mset_3, ftype, bg_file, annotation_table_file, weak_site_log10_threshold, strong_site_log10_threshold, alpha_rescue, d_lim)
 

The inputs for BRACIL_pipeline_rescue are explained in the comments of 
`BRACIL_pipeline`, but are repeated here for convenience.

1. Coverage_file_set:
    The input for the coverage data. It accepts three formats: `tagAlign_count_set`, `cov_list`, and `tagAlign_count_set_list`. 
All formats can be used to analyze organisms with multiple chromosomes.
The `tagAlign_count_set` format is a 4-column file with the fields:
                genome_position, strand, number_of_tags_count, chromosome.
Strand is `+` or `-` to indicate the positive or negative strand.
The `cov_list` format is a two-column file with the following information:
                cov_file, chromosome_label.
The `cov_file` is the coverage file for the chromosome indicated by `chromosome_label`. It contains 4 columns:
                genome_position, cov_total_count, count_negative_strand_tags, cov_positive_strand_tag
The cov_file can be a text file or a `.mat` (MATLAB binary) file.
The `tagAlign_count_set_list` format is a two-column file listing one `tagAlign_count_set` file per chromosome, as follows:
                tagAlign_count_set_file, chromosome_label
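
To make the `cov_list` layout concrete, here is a small Python sketch (not part of the MATLAB package) that reads a `cov_list` and a text coverage file. It assumes fields are separated by whitespace or commas and that there is no header line; the file contents and paths are made up for illustration.

```python
import re

def split_fields(line):
    """Split a line on commas and/or whitespace (the delimiter is an assumption)."""
    return [f for f in re.split(r"[,\s]+", line.strip()) if f]

def parse_cov_list(text):
    """Return (cov_file, chromosome_label) pairs from a two-column cov_list."""
    pairs = []
    for line in text.splitlines():
        fields = split_fields(line)
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs

def parse_cov_rows(text):
    """Return one dict per row of a 4-column text coverage file."""
    rows = []
    for line in text.splitlines():
        fields = split_fields(line)
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        position, total, negative, positive = (float(f) for f in fields[:4])
        rows.append({"position": int(position), "total": total,
                     "negative": negative, "positive": positive})
    return rows

# Hypothetical file contents, for illustration only.
cov_list_text = "/path/to/chr1.cov chr1\n/path/to/chr2.cov chr2\n"
cov_text = "1 5 2 3\n2 7 3 4\n"
print(parse_cov_list(cov_list_text))
print(parse_cov_rows(cov_text)[0])
```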

2. Coverage_file_type: `tagAlign_count`, `cov` or `tagAlign_count_set_list`, according to the input used in coverage_file_set.

3. regions_set_file:
    A 10-column file (only the first 4 columns are meaningful).
%format:
    contiguous, start, stop, region_id nan nan nan nan nan nan
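
If your peak caller does not produce this layout, a line can be padded to 10 columns. The Python sketch below is only an illustration: the tab separator, the helper name and the example values are assumptions, so check them against `regions_set.txt` in the test data.

```python
def region_line(contig, start, stop, region_id, sep="\t"):
    """Build one 10-column region line; the last six fields are placeholders."""
    fields = [str(contig), str(start), str(stop), str(region_id)]
    return sep.join(fields + ["nan"] * 6)

# Hypothetical regions, for illustration only.
lines = [region_line("chr1", 1000, 1400, 1), region_line("chr1", 5200, 5650, 2)]
print("\n".join(lines))
```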

4. output_path
    - Path where output files will be saved
5. output_tag
    - An additional name that will be added to output_path. Warning: do not use the underscore symbol "_" in this tag. It may cause conflicts.

6. genome_table_file
    - A two-column file indicating where the genome files are located:
    Col1=genome_file Col2=contiguous.
    Genome_file is a fasta file; a fasta file saved as `.mat` (MATLAB binary) can be used instead for speed.

7. mset 
    A three-column vector that controls how motif finding is executed. 
We use MEME as the motif finding algorithm.
    mset = [mline top_perc extra_edges], where:
    mline: a number indicating the MEME query to be used (check `meme_type` for more options).
        10 : -dna -revcomp -minw  8 -maxw 30  -mod oops -bfile
        12 : -dna -minw  8 -maxw 30  -mod oops -bfile 
    top_perc : fraction of top enriched regions used to create the pre-motif fasta file.
    extra_edges : each subsequence in the `pre-motif` fasta file corresponds to
the genome sequence that spans plus/minus extra_edges bps around a predicted binding site.

NOTICE: the number in `mline` appears in the name of the output file
`BRACIL_finalpeaks_mline%d.out`.

8. ftype
    Defines the impulse function. By default, use "gumbel".
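
For readers unfamiliar with the name, the Python sketch below evaluates the standard Gumbel density, which is asymmetric around its peak. It is shown only to illustrate the shape; the parametrization used by the package's `gumbel.m` may differ, and the parameter names `mu` and `beta` are assumptions.

```python
import math

def gumbel_pdf(x, mu, beta):
    """Standard Gumbel density with location mu and scale beta (beta > 0)."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

# The density peaks at x = mu and decays more slowly on the right side.
for x in (-2.0, 0.0, 2.0):
    print(x, round(gumbel_pdf(x, mu=0.0, beta=1.0), 4))
```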

9. bg_file
    A background file for the MEME query. It contains the frequencies of DNA letters 
in the queried genome. A 4th-order Markov file has usually been used.

10. annotation_table_file
    - A two-column file giving the genome annotation for each chromosome. 
Similar to genome_table_file.
    Col1=annotation_file Col2=contiguous.
    annotation_file is a `gff` file, where columns 4, 5, 7, 9 are
important to characterize the feature, corresponding, respectively, to: 
feature_start_position feature_end_position feature_strand feature_gene_id_name.
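
The Python sketch below shows which fields of a `gff` line these columns refer to. It relies only on the general GFF layout (nine tab-separated columns, comment lines starting with `#`); the example line and the dictionary keys are made up, and how the package turns column 9 into a gene name is not shown here.

```python
def parse_gff_feature(line):
    """Return columns 4, 5, 7 and 9 of a GFF line, or None for comments."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    return {
        "start": int(fields[3]),   # column 4
        "end": int(fields[4]),     # column 5
        "strand": fields[6],       # column 7
        "attributes": fields[8],   # column 9
    }

# Hypothetical annotation line, for illustration only.
example = "chr1\tsource\tgene\t100\t250\t.\t+\t.\tID=geneA"
print(parse_gff_feature(example))
```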

11. weak_site_log10_threshold : minimum threshold to consider a motif as a potential binding site. The value is given in `-log10` units (e.g. weak_site_log10_threshold = 3 considers only motifs with p_value <= 10^-3 ). I have usually been using 2.5.

12. strong_site_log10_threshold : Defines the boundary between weak and strong sites. It is important for the penalty classification. I have usually been using 4.
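
The two thresholds can be read as a three-way classification of motif hits. The Python sketch below illustrates that reading with the values mentioned above (2.5 and 4); the function name, the labels and the handling of values exactly on a boundary are assumptions, not the package's implementation.

```python
import math

def classify_site(p_value, weak_threshold=2.5, strong_threshold=4.0):
    """Classify a motif hit from its p-value using -log10 thresholds."""
    score = -math.log10(p_value)
    if score < weak_threshold:
        return "discarded"   # not considered a potential binding site
    if score < strong_threshold:
        return "weak"        # kept, but subject to the weak-site penalty
    return "strong"

for p in (1e-2, 1e-3, 1e-5):
    print(p, classify_site(p))
```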

13. alpha_rescue : penalty for considering a weak site. It takes a value in [0 1], with 0 indicating no penalty and 1 indicating a penalty proportional to the sum of squares of the coverage.
The penalty is proportional to the number of weak sites used for deconvolution. I have usually been using a low value, e.g. 0.02 or 0.01.

14. d_lim : Maximum distance to consider a `double binding` signal. Setting d_lim=0 considers only `single binding` signals. I have used d_lim=50 for cases with a double binding signal.
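
One way to picture `d_lim` is as a distance cutoff for pairing neighboring candidate sites. The Python sketch below is an assumption about that idea only (adjacent sites at most `d_lim` bp apart are paired); it does not reproduce how the package models the double-binding signal.

```python
def double_binding_pairs(sites, d_lim):
    """Pair adjacent candidate sites that lie at most d_lim bp apart."""
    ordered = sorted(set(sites))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= d_lim]

# Hypothetical site positions, for illustration only.
sites = [100, 130, 400]
print(double_binding_pairs(sites, d_lim=50))  # sites 100 and 130 are paired
print(double_binding_pairs(sites, d_lim=0))   # no pairs: single binding only
```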

%%IV. Variations in the code.

The code is available in the following variations: BRACIL_pipeline_rescue_trf, BRACIL_pipeline_rescue_motif_input, BRACIL_pipeline_rescue_nth_round and BRACIL_post_motif_and_post_training_pipeline.

BRACIL_pipeline_rescue_trf:
	This version requires an extra input, training_regions_set_file, a set of regions on which the impulse response parameters are trained. By default, our code uses only the top 16 enriched regions.

BRACIL_pipeline_rescue_motif_input:
	This version requires a motif file, `memefile`, as input. It uses this motif file to predict the location of the potential binding sites that will be used for deconvolution.

BRACIL_pipeline_rescue_nth_round:
	This version re-iterates the motif discovery and deconvolution pipeline. It requires as input the round number and a `pre_final_peaks` file. The pre_final_peaks file represents the refined file from a previous round.

BRACIL_post_motif_and_post_training_pipeline:
	This version skips the step that trains the impulse response parameters. It requires as input a `bracil_covonly` file, which represents the BRACIL prediction based only on ChIP-seq coverage, a `bracil_final_peaks` file, which represents the BRACIL prediction refined by motif discovery, and a motif input.


%%V. Usage Instance

The file BRACIL_demo_pipeline shows an example of the BRACIL pipeline applied to the M. tuberculosis transcription factor DosR. 
This demo file predicts binding sites in three ways:

1. Considering only one round of motif discovery to refine the model (i.e. BRACIL_pipeline_rescue ).

2. Considering a second round of motif discovery to refine the model. (i.e. BRACIL_pipeline_rescue_nth_round).

3. According to some user input motif (i.e. BRACIL_pipeline_rescue_motif_input).

It writes to the `performance` directory a set of summary files showing how well our algorithm predicts the reference sites from the Chauhan et al. paper.

The summary files compare the performance of BRACIL with motif finding. The statistics are computed using impulse magnitudes and motif p-values. BRACIL_normalized uses the impulse magnitude normalized by the total magnitude inside the enriched region. BRACIL uses the absolute impulse magnitude. Motif3p0 uses the motif cutoff threshold -log10(pvalue) > 3 and Motif2p5 uses the cutoff threshold -log10(pvalue) > 2.5.
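
The difference between the BRACIL and BRACIL_normalized scores can be illustrated with a short Python sketch. The data layout (a list of region id and magnitude pairs) and the function name are assumptions made for the example.

```python
def normalize_by_region(impulses):
    """Divide each impulse magnitude by the total magnitude of its region.

    impulses: list of (region_id, magnitude) pairs.
    """
    totals = {}
    for region_id, magnitude in impulses:
        totals[region_id] = totals.get(region_id, 0.0) + magnitude
    return [(r, (m / totals[r]) if totals[r] else 0.0) for r, m in impulses]

# Hypothetical impulses: two sites in region r1, one site in region r2.
impulses = [("r1", 3.0), ("r1", 1.0), ("r2", 5.0)]
print(normalize_by_region(impulses))
```

With normalization, a site is scored by its share of the signal within its own region, so sites in weakly and strongly enriched regions become comparable.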

Results vary between runs because the step that predicts the shape of the impulse response is not deterministic: a Monte Carlo step is used to search for the global maximum in the likelihood maximization step.


%%VI. Computing p-value.

BRACIL can output a p-value for binding event prediction.
This is obtained by performing the deconvolution step on randomly shuffled (with repetition) coverage.
The p-value of a binding event is the probability that an impulse magnitude equal to or greater than the observed one would occur by chance (see manuscript for details).

The p-value is computed in three steps:
(i)   Generate regions with random coverage (`coverage_shuffler` function).
(ii)  Run BRACIL on the randomized enriched regions.
(iii) Compute the p-value from the magnitude of the impulse response.
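
Steps (i) and (iii) can be sketched in Python as below. This is a stand-in for illustration, not the package's `coverage_shuffler` or its p-value code: the helper names are hypothetical, and the null impulse magnitudes that BRACIL would produce in step (ii) are replaced by a made-up list.

```python
import random

def shuffle_with_repetition(coverage, rng):
    """Step (i): resample coverage values with replacement."""
    return rng.choices(coverage, k=len(coverage))

def empirical_p_value(observed, null_magnitudes):
    """Step (iii): fraction of null magnitudes >= the observed magnitude."""
    hits = sum(1 for m in null_magnitudes if m >= observed)
    return hits / len(null_magnitudes)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
coverage = [0, 1, 3, 8, 12, 8, 3, 1, 0]
shuffled = shuffle_with_repetition(coverage, rng)

# Step (ii) would run BRACIL on many shuffled regions; here the resulting
# null impulse magnitudes are simply assumed.
null_magnitudes = [1.0, 2.0, 5.0, 7.0]
print(empirical_p_value(5.0, null_magnitudes))
```

The resolution of such a p-value is limited by the number of null magnitudes, so more shuffled regions are needed to resolve smaller p-values.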

The script pvalue_INSTANCE.m illustrates these steps. Make sure to adapt the paths for the input and output files. 

%%VII. Cooperative interaction test

BRACIL can test whether two nearby binding sites interact cooperatively
(see manuscript for details).

The cooperative interaction test is performed by the function `cooperative_interaction_test_bracil_file_input`. 
An example is given in `BRCIL_cooperative_test_INSTANCE.m`.