Latest version: CrossHub.1.3.5.plus.db.tar.gz (771.0 MB)

Sample.results folder (15 files, 12.4 MB total):

Name                                           Modified    Size
BCL3.scatterplot.with.labels.png               2015-07-13  635.1 kB
BCL11A.scatterplot.with.labels.png             2015-07-13  624.6 kB
CBX3.scatterplot.with.labels.png               2015-07-13  645.9 kB
COAD_RNA-Seq.png                               2015-07-13  398.8 kB
LUSC_RNA-Seq.png                               2015-07-13  330.4 kB
Methyl_PromoterCpG_profile.png                 2015-07-13  174.7 kB
Methyl_PromoterCpG_scatterplot.clean.png       2015-07-13  3.5 MB
COAD_ChrRem_corr_RNA-Seq_RNA-Seq_corr.xlsx     2015-05-04  288.1 kB
COAD_Cofactors_corr_RNA-Seq_RNA-Seq_corr.xlsx  2015-05-04  645.4 kB
COAD_EncodeTF_corr_RNA-Seq_RNA-Seq_corr.xlsx   2015-05-04  691.9 kB
COAD_miRNA-Seq.xlsx                            2015-05-04  67.7 kB
COAD_RNA-Seq.xlsx                              2015-05-04  46.0 kB
COAD_RNA-Seq_miRNA-Seq_corr.xlsx               2015-05-04  1.0 MB
COAD_TF_corr_RNA-Seq.xlsx                      2015-05-04  857.3 kB
COAD_TF_corr_RNA-Seq_RNA-Seq_corr.xlsx         2015-05-04  2.5 MB

CrossHub

CrossHub is a Python tool for multi-way analysis of The Cancer Genome Atlas (TCGA) data:

• Gene differential expression (TCGA RNA-Seq) with correlations to clinical characteristics (tumor stage, TNM, follow-up)

• Differential expression of alternatively spliced transcripts (TCGA RNA-Seq) with correlations to clinical characteristics (tumor stage, TNM, follow-up)

• miRNA differential expression (TCGA miRNA-Seq)

• Methylation profiling (TCGA Illumina Infinium HumanMethylation450 BeadChips)

• Prediction of regulatory microRNAs based on gene-miRNA co-expression analysis and miRNA target prediction algorithms (TargetScan, DIANA microT, etc.)

• Prediction of regulatory transcription factors (TFs) based on gene-TF co-expression analysis and ENCODE ChIP-Seq data

• Gene expression – methylation correlation analysis for CpGs located in promoter and enhancer regions

CrossHub writes the results to formatted Excel workbooks.

Quick start: prerequisites

We recommend installing the Anaconda Python 3 distribution, which includes all the required packages: http://continuum.io/downloads#py35

Alternatively, install the packages separately (the commands below are for Debian/Ubuntu):

sudo apt-get install python3 python3-dev python3-scipy python3-xlsxwriter python3-pyparsing python3-matplotlib python3-pandas python3-psutil python3-setuptools python3-pip
sudo pip3 install -U statsmodels
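After installation, one quick way to confirm that python3 can import every required package is to try each import in turn. This is a minimal sketch; check_modules is a hypothetical helper, not part of CrossHub, and the module names are assumed to be the import names of the packages above (python3-xlsxwriter is imported as xlsxwriter, and so on).

```bash
# Hypothetical helper (not part of CrossHub): try to import each module with
# python3, print the ones that fail, and return non-zero if any is missing.
check_modules() {
  local missing=0 mod
  for mod in "$@"; do
    if ! python3 -c "import $mod" 2>/dev/null; then
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the packages installed above.
# check_modules scipy xlsxwriter pyparsing matplotlib pandas psutil statsmodels
```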

Download and unpack one of the following CrossHub releases:

CrossHub.1.3.4.plus.db.tar.gz (~800 MB)

CrossHub.1.3.4.plus.db.TCGA.example.tar.gz (~5 GB)

The second archive also includes an example TCGA_data_path for colon adenocarcinoma (COAD) with RNA-Seq, miRNA-Seq and methylation profiling data.
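Because these archives are large, it can help to check a download for corruption before extracting it. A minimal sketch, assuming a hypothetical helper name (unpack_release) and an arbitrary target directory; it uses only standard gzip and tar options.

```bash
# Hypothetical helper (not part of CrossHub): verify a downloaded .tar.gz,
# then extract it into a destination directory (default: current directory).
unpack_release() {
  local archive="$1" dest="${2:-.}"
  gzip -t "$archive" || return      # non-zero on a truncated or corrupted file
  mkdir -p "$dest" || return
  tar -xzf "$archive" -C "$dest"
}

# Example (archive name from the list above; the target directory is arbitrary):
# unpack_release CrossHub.1.3.4.plus.db.tar.gz "$HOME/crosshub"
```

The integrity check costs one extra read of the archive, which is noticeable for the ~5 GB example bundle but avoids a half-extracted tree after an interrupted download.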

Download TCGA data

The easiest way to obtain TCGA data is the DownloadData.py script, which downloads regularly updated TCGA data from our server:

python3 DownloadData.py

Alternatively, download the data manually from the TCGA Data Portal: https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm. Example colon adenocarcinoma (COAD) TCGA data files are provided in CrossHub.*.plus.db.TCGA.example.tar.gz.

Run CrossHub

We provide shell scripts for typical tasks (gene expression, methylation, etc.).

The general syntax is:

bash task.sh TCGA_data_path gene1,gene2,gene3 Output_dir
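For long gene lists it may be easier to keep the gene symbols in a text file, one per line, and join them into the comma-separated argument. A minimal sketch; join_genes and genes.txt are hypothetical names. It strips all whitespace, so it suits plain gene symbols only, not GO queries containing spaces.

```bash
# Hypothetical helper: read gene symbols (one per line) from stdin and print
# them as a single comma-separated list.
join_genes() {
  tr -d ' \t\r' |      # strip spaces, tabs and Windows line endings
    grep -v '^$' |     # drop empty lines
    paste -sd, -       # join all remaining lines with commas
}

# Example:
# bash ./RNA-Seq.diff.exp.sh TCGA_data_path "$(join_genes < genes.txt)" Output_dir
```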

The gene list may also include Gene Ontology (GO) keyword queries enclosed in square brackets:

bash ./RNA-Seq.diff.exp.sh ./TCGA_data_path "ACTB,GAPDH,[apoptosom* or 'mitotic spindle']" ./Output_dir

Here is the complete list of scripts:

• Complete analysis: differential expression, co-expression with ENCODE TFs plus ChIP-Seq, co-expression with miRNAs plus target prediction databases, and methylation profiling plus ENCODE genome segmentation:

bash ./Complete.Analysis.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Gene differential expression profiling:

bash ./RNA-Seq.diff.exp.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Expression correlation between miRNA and genes:

bash ./miRNA-Seq-RNA-Seq.co-expr.sh TCGA_data_path RASSF1,TP53,VHL Output_dir
bash ./miRNA-Seq-RNA-Seq.co-expr.sh TCGA_data_path RASSF1,TP53+mir-182,mir-183 Output_dir
bash ./miRNA-Seq-RNA-Seq.co-expr.sh TCGA_data_path mir-182,mir-183 Output_dir

To save memory, run the analysis without the miRNA target prediction databases:

bash miRNA-Seq-RNA-Seq.co-expr.without.DB.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Methylation profiling of the selected genes (e.g. genes involved in apoptosis), with the chromatin state of each CpG site annotated according to the ENCODE Chromatin State Segmentation database (promoter, enhancer, polycomb repressed, etc.). A methylation-expression correlation analysis is also performed.

bash Methylation.and.RNA-Seq.sh TCGA_data_path gene1,gene2,gene3 Output_dir

To reduce RAM usage, run this analysis without the ENCODE Chromatin State Segmentation database:

bash Methylation.and.RNA-Seq.without.ENCODE.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Expression correlation between genes and transcription factors studied in the ENCODE project. This analysis also highlights gene-TF interactions found by ENCODE ChIP-Seq:

bash RNA-Seq.co-expr.with.ENCODE.TF.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Expression correlation between genes and transcription factors, cofactors and genes of chromatin remodeling complexes. This analysis also highlights gene-TF interactions found by ENCODE ChIP-Seq:

bash RNA-Seq.co-expr.TF.ChrRem.Cofactors.plus.ENCODE.ChIP-Seq.sh TCGA_data_path gene1,gene2,gene3 Output_dir

The same analysis, supplemented with JASPAR predictions:

bash RNA-Seq.co-expr.TF.ChrRem.Cofactors.plus.ENCODE.ChIP-Seq.Jaspar.sh TCGA_data_path gene1,gene2,gene3 Output_dir

Co-expression analysis without JASPAR and ENCODE ChIP-Seq, for minimal memory usage:

bash RNA-Seq.co-expr.TF.ChrRem.Cofactors.without.DB.sh TCGA_data_path gene1,gene2,gene3 Output_dir

• Differential expression of miRNA:

bash miRNA-Seq.diff.exp.sh TCGA_data_path Output_dir
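Several of the analyses above can be chained for the same cohort and gene list. The sketch below is a hypothetical driver (run_batch and DRY_RUN are names introduced here, not part of CrossHub) for scripts that take the "TCGA_data_path gene_list Output_dir" arguments; miRNA-Seq.diff.exp.sh, which takes no gene list, does not fit this pattern. Running the scripts one after another, rather than in parallel, keeps memory use to roughly one analysis at a time.

```bash
# Hypothetical batch driver: run each given task script with the same data
# path and gene list, writing to an output subdirectory named after the script.
# Set DRY_RUN=1 to print the commands instead of running them.
run_batch() {
  local data_path="$1" genes="$2" out_root="$3" script out_dir
  shift 3
  for script in "$@"; do
    out_dir="$out_root/$(basename "$script" .sh)"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "bash $script $data_path \"$genes\" $out_dir"
    else
      bash "$script" "$data_path" "$genes" "$out_dir" || return   # stop on first failure
    fi
  done
}

# Example:
# DRY_RUN=1 run_batch ./COAD "TP53,VHL" results ./RNA-Seq.diff.exp.sh ./Methylation.and.RNA-Seq.sh
# prints:
# bash ./RNA-Seq.diff.exp.sh ./COAD "TP53,VHL" results/RNA-Seq.diff.exp
# bash ./Methylation.and.RNA-Seq.sh ./COAD "TP53,VHL" results/Methylation.and.RNA-Seq
```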

Complex queries can define the set of genes to analyze. The following example selects genes whose GO annotation matches the pattern 'apopto*' (e.g. apoptosis, apoptotic, apoptosome):

bash ./RNA-Seq.diff.exp.sh ./COAD "[apopto*]" Output_dir
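It is safest to quote bracket queries on the command line. Unquoted, a query such as [apopto*] is also a valid shell glob (a bracket expression matching any single character a, p, o, t or *), so bash replaces it with the names of any matching one-character files in the current directory before CrossHub sees it. The short demonstration below (demo_bracket_glob is a hypothetical name) shows this in a throwaway directory.

```bash
# Demonstrates why GO queries in square brackets should be quoted.
# Runs in a temporary directory so no real files are affected.
demo_bracket_glob() {
  local tmp
  tmp=$(mktemp -d) || return
  (
    cd "$tmp" || exit
    touch a                 # a file whose name matches the bracket expression
    echo [apopto*]          # unquoted: the shell expands the glob to "a"
    echo "[apopto*]"        # quoted: passed through literally
  )
  rm -rf "$tmp"
}

demo_bracket_glob
# a
# [apopto*]
```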

Queries may use parentheses and the logical operators 'or' and 'and'; a leading '-' excludes a keyword. This example selects genes related to apoptosis or to the glycolytic process, excluding those encoding extracellular proteins:

bash ./RNA-Seq.diff.exp.sh ./COAD "[(apoptosis or glycolytic) -extracellular]" Output_dir
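The query logic can be pictured with a small sketch. It assumes, hypothetically, that each keyword is compared as a shell-style glob against the individual words of a gene's GO annotation, and it hard-codes the example query above; has_word and example_query are illustrative names, and CrossHub's own query parser may differ in details such as case sensitivity and punctuation handling.

```bash
# Illustration only: succeeds if any whitespace-separated word of $2 matches
# the glob pattern $1 (e.g. 'apopto*' matches "apoptotic").
# The body runs in a subshell so that "set -f" (no filename expansion while
# splitting $2 into words) does not leak into the caller.
has_word() (
  set -f
  for word in $2; do
    case $word in
      $1) exit 0 ;;
    esac
  done
  exit 1
)

# Hard-coded version of "[(apoptosis or glycolytic) -extracellular]":
# $1 is the GO annotation text of one gene (assumed input format).
example_query() {
  { has_word apoptosis "$1" || has_word glycolytic "$1"; } &&
    ! has_word extracellular "$1"
}
```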

To match a whole phrase, enclose it in single quotes:

bash RNA-Seq.diff.exp.sh ./COAD "['negative regulation of glycolytic process']" Output_dir

Citation

If you find CrossHub useful, please cite:

Krasnov G.S., Dmitriev A.A. et al. CrossHub: a tool for multi-way analysis of The Cancer Genome Atlas (TCGA) in the context of gene expression regulation mechanisms. Nucleic Acids Res. 2016 Jan 14. pii: gkv1478. PMID 26773058, doi: 10.1093/nar/gkv1478

Source: README.md, updated 2016-06-03