| Name | Modified | Size |
|---|---|---|
| GUI | 2013-10-18 | |
| Data | 2013-10-18 | |
| readme.txt | 2013-10-24 | 5.7 kB |
| GeneIDURN.Symb9.dat | 2013-10-18 | 1.4 MB |
| params.txt | 2013-10-18 | 505 Bytes |
| bSNEA2.r | 2013-10-18 | 18.7 kB |
| bSNEAutils.r | 2013-10-18 | 19.4 kB |
| misc.r | 2013-10-18 | 35.9 kB |
| pyadraw.r | 2013-10-18 | 17.1 kB |
| Totals: 9 items | | 1.5 MB |
These files are supporting material for the paper
"Clustering Gene Expression Regulators: New Approach to Disease Subtyping"
by M Pyatnitskiy, I Mazo, M Shkrob, E Schwartz and E Kotelnikova.
The purpose of this software is to post-process the results of Subnetwork Enrichment Analysis (SNEA)
implemented in Pathway Studio from Elsevier. As output, the scripts generate a number of
plots and tables that describe in detail the obtained clustering of regulators and samples.
The bSNEA (batch SNEA) program takes as input the results of SNEA performed on several samples,
together with log-ratio values for those samples. Subnetwork Enrichment Analysis itself should
be done separately in the Pathway Studio application. bSNEA performs unsupervised clustering
of samples based on the subnetworks identified as activated in each sample, and outputs a set of
plots and tables describing the resulting clustering of samples and regulators.
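The clustering starts from knowing which regulator subnetworks were found in which sample. A minimal sketch of that bookkeeping, assuming the SNEA results have already been reduced to (sample, regulator) pairs, builds a binary sample x regulator matrix. The helper name presence_matrix, the pair layout and the sample IDs are hypothetical and not taken from the bSNEA R scripts.

```python
def presence_matrix(hits):
    """Build a binary sample x regulator matrix from (sample, regulator) pairs.

    Hypothetical helper: 1 means the regulator's subnetwork was found
    enriched in that sample, 0 means it was not.
    """
    samples = sorted({s for s, _ in hits})
    regulators = sorted({r for _, r in hits})
    found = set(hits)
    matrix = [[1 if (s, r) in found else 0 for r in regulators] for s in samples]
    return samples, regulators, matrix


# Illustrative input, not real SNEA output.
hits = [("GSM1", "TP53"), ("GSM1", "MYC"), ("GSM2", "MYC"), ("GSM3", "TP53")]
samples, regulators, m = presence_matrix(hits)
print(samples)     # ['GSM1', 'GSM2', 'GSM3']
print(regulators)  # ['MYC', 'TP53']
print(m)           # [[1, 1], [1, 0], [0, 1]]
```

Rows of such a matrix describe samples and columns describe regulators, so the same table can feed both the sample clustering and the regulator clustering.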
The software consists of a set of R scripts. The main script, 'bSNEA2.r', can be run
directly from R, provided that the parameters are stored in the file 'params.txt'. We have also developed
a simple GUI application (Windows only) that allows user-friendly definition of the algorithm parameters.
The GUI application fills 'params.txt' with the specified values and uses the local installation of R to
run 'bSNEA2.r'.
INSTALLATION
Download all files from SourceForge and put them in the same directory.
The file 'R.zip' must be uncompressed so that a subdirectory 'R' is created, containing a
local installation of the R system.
We supply two sample data files in the expected data format:
* GSE4183.logratio.tab - expression data from NCBI GEO GSE4183
* GSE4183.resSNEA.tab - results of SNEA
RUNNING ANALYSIS
Run 'bSNEA.exe' and specify the following parameters.
Global options
* Analysis type – should be set to “batchSNEA”.
* Analysis mode – can be set to either “SNEA” (each regulator is considered separately) or
“SNEA.cluster” (similar regulators are clustered together; default option). Either mode
can also be combined with log-transformation.
* SNEA results – specify tab-separated file with results of SNEA (see sample file 'GSE4183.resSNEA.tab').
* Expression data – specify tab-separated file with log-ratio values for all samples (see sample file 'GSE4183.logratio.tab').
* Output subdir name – name of the subdirectory where all output will be written. Can be left blank (default). In this case
the output directory will be called “Result1”, for the next run “Result2”, etc.
* Organism – “Homo Sapiens” (default).
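The automatic naming of the output subdirectory (“Result1”, “Result2”, ...) can be pictured as picking the first unused numbered name. This is only a sketch of that behaviour under the assumption that numbering simply skips existing directories; the helper next_result_dir is hypothetical and the actual logic in 'bSNEA2.r' may differ.

```python
import os
import tempfile


def next_result_dir(base, prefix="Result"):
    """Return the first unused name of the form Result1, Result2, ... in base.

    Hypothetical helper mimicking the naming scheme described above.
    """
    n = 1
    while os.path.exists(os.path.join(base, f"{prefix}{n}")):
        n += 1
    return f"{prefix}{n}"


with tempfile.TemporaryDirectory() as base:
    first = next_result_dir(base)           # no ResultN yet, so 'Result1'
    os.mkdir(os.path.join(base, first))
    second = next_result_dir(base)          # Result1 now exists, so 'Result2'
    print(first, second)                    # Result1 Result2
```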
Samples
* Distance – specify how to calculate distance between samples in order to group them. Pearson correlation is set as default.
* Clustering – specify the method of cluster analysis to group samples. Ward’s method (default), single, average and complete
linkage are available.
* #clusters – specify number of clusters for grouping samples. This has no default option and may be varied.
* Column with class label – specify which column in the file with SNEA results contains the sample class label.
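The sample options above combine a distance (Pearson correlation by default) with hierarchical clustering cut into a chosen number of clusters. The sketch below uses the correlation distance 1 - r and, to stay short and dependency-free, average linkage (one of the listed alternatives) rather than the default Ward’s method. The function names and the toy log-ratio profiles are hypothetical; the R scripts presumably rely on R’s own clustering routines instead.

```python
import math


def pearson_distance(x, y):
    """Correlation distance 1 - r between two equal-length vectors.

    Constant (zero-variance) vectors are not handled and raise ZeroDivisionError.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    dist is a symmetric distance matrix (list of lists).
    Returns one integer label (1..k) per item.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy log-ratio profiles (one vector per sample over the same genes).
profiles = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [1.1, 2.1, 2.9, 4.2],
    "S3": [4.0, 3.0, 2.0, 1.0],
    "S4": [3.9, 3.1, 1.8, 1.2],
}
names = list(profiles)
dist = [[pearson_distance(profiles[a], profiles[b]) for b in names] for a in names]
labels = average_linkage(dist, k=2)
print(dict(zip(names, labels)))  # {'S1': 1, 'S2': 1, 'S3': 2, 'S4': 2}
```

Correlation distance groups samples by the shape of their expression profiles rather than by absolute log-ratio level, which is why S1 and S2 end up together despite small numeric differences.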
Regulators
* Distance – specify how to calculate distance between regulators in order to group them. Currently the only
available option is Jaccard distance.
* Clustering – specify the method of cluster analysis to group regulators. Ward’s method (default), single, average
and complete linkage are available.
* #clusters – specify number of clusters for grouping regulators. This has no default option and may be varied.
* External list (optional) – specify a file with a list of regulators to be used instead of SNEA results. Each
regulator should be on a separate line.
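Jaccard distance compares two sets: 1 minus the size of their intersection divided by the size of their union. The sketch below applies it to the sets of samples in which each regulator’s subnetwork was found; which sets bSNEA actually compares is an assumption here, and the regulator/sample data are illustrative only.

```python
def jaccard_distance(a, b):
    """Jaccard distance 1 - |A & B| / |A | B| between two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical example: samples in which each regulator's subnetwork was found.
found_in = {
    "MYC": {"GSM1", "GSM2", "GSM3"},
    "TP53": {"GSM2", "GSM3", "GSM4"},
    "E2F1": {"GSM5"},
}
print(jaccard_distance(found_in["MYC"], found_in["TP53"]))  # 0.5
print(jaccard_distance(found_in["MYC"], found_in["E2F1"]))  # 1.0
```

Because only presence or absence matters, Jaccard distance suits binary data such as "subnetwork found / not found", and shared absences do not make two regulators look similar.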
Options
* Use gene annotations – print gene descriptions in output tables
* Determine # clusters – perform an exhaustive search for the optimal number of regulator clusters. The output is
a plot of the average silhouette value versus the number of regulator clusters, saved to
'dnd.regulators.avgsil.pdf'. NB: this can significantly slow down the overall pipeline.
* Select regulators with p-value smaller than – filter out subnetworks with p-values greater than the specified threshold.
* Select regulators with rank smaller than – filter out subnetworks with rank greater than the specified threshold. Currently disabled.
* Delete regulators encountered in less than % of samples – remove regulators found in fewer than the specified percentage of samples.
* Open results in Explorer – after the pipeline has finished, the folder containing the results is opened in Explorer.
* Bootstrap runs – if set to a value greater than 1, p-values for the hierarchical clustering of samples are calculated via multiscale
bootstrap resampling and plotted on the dendrogram.
* Print summary table – print the table “megatable.tab”, with a separate line for each downregulated gene from each subnetwork
identified in each patient. This produces a very large table and can slow down the overall pipeline.
* Plot sample names – if checked, sample names (usually of the form GSMXXX) are plotted on figures. Otherwise only class
labels are plotted.
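The “Determine # clusters” option compares clusterings by their average silhouette value. For each item, a is its mean distance to the other members of its cluster, b is its smallest mean distance to any other cluster, and s = (b - a) / max(a, b). The sketch below implements this standard definition from a distance matrix and picks the number of clusters with the highest average; the candidate labelings are hypothetical stand-ins for the clusterings bSNEA would produce, and the R scripts may compute silhouettes differently.

```python
def average_silhouette(dist, labels):
    """Mean silhouette width of a clustering, given a full distance matrix.

    For item i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b). Items in singleton clusters get s(i) = 0.
    Requires at least two distinct labels.
    """
    n = len(labels)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: contributes s(i) = 0
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Toy data: four points on a line, so distance is the absolute difference.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
candidates = {2: [1, 1, 2, 2], 3: [1, 2, 3, 3]}
scores = {k: average_silhouette(dist, lab) for k, lab in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k)  # 2
```

Repeating this for every candidate number of clusters is what makes the search exhaustive, and why the option can noticeably slow the pipeline.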
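The p-value filter and the “Delete regulators encountered in less than % of samples” option can be illustrated together. This sketch assumes SNEA results are (sample, regulator, p-value) records, that the p-value filter is applied first, and that the percentage is relative to the total number of samples; these assumptions and the helper name filter_hits are hypothetical.

```python
def filter_hits(hits, n_samples, max_p, min_pct):
    """Keep (sample, regulator, p) records with p < max_p, then drop
    regulators found in fewer than min_pct percent of the n_samples samples.

    Hypothetical helper; the order of the two filters is an assumption.
    """
    significant = [(s, r, p) for s, r, p in hits if p < max_p]
    samples_per_reg = {}
    for s, r, _ in significant:
        samples_per_reg.setdefault(r, set()).add(s)
    keep = {
        r for r, ss in samples_per_reg.items()
        if 100.0 * len(ss) / n_samples >= min_pct
    }
    return [(s, r, p) for s, r, p in significant if r in keep]


hits = [
    ("GSM1", "MYC", 0.001), ("GSM2", "MYC", 0.01),
    ("GSM1", "TP53", 0.20),   # removed: p-value above threshold
    ("GSM3", "E2F1", 0.02),   # removed: found in only 1 of 4 samples (25%)
]
kept = filter_hits(hits, n_samples=4, max_p=0.05, min_pct=30.0)
print(kept)  # [('GSM1', 'MYC', 0.001), ('GSM2', 'MYC', 0.01)]
```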
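The size warning for “Print summary table” follows from its layout: one row per gene, per subnetwork, per sample, so the row count is the sum of all subnetwork sizes over all samples. The sketch below writes such a table; the nested-dictionary input, the column names and the gene lists are hypothetical and do not reproduce the real 'megatable.tab' format.

```python
import csv
import io


def write_megatable(subnetworks, out):
    """Write one tab-separated row per (sample, regulator, gene) triple.

    subnetworks maps sample -> regulator -> list of genes in that subnetwork.
    Column names are hypothetical. Returns the number of data rows written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "regulator", "gene"])
    rows = 0
    for sample, regs in subnetworks.items():
        for regulator, genes in regs.items():
            for gene in genes:
                writer.writerow([sample, regulator, gene])
                rows += 1
    return rows


# Illustrative gene lists only.
subnetworks = {
    "GSM1": {"MYC": ["CCND1", "CDK4", "NPM1"], "TP53": ["CDKN1A", "MDM2"]},
    "GSM2": {"MYC": ["CCND1", "CDK4", "NPM1"]},
}
buf = io.StringIO()
n_rows = write_megatable(subnetworks, buf)
print(n_rows)  # 8
```

Even this toy input repeats the MYC subnetwork for every sample in which it occurs, which is why the real table grows quickly with many samples and large subnetworks.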
Only two options are absolutely mandatory: tab-separated file with results of SNEA (SNEA results) and tab-separated file
with log-ratio values for all samples (Expression data).
Press the “Run” button and wait until the calculation is finished. This can take several minutes, depending on your CPU speed.
All tables and plots will be stored in the output subdirectory (see “Output subdir name”).