Name                  Modified     Size        Downloads / Week
GUI                   2013-10-18
Data                  2013-10-18
readme.txt            2013-10-24   5.7 kB
GeneIDURN.Symb9.dat   2013-10-18   1.4 MB
params.txt            2013-10-18   505 Bytes
bSNEA2.r              2013-10-18   18.7 kB
bSNEAutils.r          2013-10-18   19.4 kB
misc.r                2013-10-18   35.9 kB
pyadraw.r             2013-10-18   17.1 kB
Totals: 9 Items                    1.5 MB      1

These files are supporting material for the paper
"Clustering Gene Expression Regulators: New Approach to Disease Subtyping"
by M Pyatnitskiy, I Mazo, M Shkrob, E Schwartz and E Kotelnikova.
The software post-processes the results of Subnetwork Enrichment Analysis (SNEA)
as implemented in Pathway Studio from Elsevier. As output, the scripts generate a number of
plots and tables that describe in detail the resulting clustering of
regulators and samples.

The bSNEA (batch SNEA) program takes as input the results of SNEA performed on several samples,
together with the log-ratio values for those samples. Subnetwork Enrichment Analysis must
be run separately in the Pathway Studio application. bSNEA performs unsupervised clustering
of samples based on the subnetworks identified as activated in each sample, and outputs a set of
plots and tables describing the resulting clustering of samples and regulators.

The software consists of a set of R scripts. The main script is called 'bSNEA2.r' and can be called
directly from R, provided that the parameters are stored in the file 'params.txt'. We have also developed
a simple GUI application (Windows only) that allows user-friendly definition of the algorithm parameters.
The GUI application fills 'params.txt' with the specified values and uses a local installation of R to
run 'bSNEA2.r'.


INSTALLATION
Download all files from SourceForge and put them in the same directory.
The file 'R.zip' must be uncompressed so that a subdirectory 'R' is created, which contains
the local installation of the R system.

We supply two sample data files in the expected data format:
 * GSE4183.logratio.tab - expression data from NCBI GEO GSE4183
 * GSE4183.resSNEA.tab  - results of SNEA 
 


RUNNING ANALYSIS 

 Run the file 'bSNEA.exe' and specify the following parameters.

 Global options
   * Analysis type – should be set to “batchSNEA”.

   * Analysis mode – can be set to “SNEA” (each regulator is considered separately) or
     “SNEA.cluster” (similar regulators are clustered together, default option). Both
     modes can also be combined with a log-transformation.

   * SNEA results – specify a tab-separated file with the results of SNEA (see sample file 'GSE4183.resSNEA.tab').

   * Expression data – specify a tab-separated file with log-ratio values for all samples (see sample file 'GSE4183.logratio.tab').

   * Output subdir name – name of the subdirectory to which all output is written. Can be left blank (default). In this case
     the output directory will be called “Result1”, for the next run “Result2”, and so on.

   * Organism – “Homo Sapiens” (default). 
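
   The default naming of the output subdirectory described above can be illustrated with a minimal
   sketch. It is written in Python rather than in the R of the bSNEA scripts and is not taken from
   them. The function name next_result_dir is hypothetical, and the rule "one past the highest
   existing number" is an assumption; the actual scripts may number runs differently.

```python
import os
import re
import tempfile


def next_result_dir(base_dir, prefix="Result"):
    """Return prefix + N, where N is one past the highest existing number.

    Assumption: only directories named exactly Result<digits> count.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    used = []
    for name in os.listdir(base_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(base_dir, name)):
            used.append(int(match.group(1)))
    return prefix + str(max(used, default=0) + 1)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as work_dir:
        first = next_result_dir(work_dir)
        os.mkdir(os.path.join(work_dir, first))
        second = next_result_dir(work_dir)
        print(first, second)
```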

 Samples
   * Distance – specify how to calculate distance between samples in order to group them. Pearson correlation is set as default.

   * Clustering – specify the method of cluster analysis to group samples. Ward’s method (default), single, average and complete
     linkage are available.

   * #clusters – specify number of clusters for grouping samples. This has no default option and may be varied.

   * Column with class label – specify which column in the file with SNEA results contains the sample class label.
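
   To make the default sample distance concrete, here is a minimal Python sketch (not the R code
   of bSNEA) of a Pearson correlation distance between two samples. It assumes the common
   definition d = 1 - r and that each sample is represented by a numeric profile of equal length;
   the README states neither, and the function name pearson_distance is hypothetical.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of profiles x and y.

    Assumption: the distance is defined as 1 - r (0 for perfectly
    correlated profiles, 2 for perfectly anti-correlated ones).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up profiles, for illustration only.
    print(pearson_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(pearson_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

   A matrix of such pairwise distances is the usual input to hierarchical clustering with the
   linkage methods listed above.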

 Regulators
   * Distance – specify how to calculate distance between regulators in order to group them. The only option currently
     available is Jaccard distance.

   * Clustering – specify the method of cluster analysis to group regulators. Ward’s method (default), single, average
     and complete linkage are available.

   * #clusters – specify number of clusters for grouping regulators. This has no default option and may be varied.

   * External list (optional) – specify a file with a list of regulators to be used instead of SNEA results. Each
     regulator should be listed on a separate line.
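
   The Jaccard distance used for regulators can be sketched in a few lines of Python (again not
   the R code of bSNEA). The standard definition is 1 - |A ∩ B| / |A ∪ B| for two sets A and B.
   Which sets bSNEA compares for each regulator is not stated in this README; the gene sets
   below are hypothetical examples, as is the function name jaccard_distance.

```python
def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets are identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical member sets for two regulators.
    regulator_1 = {"GENE_A", "GENE_B", "GENE_C"}
    regulator_2 = {"GENE_B", "GENE_C", "GENE_D"}
    print(jaccard_distance(regulator_1, regulator_2))
```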


 Options
  * Use gene annotations – print gene descriptions in output tables

  * Determine # clusters – perform an exhaustive search for the optimal number of clusters of regulators. The output is
    a plot of the average silhouette value against the number of clusters of regulators, written to
    'dnd.regulators.avgsil.pdf'. NB: this can significantly slow down the overall pipeline.

  * Select regulators with p-value smaller than – filter out subnetworks with p-values greater than the specified threshold.

  * Select regulators with rank smaller than – filter out subnetworks with rank greater than the specified threshold. This option is currently disabled.

  * Delete regulators encountered in less than % of samples – remove regulators found in fewer than the specified percentage of samples.

  * Open results in Explorer – after the whole pipeline has finished, the folder containing the results is opened in Explorer.
  
  * Bootstrap runs – if set to >1, p-values for the hierarchical sample clustering are calculated via multiscale
    bootstrap resampling and plotted on the dendrogram.
  
  * Print summary table – write the table “megatable.tab”, which contains a separate line for each downregulated gene from
    each subnetwork identified in each patient. The resulting table can be very large and can slow down the overall pipeline.
  
  * Plot sample names – if checked, sample names (usually of the form GSMXXX) are plotted on figures. Otherwise only class 
    labels are plotted.
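
  The p-value and sample-frequency filters listed above can be illustrated with a minimal Python
  sketch (not the R code of bSNEA). The input layout, a list of (sample, regulator, p-value)
  tuples, the function name filter_regulators and the default thresholds are all hypothetical;
  the README does not state the defaults or whether the comparisons are strict.

```python
def filter_regulators(hits, n_samples, max_p_value=0.05, min_sample_percent=10.0):
    """Keep regulators that pass both filters.

    hits: iterable of (sample, regulator, p_value) tuples (assumed layout).
    A hit counts only if p_value < max_p_value; a regulator is kept only
    if it has hits in at least min_sample_percent percent of all samples.
    Returns a dict mapping each kept regulator to its set of samples.
    """
    samples_by_regulator = {}
    for sample, regulator, p_value in hits:
        if p_value < max_p_value:
            samples_by_regulator.setdefault(regulator, set()).add(sample)
    min_count = n_samples * min_sample_percent / 100.0
    return {
        regulator: samples
        for regulator, samples in samples_by_regulator.items()
        if len(samples) >= min_count
    }


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    example_hits = [
        ("S1", "REG_A", 0.001),
        ("S2", "REG_A", 0.01),
        ("S3", "REG_A", 0.2),
        ("S1", "REG_B", 0.03),
        ("S2", "REG_C", 0.5),
    ]
    kept = filter_regulators(example_hits, n_samples=4, min_sample_percent=50.0)
    print(sorted(kept))
```

  In this example only REG_A is kept: REG_B passes the p-value filter in one sample out of four
  (below 50%), and REG_C never passes the p-value filter.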

 
  Only two options are absolutely mandatory: the tab-separated file with results of SNEA (SNEA results) and the tab-separated file
  with log-ratio values for all samples (Expression data).
  
  Press the “Run” button and wait until the calculation is finished. This can take several minutes, depending on your CPU speed.
  All tables and plots will be stored in the “Output subdirectory” location.
Source: readme.txt, updated 2013-10-24