Name | Modified | Size | Downloads / Week |
---|---|---|---|
GUI | 2013-10-18 | ||
Data | 2013-10-18 | ||
readme.txt | 2013-10-24 | 5.7 kB | |
GeneIDURN.Symb9.dat | 2013-10-18 | 1.4 MB | |
params.txt | 2013-10-18 | 505 Bytes | |
bSNEA2.r | 2013-10-18 | 18.7 kB | |
bSNEAutils.r | 2013-10-18 | 19.4 kB | |
misc.r | 2013-10-18 | 35.9 kB | |
pyadraw.r | 2013-10-18 | 17.1 kB | |
Totals: 9 Items | | 1.5 MB | 1 |
These files are supporting material for the paper "Clustering Gene Expression Regulators: New Approach to Disease Subtyping" by M Pyatnitskiy, I Mazo, M Shkrob, E Schwartz and E Kotelnikova.

The purpose of the software is to post-process the results of Subnetwork Enrichment Analysis (SNEA) as implemented in Pathway Studio from Elsevier. As output, the scripts generate a number of plots and tables that describe in detail the resulting clustering of regulators and samples.

The bSNEA (batch SNEA) program takes as input the results of SNEA performed on several samples, together with log-ratio values for those samples. Subnetwork Enrichment Analysis must be performed separately in the Pathway Studio application. bSNEA performs unsupervised clustering of samples based on the subnetworks identified as activated in each sample, and outputs a set of plots and tables describing the resulting clustering of samples and regulators.

The software consists of a set of R scripts. The main script is 'bSNEA2.r'; it can be called directly from R provided that the parameters are stored in the file 'params.txt'. We have also developed a simple GUI application (Windows only) that allows user-friendly definition of the algorithm parameters. The GUI application fills 'params.txt' with the specified values and uses a local installation of R to run 'bSNEA2.r'.

INSTALLATION

Download all files from SourceForge and put them in the same directory. The file 'R.zip' must be uncompressed so that a subdirectory 'R' is created, containing a local installation of the R system.

We supply two sample data files in the expected data format:
* GSE4183.logratio.tab - expression data from NCBI GEO GSE4183
* GSE4183.resSNEA.tab - results of SNEA

RUNNING ANALYSIS

Run the file 'bSNEA.exe' and specify the following parameters.

Global options
* Analysis type – should be set to “batchSNEA”.
* Analysis mode – can be set to “SNEA” (each regulator is considered separately) or “SNEA.cluster” (similar regulators are clustered together; default option).
  Both modes can also be complemented with log-transformation.
* SNEA results – specify the tab-separated file with the results of SNEA (see sample file 'GSE4183.resSNEA.tab').
* Expression data – specify the tab-separated file with log-ratio values for all samples (see sample file 'GSE4183.logratio.tab').
* Output subdir name – name of the subdirectory where all output will be written. Can be left blank (default). In this case the output directory will be called “Result1”, for the next run “Result2”, and so on.
* Organism – “Homo Sapiens” (default).

Samples
* Distance – specify how to calculate the distance between samples in order to group them. Pearson correlation is the default.
* Clustering – specify the method of cluster analysis used to group samples. Ward’s method (default), single, average and complete linkage are available.
* #clusters – specify the number of clusters for grouping samples. This has no default value and may be varied.
* Column with class label – specify which column in the file with SNEA results contains the sample class label.

Regulators
* Distance – specify how to calculate the distance between regulators in order to group them. The only option currently available is Jaccard distance.
* Clustering – specify the method of cluster analysis used to group regulators. Ward’s method (default), single, average and complete linkage are available.
* #clusters – specify the number of clusters for grouping regulators. This has no default value and may be varied.
* External list (optional) – specify a file with a list of regulators to be used instead of SNEA results. Each regulator should be on a separate line.

Options
* Use gene annotations – print gene descriptions in output tables.
* Determine # clusters – perform an exhaustive search for the optimal number of clusters of regulators. The output is the dependence of the average silhouette value on the number of clusters of regulators, which is plotted to 'dnd.regulators.avgsil.pdf'.
  NB: this can significantly slow down the overall pipeline.
* Select regulators with p-value smaller than – filter out subnetworks with p-values greater than the specified threshold.
* Select regulators with rank smaller than – filter out subnetworks with rank greater than the specified threshold. Disabled.
* Delete regulators encountered in less than % of samples
* Open results in Explorer – after the whole pipeline has finished, the folder containing the results is opened in Explorer.
* Bootstrap runs – if set to >1, p-values for the hierarchical sample clustering are calculated via multiscale bootstrap resampling and plotted on the dendrogram.
* Print summary table – print the table “megatable.tab”, which contains a separate line for each downregulated gene from each subnetwork identified in each patient. This makes the resulting table very large and can slow down the overall pipeline.
* Plot sample names – if checked, sample names (usually of the form GSMXXX) are plotted on figures. Otherwise only class labels are plotted.

Only two options are absolutely mandatory: the tab-separated file with results of SNEA (SNEA results) and the tab-separated file with log-ratio values for all samples (Expression data).

Press the “Run” button and wait until the calculation is finished. This can take several minutes, depending on your CPU speed. All tables and plots will be stored in the “Output subdirectory” location.
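
ILLUSTRATION OF THE DISTANCE OPTIONS

To make the Distance options listed under Samples and Regulators more concrete, the Python sketch below computes a Pearson-correlation distance between two samples and a Jaccard distance between two regulators. It is not taken from the bSNEA scripts, which are written in R. The 1 - r convention for the correlation distance, the representation of a regulator as a set of target genes, and every function name and value in the code are assumptions made for illustration only.

```python
import math


def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y.

    Assumption: the correlation is turned into a distance as 1 - r.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical log-ratio profiles of two samples over the same four genes.
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [2.0, 4.0, 6.0, 8.0]
print(pearson_distance(sample_1, sample_2))

# Hypothetical target-gene sets of two regulators.
regulator_1 = {"TP53", "MYC", "EGFR"}
regulator_2 = {"MYC", "EGFR", "KRAS"}
print(jaccard_distance(regulator_1, regulator_2))
```

Under these assumptions, perfectly correlated samples get a distance close to zero, and regulators that share no set members get a Jaccard distance of one. A matrix of such pairwise distances is the kind of input that hierarchical clustering methods such as Ward's method or single, average and complete linkage operate on.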