AltAnalyze Hierarchical Clustering Heatmaps

Currently, AltAnalyze users can perform hierarchical clustering in one of two ways:

  1. Through analysis of raw array or RNA-Seq data (Workflow Analysis)
  2. Immediate analysis of filtered fold files (Independent Analysis)

For both approaches, a colored gene expression heatmap is produced with arrays clustered as columns and genes clustered as rows, using basic methods found in other clustering programs such as Cluster/TreeView. AltAnalyze makes this process simple: the user only needs to provide their transcriptome datasets.

Visualization Support

Support for heatmap visualization and hierarchical clustering is provided through the Python libraries matplotlib and scipy. These libraries are included in the operating-system-specific bundles of AltAnalyze (binary versions). When running AltAnalyze directly from source code, these libraries and NumPy must be installed by the user. The library fastcluster is optional and may improve efficiency (it is included with the binaries). See

Workflow Analysis

This analysis is the standard workflow for microarray and RNA-Seq data (see tutorials). In this workflow, differentially expressed or outlier genes (defined by the user's statistical filtering options) are clustered using the default coloring and distance algorithms. In addition to genes, cell-type predictions from the new method LineageProfiler are also clustered using this method (see LineageProfiler description for details). When an analysis finishes, these plots are presented in the interface and are also saved to the folder DataPlots. The input data matrices can also be re-clustered with different options following an analysis (Additional Analyses option - see below).

The three default plots for genes are:

  1. Significant regulated genes (relative to the row mean)
  2. Significant regulated genes (relative to the comparison group mean)
  3. Outlier regulated genes (relative to the row mean)

All of these plot log2 fold changes relative to the row mean or the comparison group mean. The first plot shows how each sample compares to all other samples, without considering specific user-defined comparisons (e.g., day 10 versus day 0). The second uses the same genes, but with fold changes for each sample calculated relative to the control group mean for each user-defined comparison (including control samples versus the control mean). The third includes any genes not called significant that have a fold change greater than the user-defined fold threshold (default 2 fold) in at least one sample versus the mean of all samples in that row.
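The three kinds of values described above can be sketched in plain Python. This is only an illustration and not AltAnalyze's own code: the function names are hypothetical, the input rows are assumed to hold log2 expression values, and the outlier test assumes the threshold applies to the absolute fold change.

```python
import math
from statistics import mean

def fold_vs_row_mean(row):
    """Log2 fold of each sample relative to the mean of its own row."""
    row_mean = mean(row)
    return [value - row_mean for value in row]

def fold_vs_control_mean(row, control_idx):
    """Log2 fold of each sample relative to the control-group mean.

    control_idx lists the column positions of the control samples.
    """
    control_mean = mean(row[i] for i in control_idx)
    return [value - control_mean for value in row]

def is_outlier_row(row, fold_threshold=2.0):
    """True if any sample exceeds the fold threshold versus the row mean.

    Assumption: the threshold is applied to the absolute log2 fold.
    """
    cutoff = math.log2(fold_threshold)
    return any(abs(fold) > cutoff for fold in fold_vs_row_mean(row))

row = [1.0, 1.0, 1.0, 5.0]              # log2 values for four samples
print(fold_vs_row_mean(row))            # plot 1 style values
print(fold_vs_control_mean(row, [0, 1]))  # plot 2 style values
print(is_outlier_row(row))              # plot 3 style selection
```

Because the values are already in log2 space, a fold change is a simple difference rather than a ratio.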

Customized Independent Analysis

In addition to running the workflow version of hierarchical clustering, users can select the Additional Analyses option to run hierarchical clustering on any tab-delimited text file. This text file must have a header row of column annotations (e.g., sample names) and a first column of row annotations (e.g., gene names), with the remaining cells filled by positive and negative numeric values. Multiple options for positive and negative color gradients can be selected, along with clustering algorithms for rows and columns.
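As a rough illustration of the expected input layout, the following sketch reads such a file into sample names, row names and a numeric matrix. The function name read_matrix and the example content are hypothetical, and this is not the parser AltAnalyze itself uses.

```python
import csv
import io

def read_matrix(handle):
    """Read a tab-delimited matrix: header row of sample names,
    first column of row names, numeric values everywhere else."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]          # first header cell labels the ID column
    row_names, values = [], []
    for fields in reader:
        if not fields:            # skip blank lines
            continue
        numbers = [float(x) for x in fields[1:]]
        if len(numbers) != len(samples):
            raise ValueError("row %r has %d values, expected %d"
                             % (fields[0], len(numbers), len(samples)))
        row_names.append(fields[0])
        values.append(numbers)
    return samples, row_names, values

# Hypothetical example content standing in for a file on disk.
example = (
    "uid\tcancer:s1\tcancer:s2\tnormal:s3\n"
    "GeneA\t1.5\t2.0\t-1.0\n"
    "GeneB\t-0.5\t-1.0\t0.75\n"
)
samples, row_names, values = read_matrix(io.StringIO(example))
print(samples, row_names, values)
```

For a real file, the same function can be given an open file handle, for example one created with open(path, newline="").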

Menu and Formatting Options

The options below are present in the program and can also be controlled programmatically through the command-line interface.

Biological Group Sample Visualization: When sample columns are formatted to include the sample group name (e.g., cancer:sample1 - separated by a colon), these group assignments will be colored in the top color bar. When groups are not present in the sample names, the colors of distinct empirically derived clusters will be shown instead.
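The group:sample naming convention can be illustrated with a short sketch that splits each column name at the colon and assigns one color per group. The function names and the palette are hypothetical and only show the idea; the colors AltAnalyze actually uses are not specified here.

```python
def split_group(column_name):
    """Split 'group:sample' into (group, sample); group is None if absent."""
    group, separator, sample = column_name.partition(":")
    if not separator:
        return None, column_name
    return group, sample

def group_color_bar(column_names, palette=("blue", "orange", "green", "red")):
    """Return one color per column, reusing a color within each group.

    Columns without a group prefix get None, standing in for the case
    where cluster-derived colors would be shown instead.
    """
    color_of = {}
    bar = []
    for name in column_names:
        group, _sample = split_group(name)
        if group is None:
            bar.append(None)
            continue
        if group not in color_of:
            # Cycle through the palette as new groups are encountered.
            color_of[group] = palette[len(color_of) % len(palette)]
        bar.append(color_of[group])
    return bar

print(group_color_bar(["cancer:s1", "cancer:s2", "normal:s3"]))
```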

Clustering Metric: This indicates the distance or similarity metric used for clustering. A large number of metrics available in the scipy library can be selected.

Clustering Method: This indicates the linkage method, that is, how the distance between clusters is calculated from the distances between their elements. In addition, when the option hopach is selected and the R environment is present, AltAnalyze will automatically call R, install the hopach library locally within the AltAnalyze directory and run this function.
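To show how a metric and a linkage method combine, here is a minimal sketch using scipy directly. It assumes NumPy and scipy are installed; the function name cluster_order and the small data matrix are made up for the example, and the sketch does not reproduce AltAnalyze's internal code or its default settings.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist

def cluster_order(matrix, metric="euclidean", method="average"):
    """Return the dendrogram leaf order of the rows of `matrix`.

    metric: any distance name accepted by scipy.spatial.distance.pdist
    method: any linkage name accepted by scipy.cluster.hierarchy.linkage
    """
    distances = pdist(np.asarray(matrix, dtype=float), metric=metric)
    tree = linkage(distances, method=method)
    return leaves_list(tree).tolist()

# Hypothetical 4 genes x 4 samples matrix of log2 fold values.
data = [
    [2.0, 2.1, -1.9, -2.0],
    [-2.0, -1.8, 2.2, 2.0],
    [1.9, 2.0, -2.1, -2.2],
    [-2.1, -2.0, 1.9, 2.1],
]
row_order = cluster_order(data)  # cluster genes (rows)
col_order = cluster_order(np.transpose(data),  # cluster samples (columns)
                          metric="correlation", method="complete")
print(row_order, col_order)
```

Rows and columns are clustered independently, so each axis can use its own metric and method, and either call can be skipped to keep the imported order on that axis.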

Choose a color scheme: A number of standard coloring options are provided for visualization (e.g., red-black-green).

Clustering rows or columns: Clustering of rows or columns can optionally be disabled, in which case rows or columns are displayed in their imported order.

Normalize rows relative to: If the imported data is not normalized, each row can be normalized relative to the mean or median of all samples. This function assumes the data is already log transformed.
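A minimal sketch of this kind of row normalization is shown below. The function name normalize_rows and its relative_to argument are hypothetical. Because the values are assumed to be log transformed already, the row center is subtracted rather than divided out.

```python
from statistics import mean, median

def normalize_rows(values, relative_to="mean"):
    """Center each row on its mean or median (log-space subtraction)."""
    centers = {"mean": mean, "median": median}
    if relative_to not in centers:
        raise ValueError("relative_to must be 'mean' or 'median'")
    center = centers[relative_to]
    normalized = []
    for row in values:
        row_center = center(row)
        normalized.append([value - row_center for value in row])
    return normalized

print(normalize_rows([[1.0, 2.0, 6.0]], relative_to="mean"))
print(normalize_rows([[1.0, 2.0, 6.0]], relative_to="median"))
```

The median is less sensitive to a single extreme sample, which is one reason to prefer it when a row contains an outlier.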

Transpose matrix: This option will transpose the columns and rows.

Heatmap color contrast level: This option allows the user to adjust the intensity of the displayed plot.
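How AltAnalyze applies the contrast level internally is not described here. One common approach in heatmap tools, shown purely as an assumption, is to saturate the color scale at a symmetric limit so that values beyond the limit all receive the most intense color. The function name clip_for_contrast is hypothetical.

```python
def clip_for_contrast(values, contrast):
    """Clamp every value into the range [-contrast, +contrast].

    Assumption: a smaller limit makes moderate values look more intense,
    because they sit closer to the ends of the color scale.
    """
    if contrast <= 0:
        raise ValueError("contrast must be positive")
    return [[max(-contrast, min(contrast, value)) for value in row]
            for row in values]

print(clip_for_contrast([[-5.0, 0.5, 3.0]], contrast=2.0))
```

In matplotlib, a similar effect can be obtained without modifying the data by passing symmetric vmin and vmax limits to the plotting call.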

Select Gene Set/Ontology to filter: When loading a file containing IDs indicated in the Main Menu (Other IDs, platform specific or found in the AltAnalyze ExpressionOutput/Clustering folder), AltAnalyze can filter your list based on a large library of pathways, genesets and ontologies supported by GO-Elite. Hence, users can obtain pathway-specific cluster sets.

Select a specific Gene Set: Select a specific GeneSet or ontology term for the selected GO-Elite class indicated above.

Type a pathway ID or Ontology ID: Alternatively, enter a pathway or ontology ID for quicker selection.

Type a gene to get top correlated: Display just those genes that are most correlated with a gene or identifier listed in the input expression file.
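The idea of selecting the most correlated genes can be sketched as follows. This example assumes Pearson correlation and ranks by the signed coefficient; the source does not state which correlation measure or cutoff AltAnalyze uses, and the function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)

def top_correlated(row_names, values, query, top_n=10):
    """Return up to top_n row names ranked by correlation with `query`."""
    target = values[row_names.index(query)]
    scored = [(pearson(target, row), name)
              for name, row in zip(row_names, values) if name != query]
    scored.sort(reverse=True)  # highest correlation first
    return [name for _score, name in scored[:top_n]]

row_names = ["GeneA", "GeneB", "GeneC"]
values = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [3.0, 2.0, 1.0]]
print(top_correlated(row_names, values, "GeneA", top_n=2))
```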


Related

Wiki: AltAnalyze
Wiki: ConceptIntroduction
Wiki: LineageProfiler
Wiki: QualityControl
Wiki: Tutorials
