The tutorial below applies to the analysis of RNA sequencing data. A separate tutorial for the analysis of Affymetrix Exon and Gene arrays can be found here. If you are using an Affymetrix junction array, the analysis steps and results are analogous to those below (load your arrays and select the proper platform where indicated).
AltAnalyze allows users to assess differential gene expression, sample quality, alternative exon expression, isoform functional relevance and enriched gene-sets. For RNA-Seq data, AltAnalyze takes pre-aligned exon-junctions and/or exon coordinates and read counts for gene and alternative isoform-level analyses (junction and/or exon). AltAnalyze makes this process relatively easy, with the user only required to download and extract the program and provide one set of basic files. In the following tutorial we will walk through these steps using a sample dataset.
Sample data can be downloaded here.
AltAnalyze can be run with junction and/or exon read counts from BED format (e.g. TopHat) or BioScope result files. Instructions for obtaining junction and exon-level input files from your sequencing experiments can be found here. If you already have junction.bed files, instructions for building exon.bed files can be found here. These instructions will have you:
AltAnalyze version 2.0 can be downloaded for multiple operating systems from http://www.altanalyze.org. Once you have downloaded the compressed archive to your computer, extract it to an accessible folder on your hard drive using your operating system's built-in unzip utility or a commercial package. In addition to AltAnalyze, Cytoscape and DomainGraph are automatically downloaded when a species database is first installed (see below for details).
If your dataset has over 30 BED files or dozens of groups, it may save you time to make the groups and comps files in advance. Although not recommended when working with this sample dataset, go here if this applies to your own dataset.
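For larger datasets, the groups and comps files can be generated with a short script rather than typed by hand. The sketch below is a minimal example under stated assumptions: it assumes a tab-delimited groups file with three columns (sample file name, group number, group name) and a comps file with two columns of group numbers (experimental, then control). The function name, file names and column layout are illustrative, so confirm the exact format against the linked instructions before using the output.

```python
import csv


def write_groups_and_comps(samples, comparisons, groups_path, comps_path):
    """Write hypothetical groups and comps files.

    samples: list of (bed_filename, group_name) tuples.
    comparisons: list of (experimental_group, control_group) name tuples.
    The column layout used here is an assumption, not a confirmed specification.
    """
    # Number the groups in order of first appearance, starting at 1.
    group_numbers = {}
    for _, group in samples:
        group_numbers.setdefault(group, len(group_numbers) + 1)

    # One row per sample: file name, group number, group name.
    with open(groups_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for filename, group in samples:
            writer.writerow([filename, group_numbers[group], group])

    # One row per comparison: experimental group number, control group number.
    with open(comps_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for experimental, control in comparisons:
            writer.writerow([group_numbers[experimental], group_numbers[control]])

    return group_numbers
```

For example, calling `write_groups_and_comps([("s1.bed", "cancer"), ("s2.bed", "normal")], [("cancer", "normal")], "groups.test.txt", "comps.test.txt")` would write one row per sample and a single comparison row.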
Now you are ready to process your input files and obtain alternative exons with alternative splicing and functional annotations. If running through the graphical user interface, follow the directions below; otherwise, follow the command-line options for RNASeq analysis. To proceed:
While running, AltAnalyze produces a number of output files, most of them in the folder AltResults/AlternativeOutput in the user output directory. These include:
These files are tab-delimited text files that can be opened in a spreadsheet program like Microsoft Excel, OpenOffice or Google Documents. In addition to these files, similar files will be produced with the algorithm "splicing-index" (the name "splicing-index" replaces "ASPIRE" in the filenames above). These files have a similar format but contain single junction and exon results (as opposed to reciprocal-junction pairs). These results allow users to examine the independent regulation of exon junctions.
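Because the results are plain tab-delimited text, they can also be read programmatically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and any column name passed to the filter must be taken from the header of the actual file, since no column names are assumed here.

```python
import csv


def read_results_table(path):
    """Read a tab-delimited results file into (column_names, rows).

    Each row is a dict keyed by the column names found in the header line.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows


def filter_rows(rows, column, max_value):
    """Keep rows whose numeric value in `column` is at most `max_value`.

    Rows with a missing or non-numeric value in that column are skipped.
    """
    kept = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue
        if value <= max_value:
            kept.append(row)
    return kept
```

A typical use would be to load one results file and then call `filter_rows` with the name of a p-value column copied from that file's header.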
File #1 reports gene expression values for each sample and group based on junctions present in your input BED files. The values are derived from junctions that align to regions of a gene that are common to all transcripts and thus are informative for transcription (unless the option "known junctions" is selected – see “Select expression analysis parameters”, above) and expressed above specified background levels (minimum group average read count). Along with the raw gene expression values (mean read counts), statistics for each indicated comparison (mean expression, folds, t-test p-values) will be included along with gene annotations from Ensembl, including putative microRNA binding sites. This file is analogous to the results file you would have with a typical microarray experiment and is saved to the folder “ExpressionOutput”.
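To make the group-level statistics concrete, the sketch below computes group mean read counts, a minimum-group-average expression filter and a log2 fold change for one gene. It is an illustration only: the pseudocount of 1, the default threshold and the function name are assumptions, not AltAnalyze's documented computation, and the t-test is omitted.

```python
import math
from statistics import mean


def summarize_gene(counts_by_group, experimental, control, min_group_mean=2.0):
    """Summarize read counts for one gene across two groups.

    counts_by_group: dict mapping group name to a list of per-sample read counts.
    min_group_mean: assumed background threshold on the group average read count.
    """
    means = {group: mean(values) for group, values in counts_by_group.items()}

    # Treat the gene as expressed if either compared group reaches the threshold.
    expressed = max(means[experimental], means[control]) >= min_group_mean

    # Add a pseudocount of 1 so that a group mean of zero does not break the log.
    log2_fold = math.log2(means[experimental] + 1) - math.log2(means[control] + 1)

    return {"means": means, "expressed": expressed, "log2_fold": log2_fold}
```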
Results from files #2-5 are produced from all junctions that may suggest alternative splicing, alternative promoter regulation, or any other variation indicated by a reciprocal junction analysis for that gene. Each set of results corresponds to a single pair-wise comparison (e.g., cancer vs. normal) and will be named with the group names you assigned (groups file). If analyzing multiple groups, the two groups with the largest difference in reciprocal junction scores will be reported along with the conditions these occur in.
File #2 reports reciprocal junctions that are alternatively expressed, based on the user-defined ASPIRE or LinearRegression scores and p-values. Several statistics, gene annotations and functional predictions are provided for each reciprocal junction. A detailed description of all of the columns in this file is provided here.
File #3 is a summarization of reciprocal-junction results at the gene level from file #2. In addition to this summary, Gene Ontology terms and WikiPathways for that gene are reported.
Files #4 and #5 report over-representation results for protein domains (or other protein features) and microRNA-binding sites that AltAnalyze predicts to be regulated. These files include over-representation statistics and the genes associated with the different domains or features predicted to be regulated.
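As background on what an over-representation statistic measures, the sketch below computes a generic one-sided hypergeometric p-value: the probability of seeing at least the observed number of regulated genes carrying a feature if regulated genes were drawn at random. This tutorial does not state which statistic AltAnalyze reports, so this is a textbook illustration with hypothetical names, not a reproduction of the values in these files.

```python
from math import comb


def hypergeom_upper_tail(total_genes, genes_with_feature,
                         regulated_genes, regulated_with_feature):
    """Return P(X >= regulated_with_feature) under a hypergeometric model.

    total_genes: number of genes examined.
    genes_with_feature: genes annotated with the domain or binding site.
    regulated_genes: genes called alternatively regulated.
    regulated_with_feature: regulated genes that also carry the feature.
    """
    upper = min(genes_with_feature, regulated_genes)
    denominator = comb(total_genes, regulated_genes)
    # Sum the probability mass from the observed overlap up to the maximum possible.
    numerator = sum(
        comb(genes_with_feature, k)
        * comb(total_genes - genes_with_feature, regulated_genes - k)
        for k in range(regulated_with_feature, upper + 1)
    )
    return numerator / denominator
```

A small p-value indicates that the feature occurs among the regulated genes more often than expected by chance.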
File #6 compares the reciprocal-junction (e.g., ASPIRE) and exon-level (splicing-index) results to determine which splicing events are supported by multiple, independent lines of evidence. The direction of the fold change and the algorithm that detected the event are indicated for each row.
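The idea behind this comparison can be sketched as a merge of two result sets keyed by a shared event identifier. The example below is a simplified illustration: how AltAnalyze actually matches junction-level and exon-level events is not described here, and the identifiers, direction labels and function name are hypothetical.

```python
def combine_evidence(junction_hits, exon_hits):
    """Merge two result sets into rows of (event, detected_by, direction).

    junction_hits, exon_hits: dicts mapping an event identifier to a
    fold-change direction such as "up" or "down".
    """
    rows = []
    for event in sorted(set(junction_hits) | set(exon_hits)):
        junction_dir = junction_hits.get(event)
        exon_dir = exon_hits.get(event)
        if junction_dir is not None and exon_dir is not None:
            detected_by = "both"
            # Flag events where the two analyses disagree on direction.
            direction = junction_dir if junction_dir == exon_dir else "discordant"
        elif junction_dir is not None:
            detected_by, direction = "reciprocal-junction", junction_dir
        else:
            detected_by, direction = "splicing-index", exon_dir
        rows.append((event, detected_by, direction))
    return rows
```

Events labeled "both" with a consistent direction are the ones supported by independent lines of evidence.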
More information about these files can be found in the AltAnalyze ReadMe (section 2.3).
AltAnalyze produces a large number of results, including tab-delimited spreadsheets, expression clustered heatmaps and colored pathways. A detailed description of these various outputs can be found here.
Isoform and domain-level results produced in AltAnalyze (AltResults/DomainGraph) can be directly loaded by DomainGraph. DomainGraph is a plugin for the Java program Cytoscape which can be immediately opened through AltAnalyze. Rather than visualizing junctions, however, DomainGraph currently only supports exon visualization. RNASeq highlighted exons (identified directly from exons or by reciprocal exon-junctions) are associated with Affymetrix Exon 1.0 identifiers to be loaded in DomainGraph. Instructions can be found here.
AltAnalyze includes the option to perform a biological enrichment analysis that covers multiple ontologies, pathways and gene-set categories (e.g., transcription factor targets). This option can be selected from the Expression Analysis Parameters window by choosing "Analyze ontologies and pathways with GO-Elite". This menu includes criteria for filtering your data for differentially expressed genes and options for visualizing WikiPathways enrichment results.