Interactive data analysis

Martin S. Lindner

Once you have processed your data with the MicrobeGPS [Pipeline], you can start exploring the results using the main [User interface]. This section shows how to interpret the results in the Data panel, gives examples of the graphics that can be created, and demonstrates how to use the standard [Modules].

Data panel

One general rule in the Data panel is that double-clicking expands an item and shows more information. For example, expanding a Candidate shows all References supporting the Candidate, and expanding a Reference lets you inspect the reads mapping to that Reference. In the screenshot below, you see the Candidate 'Cyanobacteria' and its supporting reference 'Cyanobium gracile PCC 6307'. Double-clicking on Related genomes shows a list of other references that support a different Candidate and share reads with this reference. Color-coding (red) indicates the fraction of all reads shared with the other reference. Double-clicking one of the related references jumps directly to the corresponding Candidate in the list.

Graphics and trees

When you click on an element in the Data panel, MicrobeGPS automatically creates figures from the underlying data. Clicking on a Candidate displays a pie chart representing the number of unique reads of the supporting references:

When a reference is clicked, MicrobeGPS shows a histogram of the read mapping error. The unique reads are shown as yellow bars, all reads as blue bars. This chart is particularly useful because it helps you identify promising references: a good reference typically has many unique matches with low error. In the example below, most uniquely matching reads have a higher read mapping error, which indicates a considerable distance between the reference and the true organism in the sample. However, the high number of unique reads suggests that this reference may be the most closely related reference genome in the database.
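
To make the idea behind this chart concrete, the following is a minimal sketch of how mapping errors could be binned for all reads and for unique reads. It is not the MicrobeGPS implementation: the function name `error_histograms`, the `(mapping_error, is_unique)` input layout, the bin width and the sample values are all assumptions made for illustration.

```python
from collections import Counter

def error_histograms(reads, bin_width=0.01):
    """Bin mapping errors for all reads and for unique reads only.

    `reads` is an iterable of (mapping_error, is_unique) pairs. This layout
    is a hypothetical one chosen for the example, not MicrobeGPS's format.
    """
    all_bins, unique_bins = Counter(), Counter()
    for error, is_unique in reads:
        bin_index = int(error / bin_width)  # bin 0 covers [0, bin_width)
        all_bins[bin_index] += 1
        if is_unique:
            unique_bins[bin_index] += 1
    return all_bins, unique_bins

# Made-up example reads: (mapping error, maps uniquely to this reference?)
reads = [(0.004, True), (0.006, False), (0.025, True), (0.027, True), (0.033, False)]
all_bins, unique_bins = error_histograms(reads)

for b in sorted(all_bins):
    low, high = b * 0.01, (b + 1) * 0.01
    print(f"{low:.2f}-{high:.2f}: all={all_bins[b]} unique={unique_bins[b]}")
```

In this toy data, most unique reads fall into a higher error bin, which is the pattern described above for a reference that is related to, but not identical with, the organism in the sample.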

By selecting two or three references at once (hold Ctrl and click the references), you can see the numbers of reads shared between the genomes in a Venn diagram.
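
The numbers in such a Venn diagram are plain set sizes. As a rough sketch for the two-reference case, assuming each reference is represented by the set of IDs of the reads mapping to it (the function `venn_counts` and the read IDs are hypothetical and not part of MicrobeGPS):

```python
def venn_counts(reads_a, reads_b):
    """Return the three region sizes of a two-set Venn diagram."""
    a, b = set(reads_a), set(reads_b)
    return {
        "only_a": len(a - b),   # reads mapping only to the first reference
        "only_b": len(b - a),   # reads mapping only to the second reference
        "shared": len(a & b),   # reads mapping to both references
    }

# Made-up read IDs for two references
counts = venn_counts({"r1", "r2", "r3"}, {"r3", "r4"})
print(counts)
```

The three-reference case works the same way, with seven regions instead of three.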

If you switch from the 'Figure' tab to the 'Tree' tab, you can see a taxonomy tree containing all organisms found in your data. As in the data panel, you can collapse and expand the items. Double-clicking on a leaf directly jumps to the corresponding reference in the Data panel.

Selecting an element in the Data panel highlights the organism in the tree, where the color coding corresponds to the number of reads mapping to the taxonomic unit. Clicking on a Candidate highlights all references supporting the candidate in the tree while collapsing all other branches of the tree. Selecting all Candidates likewise expands the whole tree and color-codes the organisms by the fraction of all mapped reads.

Search module

The search module is a very simple module and allows you to search for names of organisms in references reported by MicrobeGPS. For example, you can type esch to find all references containing this string in their name, e.g. Escherichia coli. Search is not case sensitive.
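
The matching described here amounts to a case-insensitive substring test. A minimal sketch, assuming the reference names are available as a list of strings (the function `search_references` and the name list are illustrative and do not reflect the actual module code):

```python
def search_references(names, query):
    """Return all names containing `query`, ignoring case."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]

# Example reference names; the list is made up for illustration
names = ["Escherichia coli", "Cyanobium gracile PCC 6307", "Shigella flexneri"]
hits = search_references(names, "esch")
print(hits)
```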

This module could also have been implemented as a standard feature of MicrobeGPS, but because it is so simple, I thought it would make a good example and template if you want to write your own [Modules].

Export Table module

The export table module writes out the analysis results as plain text files. This allows you to further process your results with other software, e.g. Excel. You can select which columns from the Data panel to export and whether to report only the candidates or both the candidates and their supporting references.
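
As an illustration of this kind of export, the sketch below writes selected columns as tab-separated text using Python's standard `csv` module. The row layout, the column names (`name`, `type`, `unique_reads`), the `candidates_only` switch and all values are assumptions made for the example; the file format actually written by the module may differ.

```python
import csv
import io

def export_table(rows, columns, out, candidates_only=False):
    """Write the chosen columns of `rows` to `out` as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header line
    for row in rows:
        if candidates_only and row["type"] != "candidate":
            continue  # skip supporting references
        writer.writerow([row[col] for col in columns])

# Made-up rows: one candidate and one supporting reference
rows = [
    {"name": "Cyanobacteria", "type": "candidate", "unique_reads": 800},
    {"name": "Cyanobium gracile PCC 6307", "type": "reference", "unique_reads": 800},
]

buffer = io.StringIO()  # stands in for an open text file
export_table(rows, ["name", "unique_reads"], buffer)
text = buffer.getvalue()
print(text)
```

A tab-separated file like this can be opened directly in spreadsheet software or read by other analysis tools.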

Composition Analysis module

