
Help Manual

john


TreeScope Help Manual (V0.01)

Project
The TreeScope software, help manual, source code (in future!), and other related information are available from this TreeScope SourceForge project. The publication describing TreeScope (for citation purposes), which includes use-case scenarios based on sequence alignments representing RHDV and HIV, will be available here (...). (Posters and presentations will be available here … e.g. student projects.)

General Layout
TreeScope provides an interface for interactively exploring multiple sequence alignments, primarily through network and phylogenetic views. The main interface is divided into four sections:
Reference Tree: Displays relationships based on a user-defined portion of the alignment, and allows for the extraction of variant frequencies from selected lineages.

Tab Area: The main workspace, containing features such as network visualization, residue frequencies, tree tiling, polymorphism detection, and recombination analysis.

Navigation Area: Used to scroll through the alignment.

Top Menu: Used to load aligned sequences (in FASTA format), access help, and view information about software development, the licence agreement and the version number.

This document describes each of the above areas in more detail.

1. Reference Tree

Overview
The Reference Tree shows a neighbour-joining tree that is initially constructed from sequence fragments spanning the first 25 sites of the alignment. The tree can be redrawn to increase the number of sites used or to move it to a different position in the alignment. The location that the Reference Tree represents, relative to the overall alignment, is indicated by a small horizontal green bar in the Navigation Area.
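The neighbour-joining step can be sketched as follows. This is a minimal textbook version of the algorithm, assuming a precomputed symmetric distance matrix between tips; the function name `neighbour_joining` is hypothetical, and the distance measure and implementation TreeScope actually uses are not described in this manual.

```python
from itertools import combinations


def neighbour_joining(labels, dist):
    """Build an unrooted neighbour-joining tree and return it as Newick.

    labels: tip names; dist: symmetric distance matrix (list of lists).
    """
    n = len(labels)
    if n < 3:
        raise ValueError("need at least three tips")
    # Each active node id maps to the Newick string of its subtree.
    newick = {i: labels[i] for i in range(n)}
    d = {i: {j: float(dist[i][j]) for j in range(n)} for i in range(n)}
    next_id = n
    while len(d) > 3:
        active = list(d)
        m = len(active)
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(active, 2),
                   key=lambda p: (m - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (m - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        newick[u] = f"({newick[i]}:{li:.3f},{newick[j]}:{lj:.3f})"
        # Distances from the new internal node to all remaining nodes.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        for x in (i, j):
            del d[x]
            for k in d:
                d[k].pop(x, None)
    # Resolve the last three nodes as an unrooted trifurcation.
    a, b, c = list(d)
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({newick[a]}:{la:.3f},{newick[b]}:{lb:.3f},{newick[c]}:{lc:.3f});"
```

For the classic five-tip example matrix (a to e), the sketch recovers branch lengths a:2, b:3, c:4, d:2 and e:1.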

To reposition the Reference Tree and/or change the length of the window it is built from, use the Draw button at the bottom left of the tree together with the window size slider. The new position is determined by the region of the alignment currently in view within the Navigation Area (which also corresponds to the first, far-left column of clusters in the Network View tab).

Within the specified region, the tree collapses redundant fragments into single tips while retaining links to their original sequence IDs. For example, a tip representing 20 identical fragments will still allow access to all underlying sequences from which those fragments were extracted.
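Collapsing identical fragments into single tips, while keeping the titles behind each tip, can be sketched as below. The function names and the use of an uncorrected p-distance (with gaps treated as ordinary characters) are assumptions for illustration, not TreeScope's documented behaviour; the default width of 25 mirrors the initial Reference Tree window.

```python
def collapse_fragments(alignment, start=0, width=25):
    """Group sequence titles by their fragment in seq[start:start + width].

    alignment: dict mapping sequence title -> aligned sequence string.
    Returns a list of (fragment, [titles]) pairs, largest group first.
    """
    groups = {}
    for title, seq in alignment.items():
        groups.setdefault(seq[start:start + width], []).append(title)
    # Stable sort: groups of equal size keep their first-seen order.
    return sorted(groups.items(), key=lambda item: -len(item[1]))


def p_distance(x, y):
    """Proportion of aligned sites at which two fragments differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def distance_matrix(tips):
    """Pairwise p-distances between the unique fragments (one per tip)."""
    frags = [frag for frag, _ in tips]
    return [[p_distance(a, b) for b in frags] for a in frags]
```

The resulting matrix is what a tree-building step would consume, while the title lists preserve access to every underlying sequence.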

If a tree tip (or internal node) is selected with the mouse cursor, all sequence titles represented by that tip (or, for an internal node, all descendant sequence titles in that lineage) are listed on the Selected Sequences tab. In addition, the paths that these sequences take through the alignment network are drawn on the Network View tab, and residue frequencies specific to these sequences are listed on the Residue Frequencies tab. Descriptive text above the tree indicates the current alignment range and basic characteristics of the tree.
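Expanding a selected node into the sequence titles it represents can be sketched as a simple recursion. The nested-tuple tree representation and the `tip_members` mapping (tip name to the titles collapsed into it) are hypothetical structures chosen for illustration.

```python
def descendant_titles(node, tip_members):
    """Return every original sequence title below a tree node.

    node: a tip name (str) or an internal node (tuple of child nodes).
    tip_members: dict mapping tip name -> list of collapsed sequence titles.
    """
    if isinstance(node, str):
        # A tip may stand for several identical fragments.
        return list(tip_members[node])
    titles = []
    for child in node:
        titles.extend(descendant_titles(child, tip_members))
    return titles
```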

User Interaction
(i) Click the Draw button (bottom left) to recreate the tree at the position in the alignment currently in view.

(ii) Select internal nodes to highlight all descendant tips (in green).

(iii) Data associated with highlighted tips will be highlighted within the relevant analysis tabs (e.g., Residue Frequencies, Inter Lineage Polymorphisms, Tree Exploration, Network View, Selected Sequences).

(iv) Search the tree to find nodes containing specific text (e.g., a year or a user-defined keyword). Data associated with tips identified through the search will also be highlighted within the analysis tabs, as in (iii).

(v) Three additional sliders directly above the tree allow the user to adjust the title display width, X-axis scaling, and Y-axis scaling.

Output
Two buttons directly below the tree allow the tree to be saved as a PNG image and the variants used to construct it to be exported in tabular form. The latter is useful for classifying variant frequencies at any given alignment location.

The format of the exported variants is similar to that used in the CView software. The title of each variant follows the pattern >VARIANT_1_FREQUENCY_32, which means that 32 sequences were identical across the user-specified region. Variants are sorted in descending order of frequency, and each variant sequence is written below its title in FASTA format. Once all variants have been listed in this manner, the next part of the file lists the sequence titles associated with each variant. For example, for >VARIANT_1_FREQUENCY_32 there would be 32 titles listed, one for each of the sequences that were identical in the specified region. Finally, below these titles the variant sequences are repeated, but this time every character that is identical to the most frequent variant (at the top of the list) is replaced with a ‘|’ character. These ‘|’ characters can then easily be removed and replaced with blank spaces, so that when the file is printed in a monospaced font such as Courier, the characters that differ from the most common variant are easy to identify by eye.
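The three-part layout described above can be sketched as follows. This is an illustration under stated assumptions: the function name `export_variants` is hypothetical, and repeating the `>VARIANT_n_FREQUENCY_k` header in the title and masked sections is a guess, since the manual does not specify how those sections are labelled.

```python
def export_variants(alignment, start, width):
    """Sketch of the variant export for alignment[start:start + width].

    alignment: dict mapping sequence title -> aligned sequence string.
    """
    groups = {}
    for title, seq in alignment.items():
        groups.setdefault(seq[start:start + width], []).append(title)
    # Most frequent variant first (ties keep first-seen order).
    variants = sorted(groups.items(), key=lambda item: -len(item[1]))
    headers = [f">VARIANT_{n}_FREQUENCY_{len(titles)}"
               for n, (_, titles) in enumerate(variants, start=1)]
    top = variants[0][0]
    lines = []
    # Part 1: each variant sequence in FASTA format.
    for header, (frag, _) in zip(headers, variants):
        lines += [header, frag]
    # Part 2: the titles of the sequences sharing each variant.
    for header, (_, titles) in zip(headers, variants):
        lines += [header, *titles]
    # Part 3: variants again, with residues identical to variant 1 masked by '|'.
    for header, (frag, _) in zip(headers, variants):
        lines += [header, "".join("|" if a == b else a for a, b in zip(frag, top))]
    return "\n".join(lines)
```

Calling `.replace("|", " ")` on the masked lines gives the blank-space view described above.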

Additional Notes
The window size used for tree construction is directly linked to the window size used for network construction. This ensures that the tree always aligns with the first (leftmost) data column used in network construction.

2. Tab Areas

Overview
The Tab Area contains six tabs for in-depth exploration of the alignment: (I) Network View, (II) Residue Frequencies, (III) Inter Lineage Polymorphisms, (IV) Recombination Search, (V) Tree Exploration, (VI) Selected Sequences.

Each of these tabs interacts with the Reference Tree to varying degrees. For example, in the Selected Sequences tab, all tip titles are explicitly displayed when lineages are selected on the Reference Tree. Within the Inter Lineage Polymorphisms tab, the Reference Tree is used directly to select clusters for comparing residue frequencies, and within the Residue Frequencies tab, the frequencies of residues present within a user-defined lineage (selected by clicking on an internal node of the Reference Tree) are displayed.

(I) Network View tab

Overview
This tab displays a cluster-based network representing the diversity present within a user-specified section of the alignment. The user specifies the section by selecting the length and number of neighbouring windows using the provided sliders at the bottom of the tab.
Windows are non-overlapping and are placed side by side without gaps. For example, if the user specifies 100 windows of length 100, a region spanning 10,000 sites will extend to the right of the current view position, and the sequence diversity present within each window will be clustered and displayed as a network.

The network is constructed by treating each window as a column of data (anchored at its specific location in the alignment) and clustering the sequence fragments that span that window. Clusters in adjacent columns are linked if they share fragments originating from the same underlying sequence (albeit from different regions). If the clustering threshold, or the number or size of windows, is changed, the network (and its visualization) is updated instantly.
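The window layout and linking rule above can be sketched as below. The manual does not say which clustering method TreeScope applies, so this sketch swaps in a simple greedy rule (a fragment joins the first cluster whose representative lies within `threshold` p-distance; a threshold of 0 groups identical fragments only). The function name `build_network` is hypothetical.

```python
def p_distance(x, y):
    """Proportion of aligned sites at which two fragments differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def build_network(alignment, start, window, n_windows, threshold=0.0):
    """Cluster fragments per window and link clusters in adjacent windows.

    Returns (columns, links): columns[w] is a list of clusters (sets of titles);
    links[(w, i, j)] counts sequences shared by cluster i of window w and
    cluster j of window w + 1.
    """
    columns = []
    for w in range(n_windows):
        lo = start + w * window  # windows sit side by side, no overlap or gap
        hi = lo + window
        reps, clusters = [], []
        for title, seq in alignment.items():
            frag = seq[lo:hi]
            if len(frag) < window:
                continue  # window runs past the end of the alignment
            for k, rep in enumerate(reps):
                if p_distance(frag, rep) <= threshold:
                    clusters[k].add(title)
                    break
            else:
                reps.append(frag)
                clusters.append({title})
        columns.append(clusters)
    links = {}
    for w in range(len(columns) - 1):
        for i, left in enumerate(columns[w]):
            for j, right in enumerate(columns[w + 1]):
                shared = len(left & right)
                if shared:
                    links[(w, i, j)] = shared
    return columns, links
```

The cluster sizes (`len(cluster)`) correspond to the numbers shown inside the circles, and each entry in `links` to a grey connecting line.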

Within the visualization, each column corresponds to a window of the alignment, circles represent clusters, numbers inside circles indicate how many sequences pass through that cluster, and grey lines show connectivity between adjacent windows.

User Interaction
(i) Click a cluster to highlight the path of each sequence through the rest of the network.

(ii) These sequences will also be highlighted on the Reference Tree and on any trees in the Tree Exploration tab.

(iii) Residue frequencies for the sequences within the selected cluster are displayed in the Residue Frequencies tab.
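Tracing the path of each sequence from a clicked cluster can be sketched as follows, assuming the network is held as a list of columns, each a list of clusters stored as sets of sequence titles. The function name `sequence_paths` and this data layout are hypothetical.

```python
def sequence_paths(columns, window_index, cluster_index):
    """For each sequence in the selected cluster, list its cluster index per window.

    columns: list of windows, each a list of clusters (sets of titles).
    A window where the sequence is absent yields None.
    """
    selected = columns[window_index][cluster_index]
    paths = {}
    for title in sorted(selected):
        paths[title] = [
            next((i for i, cluster in enumerate(clusters) if title in cluster), None)
            for clusters in columns
        ]
    return paths
```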

Output
The network can be saved as a PNG image. Cluster membership for individual sequences can be exported from the Selected Sequences tab.

(II) Residue Frequencies tab

Overview
Residue frequencies are displayed for consecutive alignment sites: (i) the left-hand side shows global residue frequencies across all sequences, and (ii) the right-hand side shows frequencies for a user-defined subset of sequences (e.g., sequences descended from a selected node on the Reference Tree, sequences passing through a selected cluster, or sequences selected using the search feature).
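Per-site frequencies for all sequences or for a selected subset can be sketched with `collections.Counter`. The function name `residue_frequencies` is hypothetical, and reporting proportions rather than raw counts is an assumption.

```python
from collections import Counter


def residue_frequencies(alignment, titles=None):
    """Per-site residue proportions for all sequences or a chosen subset.

    alignment: dict mapping title -> aligned sequence.
    titles: optional list of titles; None means use every sequence (global view).
    Returns one dict per site mapping residue -> proportion.
    """
    seqs = [alignment[t] for t in (titles if titles is not None else alignment)]
    table = []
    for site in range(len(seqs[0])):
        counts = Counter(seq[site] for seq in seqs)
        table.append({res: n / len(seqs) for res, n in counts.items()})
    return table
```

Calling it once without `titles` and once with the selected titles gives the left-hand and right-hand views respectively.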

User Interaction
(i) Users select subsets of sequences via the search box, the Reference Tree or the Network View, and the displayed frequencies update instantly to reflect the selected subset.

Output
Per-site residue frequencies (both global and subset) can be saved as a text-based table.

(III) Inter Lineage Polymorphisms tab

Overview
Two user-defined lineages, based on the Reference Tree, can be compared in order to rapidly identify per-site differences in residue frequency across the alignment. Although lineages are defined by the user on the current Reference Tree, once the sequences have been grouped, the groups are maintained across the whole alignment (beyond the region that the Reference Tree currently represents). Regions and lineages can easily be changed by selecting new lineages on the tree and/or drawing the Reference Tree at a new location. Residues that shift from high frequency in one lineage to low frequency in the other (or vice versa) can then be easily identified, both within the region and beyond it. For example, the user could define lineages based on a region of known divergence (e.g., CXCR4 vs. CCR5 phenotypes of the HIV-1 envelope gene) and then search for potential shifts outside this region that are yet to be characterized (see use-case scenario).

User Interaction
(i) Select two lineages by clicking internal nodes on the Reference Tree. Each selection is assigned to a lineage using the “Lineage 1” and “Lineage 2” check boxes at the bottom left-hand side of the tab.

(ii) Frequency-shift thresholds can be adjusted using the Lower and Upper sliders (e.g., 70/30) located at the bottom right-hand side of the tab.

(iii) Sites are annotated as HIGH TO LOW, LOW TO HIGH, or left blank if no significant shift is found.
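The annotation can be sketched as below, assuming a residue is flagged HIGH TO LOW when its frequency is at or above the Upper threshold in Lineage 1 and at or below the Lower threshold in Lineage 2 (and LOW TO HIGH for the reverse). The exact rule TreeScope applies is not stated in this manual, and the function name `classify_shifts` is hypothetical.

```python
from collections import Counter


def classify_shifts(lineage1, lineage2, upper=0.7, lower=0.3):
    """Flag residues whose frequency shifts sharply between two lineages.

    lineage1, lineage2: lists of aligned sequences of equal length.
    Returns (site, residue, label) tuples with 1-based sites; sites with
    no qualifying shift are absent (i.e. left blank).
    """
    shifts = []
    for i in range(len(lineage1[0])):
        f1 = Counter(seq[i] for seq in lineage1)
        f2 = Counter(seq[i] for seq in lineage2)
        for res in sorted(set(f1) | set(f2)):
            p1 = f1[res] / len(lineage1)  # Counter returns 0 for absent residues
            p2 = f2[res] / len(lineage2)
            if p1 >= upper and p2 <= lower:
                shifts.append((i + 1, res, "HIGH TO LOW"))
            elif p1 <= lower and p2 >= upper:
                shifts.append((i + 1, res, "LOW TO HIGH"))
    return shifts
```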

Output
Comparisons can be saved as a table.

(IV) Recombination tab

Overview
This tab can be used to explore potential recombination breakpoints within individual sequences across the alignment. The user defines a window length and a step size using the sliders at the bottom of the tab. Windows are then placed across the alignment and the search for potential recombination breakpoints begins. For each individual sequence and each pair of neighbouring windows, the distances from that sequence to all other sequences are extracted in both windows, and the two sets of distances are used to calculate a Pearson correlation coefficient.

The Pearson correlation coefficient (r) measures how similar the pattern of distances is between the two windows for each sequence. If r = 1, the focus sequence maintains the same relationship to the other sequences: there is no sign of recombination. If r < 1, especially if it drops close to 0 or becomes negative, the focus sequence changes how similar it is to other sequences, which is suggestive of potential recombination. In other words, a low Pearson correlation between adjacent windows indicates that the genetic ancestry of the focus sequence has changed in that region, which supports the hypothesis of a recombination breakpoint around that position.
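The per-sequence correlation between neighbouring windows can be sketched as follows. Assumptions: distances are uncorrected p-distances, windows start at the first alignment site and advance by the step size, and a trailing partial window is dropped; TreeScope's own choices are not documented here. The function names are hypothetical.

```python
import math


def p_distance(x, y):
    """Proportion of aligned sites at which two fragments differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # r is undefined if either distance vector is constant
    return sxy / math.sqrt(sxx * syy)


def window_correlations(alignment, window=100, step=100):
    """Correlate each sequence's distance profile between neighbouring windows.

    Returns (starts, result): starts are 0-based window start positions and
    result[title][k] is r between the windows at starts[k] and starts[k + 1].
    """
    titles = list(alignment)
    length = len(next(iter(alignment.values())))
    starts = list(range(0, length - window + 1, step))  # trailing partial window dropped
    result = {}
    for focus in titles:
        others = [t for t in titles if t != focus]
        profiles = []
        for s in starts:
            frag = alignment[focus][s:s + window]
            profiles.append([p_distance(frag, alignment[o][s:s + window]) for o in others])
        result[focus] = [pearson(a, b) for a, b in zip(profiles, profiles[1:])]
    return starts, result
```

In a toy alignment where sequence R matches P1 in the first window and P2 in the second, R's distance profile flips and r for that window pair is -1, the pattern described above as suggestive of recombination. Each row of `result` corresponds to one row of the heatmap.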

User Interaction
Use the sliders at the bottom left of the tab to define the window size and step size. A step size smaller than the window size produces overlapping windows. Overlapping windows are useful for detecting subtle signals, while non-overlapping windows are better suited to general diversity visualization. The default parameters are window size = 100 and step = 100 (non-overlapping). Once the correlations are calculated, the results are displayed as a heatmap in the main tab area.
Use the Threshold slider to dynamically show or hide boxes with correlation values above the threshold.

Output
The heatmap can be saved as a PNG image. The underlying table of correlation values can also be saved in text format.

(V) Tree Tiling tab

Overview
The Reference Tree defines the portions of the alignment used in data analysis. The Tree Tiling tab allows users to view phylogenetic structure in other parts of the alignment independently of the Reference Tree. Users can scroll to any region of the alignment and create additional trees for visualization. Multiple trees are displayed side by side, with each new tree added to the right-hand side of any existing ones. When lineages are selected on the Reference Tree, the sequences represented by the selected tips are highlighted on the tiled trees. This helps users check whether clusters from the Reference Tree remain consistent across other regions of the alignment. For example, a Reference Tree defined on a region with multiple genotypes (e.g., the V3 region of HIV-1 envelope) can be compared to trees outside this region in order to check if, and where, the distinct genotype-based clustering seen in that localized area breaks down. Additionally, users can choose to include all codon positions or restrict tree construction to specific codon positions.

User Interaction
Trees can be added or removed using the ADD and DEL buttons. The tree window size can be set using the slider at the bottom of the tab, independently of the Reference Tree window size. The dropdown box can be used to select which codon positions are used to construct each tree. Clicking internal nodes of the tiled trees highlights the corresponding sequences across all trees in this tab, as well as in the Reference Tree, the Network View tab and the Residue Frequencies tab.

Output
All tiled trees can be saved as PNG images.

(VI) Selected Sequences tab

Overview

This tab displays sequence titles that have been selected in any part of the software interface. Whenever nodes or clusters are clicked (e.g., in the Reference Tree, Network, or Recombination tab), the titles of linked sequences are displayed here.

User Interaction
Click components in the Reference Tree or other tabs to display associated titles here.

Output
FASTA-formatted sequences associated with each displayed title can be saved to a text file. If a tree node represents 30 identical fragments, all 30 titles will be saved along with the complete sequence of each.
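Writing the selected titles and their complete sequences as FASTA can be sketched as follows. The function name `selected_to_fasta`, the 60-character line wrapping and keeping alignment gaps in the output are assumptions, not documented behaviour.

```python
def selected_to_fasta(alignment, titles, line_width=60):
    """Format the complete sequences of the selected titles as FASTA text.

    alignment: dict mapping title -> sequence; titles: the selected titles.
    """
    out = []
    for title in titles:
        seq = alignment[title]
        out.append(f">{title}")
        # Wrap long sequences over several lines.
        out.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(out) + "\n"
```

The returned string can be written directly to a text file.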

3. Navigation Area

Overview
This section, located directly below the Tab Area, allows navigation through the alignment. The entire alignment is visible here, albeit in small text; the text size can be enlarged using the slider on the Residue Frequencies tab. The grey bar corresponds to the currently visible region, and its position is shown relative to the full alignment (represented by a horizontal grey line with tick marks). Clicking a tick jumps to that alignment position across all tabs. The red arrow buttons on the right step through the alignment incrementally; the size of the increment can be altered on the Residue Frequencies tab. The yellow bar shows the region represented in the Network View tab, which can be altered using the parameters on that tab. The green bar shows the region represented in the Reference Tree and updates whenever a new tree is generated. Whenever a time-consuming action takes place anywhere in the software, a progress bar appears in place of the red down-arrow button.

4. Top Menu

Overview
The top menu has been kept minimal, as most actions (e.g., saving tables and trees) can be accessed directly within the interface where relevant. The top menu currently contains three items: File, which loads a new alignment; Help, which provides access to documentation for interface components; and About, which displays development information, project links, and the SourceForge repository for code, data, manuals, and manuscript prereleases.

