metawatt Wiki

Binner for assembled metagenomes

Status: Beta

Brought to you by: kinestetika

User interface

Binning panel

There are two menus on the menubar at the top of the screen: File and View.

The File menu contains the standard items for loading and saving project files. Project files are in XML format and have the extension ".metawattproject". The menu also has an item "Check dependencies", which tests whether Metawatt can correctly call all the external programs used by the various [Pipeline modules].
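Because project files are plain XML, they can be inspected outside Metawatt. The following is a minimal sketch that counts the element names found in such a file. The project file schema is not described on this page, so the sketch makes no assumption about specific element names; the helper names `count_tags` and `summarize_project_file` are hypothetical and not part of Metawatt.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(root):
    """Count how often each element name occurs below (and including) root."""
    return Counter(element.tag for element in root.iter())


def summarize_project_file(path):
    """Parse an XML project file from disk and count its element names.

    'path' would point to a file with the ".metawattproject" extension.
    """
    return count_tags(ET.parse(path).getroot())
```

Calling `summarize_project_file` on a saved project gives a quick overview of what the file contains without opening the user interface.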

In the View menu, you can change two options that control how many taxa are shown in pie charts.

Below the menus you find a toolbar where you can add assemblies, readsets and HMM profiles for gene detection to the project, or remove them. By clicking these items you can also select which assembly or readset is displayed. To see the coverages as they were parsed from the assembler output, select "Assembly" as the readset. Further to the right, you can start or stop the pipeline.

Below the toolbar you find four tabs: Binning, Genes, Contigs and Pipeline.

The binning tab

On the top left side you find a table that lists the bins and can be used to select bins for inspection and manipulation. The table has eight tabs: All contigs, Bins, Shortlist, phylum, class, order, family, genus. "All contigs" shows a single bin that contains all contigs. "Bins" shows the bins generated for each assembly. "Shortlist" shows the bins that you shortlisted manually. Shortlisted bins can originate from any assembly and can be edited, but you cannot combine contigs from different assemblies into a single bin. The next five tabs show bins created automatically for each taxon at different taxonomic levels. When you select one of these tabs, the taxonomic level used to display taxonomic information in other parts of the user interface is set to the level of the tab you selected last.

The table shows some properties of each bin, such as its size (Mb), N50 contig length, and coverage. You can sort the table by any column by clicking on the column header. Some values are shown as xxx-xxx. These two values are the lower and upper quartiles of the distribution, which means that 50% of the data in the bin lies between them. E.D. means edit distance: the average difference between the assembly sequence and the sequencing reads, as detected during read mapping. This distance can result from sequencing errors or microdiversity. Coverage, percent abundance and edit distance may differ between readsets, and the values shown correspond to the currently selected readset.
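To make the N50 and quartile columns concrete, here is a minimal sketch of how such values can be computed from a list of contig lengths or per-contig values. It uses the standard definition of N50 and unweighted quartiles; whether Metawatt weights the quartiles by contig length is not stated here, so treat that as an assumption. The function names are hypothetical.

```python
from statistics import quantiles


def n50(contig_lengths):
    """Return the N50 contig length.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all bases in the bin.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def quartile_range(values):
    """Return (lower quartile, upper quartile) of unweighted values.

    Half of the values lie between the two numbers returned, which is
    what a table entry of the form xxx-xxx summarizes.
    """
    lower, _median, upper = quantiles(values, n=4, method="inclusive")
    return lower, upper
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest contigs already hold half of the 32 bases.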

When you click on a row (bin) in these tables, that bin gains the focus, and the content below and to the right of the table changes. When you right-click a bin in the table, a popup menu appears that allows you to shortlist bins and export data.

Shortlisted bins can be edited, and you can also use the menu to merge two bins. With the final item on the menu, "Undo", you can undo your latest edit to shortlisted bins. Metawatt keeps track of all your edit actions.

In the top right side of the screen you find the GC versus coverage plot. This is a contour plot calculated from the distribution of the contigs on the plot. A "cloud" on this plot often (but not always) corresponds to a single bin. The contour lines indicate the "global" distribution of all contigs of the assembly, and the blue filled area indicates the distribution of the contigs of the currently selected bin.
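As an illustration of the data behind such a plot, the sketch below computes the GC percentage of each contig and accumulates contig length on a coarse GC versus coverage grid. Contour lines could then be drawn around the densest cells. The cell sizes, the linear coverage axis and the weighting by contig length are assumptions made for this example and are not taken from Metawatt; the function names are hypothetical.

```python
from collections import Counter


def gc_percent(sequence):
    """Return the GC percentage of a nucleotide sequence (ignores N and gaps)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return 0.0
    return 100.0 * gc / (gc + at)


def density_grid(contigs, gc_step=5.0, cov_step=10.0):
    """Accumulate contig length per (GC cell, coverage cell).

    contigs: iterable of (sequence, coverage) pairs.
    Returns a Counter mapping (gc_cell, cov_cell) to the total number
    of bases that fall in that cell (assumed weighting, see lead-in).
    """
    grid = Counter()
    for sequence, coverage in contigs:
        cell = (int(gc_percent(sequence) // gc_step), int(coverage // cov_step))
        grid[cell] += len(sequence)
    return grid
```

Contigs from one organism tend to share GC content and coverage, so they pile up in neighbouring cells and form the "cloud" described above.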

You can move the vertical separator between the GC versus coverage plot and the list of bins so that both have a size that works for your screen. Similarly, you can move the horizontal separator between the GC versus coverage plot/bin list and the panel below.

The plot has a number of sliders and options to the right that you can use to improve visualization of your data.

When you mouse over the contours, they become bold and Metawatt calculates and shows all properties of the contigs contained within the active contour. This way you can effectively explore your data.

By right-clicking the plot, you can export it as a figure and access various options for bin editing (for shortlisted bins only).

The plot shows coverages for the currently selected readset. You can also plot coverages of one readset against coverages of another readset or plot against coding density.

In the lower half of the screen you find a panel that shows some details of the currently selected bin (or contour in the GC-versus-coverage plot). From left to right you find a bar diagram with the length distribution of the contigs, a pie diagram with the taxonomic distribution, a list with some properties of the bin, and the codon usage pattern inferred for the selected bin.

In the center panel you find the precise number of contigs, bin size (nt), N50 contig length, percent GC, the estimated degree of completeness, the number of transfer RNAs detected, the coding density, and the mapped overlap between contigs inside and outside the bin. The degree of completeness is estimated from the presence of 139 conserved genes (see module [Six Frame PFAM]). The three values shown indicate the percentage of amino acid positions detected for this set of genes (1) when the threshold is taken into account and (2) when it is not taken into account, and (3) the percentage of amino acid positions that were found more than once. The mapped overlap shows the number of kb found (during read mapping) to link contigs within this bin and the number of kb that links contigs in this bin to contigs outside the bin.
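Percentages of this kind can be illustrated with a small sketch. It assumes that, for each conserved gene, you have a list with one entry per amino acid position stating how many times that position was detected in the bin. This data layout and the function name are hypothetical, and the detection threshold mentioned above is not modelled, so the sketch covers only the "detected" and "found more than once" values.

```python
def position_percentages(hit_counts_per_gene):
    """Return (percent detected, percent found more than once).

    hit_counts_per_gene: dict mapping a gene name to a list of ints,
    one per amino acid position of that gene, giving the number of
    times the position was detected (assumed input format).
    """
    total = detected = duplicated = 0
    for counts in hit_counts_per_gene.values():
        total += len(counts)
        detected += sum(1 for count in counts if count >= 1)
        duplicated += sum(1 for count in counts if count >= 2)
    if total == 0:
        return 0.0, 0.0
    return 100.0 * detected / total, 100.0 * duplicated / total
```

A high "found more than once" percentage suggests that a bin may contain contigs from more than one genome, while a low "detected" percentage suggests that the bin is incomplete.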

Then, the coverage, edit distance and percent abundance are shown for each readset mapped.

Finally, the panel shows some statistics on how the bin was created during binning. It shows the relative importance of tetranucleotide and differential coverage binning, as well as the degree of disagreement between the two methods ("Ambiguous") and how this disagreement was resolved.

The genes tab

On the left side you find a table with the detected genes. On the right-hand side you find a panel that displays a phylogenetic tree calculated from a concatenated alignment of all the conserved single-copy genes detected for the bin.

Detection of 16S rRNA genes

The contigs tab

This tab shows a table of all the contigs in the assembly, their properties, and the bin to which each belongs. If you have a bin selected, only the contigs in that bin are shown. If you select contigs, their positions are indicated with red contours on the GC versus coverage plot. With a popup menu you can rebin contigs to different shortlisted bins.

The pipeline tab

Here you can set the parameters of the [Pipeline modules], run individual modules or the entire pipeline and view the logbook. If any errors occur, they should be reported in the logbook.

Wiki: Home
Wiki: Pipeline modules
Wiki: Six Frame PFAM
