This version fixes a very significant bug that may have led to suboptimal binning results. Some of the HMM profiles were ignored and were not used to assess bin completeness. The best solution is to rerun the module "Six Frame PPFAM" using version 3.5.2.
YOU DO NOT NEED A FRESH PROJECT WHEN UPDATING FROM VERSION 3.2 OR LATER.
[Getting Started] (including installation, dependencies)
The pipeline has a modular architecture and can be easily customized to include or exclude features analysed. Help topics for version 3.1:
For the algorithm and software please cite:
Strous M, Kraft B, Bisdorf R, Tegetmeyer H (2012) The binning of metagenomic contigs for microbial physiology of mixed cultures. Frontiers in Microbial Physiology and Metabolism 3:410. doi: 10.3389/fmicb.2012.00410.
For questions, problems and comments, please use the forum!
Credits:
The Metawatt binner was developed by Marc Strous with the help of Regina Bisdorf. The support of the Max Planck Society, the European Research Council and the Bundesland Nordrhein-Westfalen is gratefully acknowledged. Many thanks also to Halina Tegetmeyer, Beate Kraft, Harald Gruber-Vodicka, Xiaoli Dong, Manuel Kleiner and Dimitri Meier for providing valuable suggestions and metagenomes for testing. Many thanks also to Lizbeth Sayavedra for help in porting Metawatt to iOS.
(C) Marc Strous, Calgary 2015
This version introduces the following improvements:
MetaWatt now also runs on Windows. Typically, you would still compute contig properties on a Unix server (because of the dependencies and computational requirements), but the exploration and binning modules can be run on a Windows machine.
Fewer and less arbitrary parameters, resulting in improved binning.
Fixing of minor bugs.
Addition of a coverage distribution of bin contigs.
This version introduces the following minor improvements:
Much faster loading of projects
Slightly faster binning and much faster optimization of bins
More responsiveness in user interface
(Optional) fetching of reference genomes with wget (enables database updates with a proxy server).
Fixing of minor bugs, mainly related to shortlisting of bins
This version introduces the following major changes:
Database and taxonomy file creation and maintenance/updating is now fully automated
With a new project folder layout, the project is created fully automatically
Drastic reduction of disk space required. In addition, Metawatt now also handles gzipped read files
Integration of gene-modules into main pipeline. Metawatt now produces a phylogenetic tree including all reasonable bins, based on concatenated protein alignments of conserved genes
Multiple HMM profile files can be added to a project
Many minor fixes and improvements to the graphical user interface
Major code cleanup
Complete support for command line, fully automated binning on large servers
PLEASE START WITH A FRESH PROJECT WHEN USING THIS VERSION.
This version introduces the following major changes:
Better binning because of improved interaction and arbitration between tetranucleotide and coverage binning
Binning is saved in SIBCI format enabling export to other software and import of binning results from other programs into MetaWatt.
Any set of Pfam profiles can now be used to assess completeness or other bin properties, not just the predefined set supplied with MetaWatt.
You can now also mouse over bin contours enabling more precise editing.
Several minor improvements and fixes.
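As a rough illustration of how a user-supplied set of Pfam profiles could be turned into a completeness estimate, the Python sketch below counts how many profiles of a marker set were detected in a bin. The function name, the input layout, the duplication measure and the profile identifiers are assumptions made for this example; they do not describe MetaWatt's internal code.

```python
from collections import Counter


def bin_completeness(marker_profiles, hits_in_bin):
    """Estimate completeness and duplication of one bin (hypothetical helper).

    marker_profiles: set of profile identifiers expected once per genome.
    hits_in_bin: list of profile identifiers detected on the bin's contigs.
    Returns (completeness, duplication), both as fractions of the marker set.
    """
    if not marker_profiles:
        raise ValueError("marker_profiles must not be empty")
    # Only hits that belong to the chosen marker set are counted.
    counts = Counter(h for h in hits_in_bin if h in marker_profiles)
    found = len(counts)
    duplicated = sum(1 for c in counts.values() if c > 1)
    n = len(marker_profiles)
    return found / n, duplicated / n


# Placeholder identifiers, not real single copy marker profiles.
markers = {"PF_A", "PF_B", "PF_C", "PF_D"}
hits = ["PF_A", "PF_B", "PF_B", "PF_OTHER"]
completeness, duplication = bin_completeness(markers, hits)
```

With these placeholder inputs, two of the four markers are found and one of them occurs twice, so a bin like this would look incomplete and possibly contaminated.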
This version introduces the following major changes:
Differential coverage based binning is now integrated into the binning pipeline. The implementation is effective and fast. For example, combined tetranucleotide and coverage binning of a >30 Mb metagenome is completed within 30 seconds.
Read mapping information that links contigs is also used during binning.
A bin optimizer module has been added that destroys bins with poor quality and merges oversplit bins. Decisions are based on phylogenetic consistency, single copy conserved gene complementarity, GC content, coverage, coding density and degree of connectedness based on read mapping.
IMM binning has been removed entirely.
Metawatt now generally produces good bins without user supervision, even with standard settings, so implementation of command line usage has been improved to enable automated, command line binning.
Changes to the user interface to improve optimizing of the binning.
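Two of the sequence signals named above, tetranucleotide composition and GC content, can be computed per contig as in the following Python sketch. It is a minimal illustration under simplifying assumptions (no merging of reverse-complement tetramers, windows containing ambiguous bases are skipped) and is not MetaWatt's implementation; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 tetranucleotides in a sequence."""
    seq = seq.upper()
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Windows with ambiguous bases (e.g. N) are not in kmers and are ignored.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


freqs = tetranucleotide_frequencies("ACGTACGTNACGT")
```

Contigs from the same genome tend to have similar frequency vectors and similar GC content, which is what makes these values usable for binning and for checking bin consistency.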
This version adopts a number of changes "under the hood" in the way the blastx and discovery-of-conserved-genes results are processed. Also, a small error that crippled version 2.2 was fixed.
This version introduces a number of improvements that dramatically speed up many steps. All modules should now complete in minutes, even for large datasets.
Adopted diamond for ultrafast and sensitive classification of contigs (replaced blastn and blastp classification).
Added the option to set BBMap's "fast=t" parameter for faster mapping of reads to contigs.
Created a dedicated pfam database for detection of conserved genes, speeding up this module.
Two changes to the GC versus coverage plot: added the option to plot coding density versus coverage for detection of eukaryotes, and increased the maximum coverage scale to 400x. At present the GC versus coverage plot still has some issues with proper display of scale bars, which will be fixed eventually; dragging the scale bars or setting the zoom is already an easy workaround.
The stdbuf command is now only used when running the glimmer programs (IMM binning), because it is not available on many platforms and complicated installation for many users.
Fixed a bug that caused an empty GC/coverage plot in some locales.
Added an additional slider to the GC versus coverage plot to enable downsampling of contigs for plotting (useful for extremely large metagenomes).
Updated to RDP classifier version 2.7.
Updated to usearch version 7.0.
Added an option to customize the way external commands are called (Mac OS users: replace "stdbuf -o0" with " ").
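The downsampling of contigs for plotting mentioned in the list above can be pictured as drawing a random subset of contigs before drawing the plot. The Python sketch below shows one simple way to do this; the function name, the fraction parameter and the fixed seed are assumptions for illustration and are not taken from MetaWatt.

```python
import random


def downsample_contigs(contigs, fraction, seed=0):
    """Return a random subset of contigs for plotting (hypothetical helper).

    fraction: share of contigs to keep, in the range (0, 1].
    seed: fixed seed so that repeated plots show the same subset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    contigs = list(contigs)
    if not contigs:
        return []
    # Keep at least one contig so the plot is never empty.
    k = max(1, round(len(contigs) * fraction))
    return random.Random(seed).sample(contigs, k)


subset = downsample_contigs(range(1000), 0.1)
```

A fixed seed keeps the displayed subset stable while the slider position is unchanged, which makes it easier to compare plots.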
The Metawatt Binner is a graphical Java program for the binning of metagenomic contigs and evaluation of the binning results. Version 2.0 has the following new features:
Taxonomic classification performed with blastn and/or blastp.
Additional assessment of bin quality by analysis of the number of conserved single copy genes and by counting the number of transfer RNA genes.
Analysis of the genetic code used in each bin.
Use of multiple raw sequencing read files to compute coverage for each bin in each assembly for each readset.
Read mapping results are used to compute connections between contigs based on paired end and/or single end reads and these connections are used to "polish" binning results by rebinning unambiguously connected contigs to the correct bins. The polishing algorithm also makes use of unambiguous taxonomic classifications of contigs.
Redesigned coverage versus GC content plot also allows plotting coverage versus coverage for different readsets, trimming bins to "blobs" and creation of new bins from "blobs".
Detection and analysis of 16S rDNA genes, including fast construction of phylogenetic trees.
More responsive user interface.
Long contigs are properly binned (sometimes long contigs, >100 kb, were ignored in previous versions).
Console version to enable running the pipeline separately (e.g. on a computing cluster or for benchmarking) without the user interface.
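The "polishing" step described in the list above, rebinning contigs that are unambiguously connected to a single bin, can be sketched in Python as follows. This is a simplified illustration only: it uses read-based connections alone, ignores taxonomic classification and link counts, and the function name and data layout are assumptions rather than MetaWatt's actual algorithm.

```python
from collections import defaultdict


def polish_bins(bin_of, connections):
    """Rebin contigs whose read-based links all point to a single bin.

    bin_of: dict mapping contig name to bin label (None for unbinned).
    connections: iterable of (contig_a, contig_b) pairs linked by reads.
    Returns a new dict; the input assignment is not modified.
    """
    neighbours = defaultdict(set)
    for a, b in connections:
        neighbours[a].add(b)
        neighbours[b].add(a)

    polished = dict(bin_of)
    for contig, linked in neighbours.items():
        # Decisions use the original assignment, so the result does not
        # depend on the order in which contigs are visited.
        linked_bins = {bin_of.get(n) for n in linked} - {None}
        if len(linked_bins) == 1:
            polished[contig] = next(iter(linked_bins))
    return polished


bin_of = {"c1": "A", "c2": "A", "c3": "B", "c4": None}
connections = [("c1", "c2"), ("c1", "c3"), ("c2", "c3")]
polished = polish_bins(bin_of, connections)
```

Here contig c3 is linked only to contigs of bin A and is moved there, while c1 and c2 have links into two bins and are left alone. A real implementation would need extra safeguards, for example a minimum number of supporting reads per connection.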
Wiki: Explanations of files generated
Wiki: Getting Started
Wiki: Getting started with the user interface
Wiki: How the binning is done
Wiki: Installation and dependencies
Wiki: Pipeline modules
Wiki: Strategy hints
Wiki: User interface