How the binning is done

Marc Strous

The Metawatt binner provides a seven-step binning solution:

  1. The assembled contigs are sorted by length (from long to short), and the tetranucleotide composition of each contig is calculated and saved in a separate file. This calculation does not take much time (seconds to minutes).

  2. The assembled contigs are "annotated". This is the most time-consuming step, and it depends on BLAST. During annotation, the GC content and percent coverage (parsed from the header lines of the FASTA file) are determined for each contig. Each contig is also fragmented into small DNA fragments (500 bp by default), and these fragments are BLASTed against a database of full-length reference genome DNA sequences. Based on the BLAST results for its fragments, a taxonomic profile is created for each contig. The annotations are saved in a separate file.

  3. The contigs are binned based on a multivariate statistical analysis of the tetranucleotide frequencies of each contig. The program makes use of an empirical relationship between the mean frequency of a tetranucleotide in a source genome and the standard deviation of that frequency observed in DNA fragments sampled from the source genome (see citation for a more complete explanation). Binning does not take much time (seconds to minutes).

  4. The contig annotations are used to visualize a taxonomic signature for each bin.

  5. You, the user, select "good" bins to train hidden Markov models. This depends on Glimmer. "Good" bins are bins with long contigs and, if possible, an even coverage distribution and a consistent taxonomic signature.

  6. All contigs are rebinned with the set of trained hidden Markov models.

  7. The final bins are inspected in detail and, if desired, edited manually.
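
The per-contig calculations in steps 1 and 2 (length sorting, tetranucleotide composition, GC content and fragmentation) can be sketched in a few lines of Python. This is only an illustration and not Metawatt's own code: the function names are hypothetical, and details such as counting overlapping windows on one strand only, skipping windows that contain ambiguous bases, and keeping a shorter final fragment are assumptions that may differ from what the binner actually does.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def sort_by_length(contigs):
    """Return (name, sequence) pairs sorted from longest to shortest."""
    return sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)


def tetranucleotide_frequencies(seq):
    """Relative frequency of each tetranucleotide (overlapping windows).

    Assumption: only the given strand is counted, and windows containing
    anything other than A, C, G or T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of the sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def fragment(seq, size=500):
    """Split a contig into consecutive, non-overlapping fragments.

    Assumption: the last fragment is kept even if it is shorter than `size`.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


if __name__ == "__main__":
    # Toy contigs, invented for illustration only.
    contigs = {"contig_a": "ACGT" * 3, "contig_b": "GGCCAT" * 200}
    for name, seq in sort_by_length(contigs):
        profile = tetranucleotide_frequencies(seq)
        pieces = fragment(seq)
        print(name, len(seq), round(gc_content(seq), 3), len(profile), len(pieces))
```

In a real run, each fragment would then be written to a FASTA file and searched with BLAST against the reference database; that step is left out here because it depends on the local BLAST installation and database.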

