Metassembler: merging and optimizing de novo genome assemblies
September 29, 2015
-------------------------------------------------------------
Alejandro Hernandez Wences and Michael C Schatz
Simons Center for Quantitative Biology
Cold Spring Harbor Laboratory
Cold Spring Harbor, NY
De novo genome sequencing projects typically generate multiple assemblies of
the same sample, using different software packages and/or different parameters
of the same software. Instead of discarding the extra assemblies, Metassembler
merges them into a single superior assembly using mate-pair information and
whole-genome alignments. The final assembly combines the locally best sequence
from the input assemblies throughout the genome.
Please cite:
Metassembler: merging and optimizing de novo genome assemblies
Wences AH, Schatz MC (2015) Genome Biology 16:207. doi:10.1186/s13059-015-0764-4
http://www.genomebiology.com/2015/16/1/207
INSTALLATION:
-------------
Metassembler requires the following external programs to be installed:
1) MUMmer whole genome alignment package
2) bowtie2
3) samtools
4) python 2.7
The argparse python module must also be installed:
https://pypi.python.org/pypi/argparse
For general instructions on installing python packages in standard and
non-standard locations please refer to: http://docs.python.org/2/install/
If these requirements are met, then on Unix-like systems type 'make' in
the 'Metassembler/' root directory.
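Before running 'make', it can help to confirm that the external programs are
visible on your PATH. The following is a minimal sketch, not part of the
Metassembler distribution. The helper name missing_tools is hypothetical, and
the executable names are assumptions: 'nucmer' for the MUMmer package, and
'python' for a Python 2.7 interpreter (on some systems it is 'python2.7').

```shell
#!/bin/sh
# Print the name of every listed program that is not found on PATH.
# (missing_tools is a hypothetical helper, not shipped with Metassembler.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names: nucmer (MUMmer), bowtie2, samtools, python.
missing=$(missing_tools nucmer bowtie2 samtools python)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All required programs were found on PATH."
    python --version   # Metassembler expects Python 2.7
fi
```

If anything is reported as missing, install it or add its directory to PATH
before building.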
MANUAL
------
Details on how to use the wrapper 'metassemble' are given in MANUAL.
Please also check the detailed Manual here:
https://sourceforge.net/projects/metassembler/files/Metassemble_manual.pdf/download
SAMPLE DATA:
------------
A sample data set is provided for testing the installation and for becoming
familiar with Metassembler. It consists of two alternate assemblies, A.fa and
B.fa, generated from the first ~250kb of the Staphylococcus aureus genome with
some simulated differences.
There are two ways to run Metassembler. The easiest is to use the wrapper
'metassemble', which takes a configuration file as input.
In Sample/meta1 run:
./Metassemble_script.sh
This will create a configuration file and run metassemble for A.fa and B.fa.
A directory MergeMetassemble/ will be created. It will contain all the
information used in the metassembly process as well as the final results.
The general layout of the output directory and descriptions of the important
files it contains are given in MANUAL. In particular, you will find a
description of the *.metassem file, which contains instructions on how the
final metassembly sequence is constructed. For this sample data we expect
the metassembly sequence to be composed of assembly A sequence revised
with assembly B insertions.
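To locate the *.metassem file after the wrapper finishes, a search of the
output directory is enough. This is a minimal sketch: the helper name
list_metassem is hypothetical, and the exact location of the file inside
MergeMetassemble/ is not assumed here (MANUAL describes the layout).

```shell
# List every *.metassem file under the given output directory.
# (list_metassem is a hypothetical helper, not shipped with Metassembler.)
list_metassem() {
    find "$1" -type f -name '*.metassem'
}

# Assumes the wrapper was run in Sample/meta1 and wrote MergeMetassemble/ there.
if [ -d MergeMetassemble ]; then
    list_metassem MergeMetassemble
fi
```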
The other way to perform the metassembly is to run all the processes
step by step.
In Sample/meta1 run:
./Step_by_Step_script.sh
This will run each of the processes in turn, including the computation of the
CE-statistic for the starting assemblies and the whole genome alignment
using the nucmer program from the MUMmer package.
The resulting metassembly should be a single contig with deletions in
assembly A corrected using sequence from assembly B.
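One quick way to check the "single contig" expectation is to count the FASTA
header lines in the final sequence file. This is a minimal sketch: the helper
name count_contigs is hypothetical, and the path in the usage comment is a
placeholder, since the name of the final FASTA file is documented in MANUAL
rather than here.

```shell
# Count FASTA records (header lines starting with '>') in a file.
# (count_contigs is a hypothetical helper, not shipped with Metassembler.)
count_contigs() {
    grep -c '^>' "$1"
}

# Hypothetical usage; replace the placeholder with the final FASTA file
# described in MANUAL:
#   count_contigs path/to/final_metassembly.fasta
```

For the sample data, a count of 1 would match the expected single-contig
result.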
Special Thanks:
Paul Baranay and Scott Emrich