metawatt Wiki

Binner for assembled metagenomes

Status: Beta

Brought to you by: kinestetika

Getting Started

installation

Extract the downloaded zipped archive:

unzip Metawatt-3.2.zip

The jar file is in the "dist" folder.

project structure

Version 3.2 and later require a specific organization of your data and databases into folders. First, create a folder named "databases".

/shared/path/to>mkdir databases

This folder can be shared between all users and projects; metawatt populates it and keeps it up to date. It will contain at least a file with taxonomy data for all reference genomes (reference-taxonomy.txt), an amino acid fasta file of all predicted open reading frames (reference-genomes.faa) and a file with hidden Markov models of conserved single-copy genes used to assess bin completeness (conserved-genes.hmm).

For a quick start, you can download a database folder containing bacterial, archaeal, eukaryotic and viral reference genomes, created with metawatt using default settings in March 2015. You still need to run the "Update databases" module (metawatt runs it by default), but starting from this folder will save you about two hours of runtime in your first run of metawatt:

/shared/path/to>wget http://coe30.coe.ucalgary.ca/~mstrous/metawatt-databases.tar.gz
/shared/path/to>tar xvfz metawatt-databases.tar.gz
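After unpacking, you can check that the three files named above are present before starting a run. This is a minimal sketch, assuming the archive unpacks into a folder named "databases"; check_databases is a hypothetical helper, not part of metawatt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a databases folder contains the files
# listed on this page. Returns 0 if all are present and non-empty, 1 otherwise.
check_databases() {
  local db="${1:-databases}" missing=0 f
  for f in reference-taxonomy.txt reference-genomes.faa conserved-genes.hmm; do
    # -s: the file exists and is not empty
    if [ ! -s "$db/$f" ]; then
      echo "missing or empty: $db/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumes the archive unpacked into /shared/path/to/databases):
# check_databases /shared/path/to/databases && echo "databases folder looks complete"
```

It only checks that the files exist and are not empty, not that their contents are valid or current; the "Update databases" module is still what keeps them up to date.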

If you choose to start from scratch, note that the metawatt program comes with a database folder that contains two .hmm files, "conserved-genes.hmm" and "rrna.hmm". See [Update databases] for how metawatt creates and updates databases.

Next, create a folder for your binning project:

/my/path>mkdir projectname

Inside your project folder, create a folder for your input files (assembled contigs/scaffolds and all read files you want to use for differential-coverage binning) and a folder for your output files, and add a softlink to the databases folder:

/my/path/projectname>mkdir input
/my/path/projectname>mkdir output
/my/path/projectname>ln -s /shared/path/to/databases .
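If you set up projects often, the three commands above can be wrapped in a small shell function. This is a sketch; setup_project is a hypothetical helper, and the -n flag to ln is an addition that keeps a re-run from creating a nested link inside the databases folder:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the project layout described above.
# Usage: setup_project PROJECT_FOLDER DATABASES_FOLDER
setup_project() {
  local project="$1" databases="$2"
  mkdir -p "$project/input" "$project/output"
  # -s: symbolic link, -f: replace an existing link,
  # -n: treat an existing "databases" link as a file instead of following it
  ln -sfn "$databases" "$project/databases"
}

# Example:
# setup_project /my/path/projectname /shared/path/to/databases
```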

Place one or more fasta files with assembled contigs and, optionally, one or more fastq files of sequencing reads in the input folder. Fastq files can be gzipped. Metawatt automatically determines whether a file is a fasta file or a (gzipped) fastq file, and whether fastq files contain paired reads.
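Metawatt does this format detection itself, so you do not need to label your files. Purely as an illustration of how such a check can work, the sketch below uses a common heuristic (gzip magic bytes 1f 8b, then the first character: ">" for fasta, "@" for fastq). It is an assumption, not metawatt's actual implementation, guess_format is a hypothetical name, and it does not try to detect paired reads:

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the format of a sequence file.
# Prints fasta, fastq, fasta.gz, fastq.gz or unknown.
guess_format() {
  local f="$1" magic first suffix=""
  # The first two bytes of a gzip file are 1f 8b
  magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    first=$(gzip -cd "$f" | head -c 1)
    suffix=".gz"
  else
    first=$(head -c 1 "$f")
  fi
  # Fasta records start with ">", fastq records with "@"
  case "$first" in
    ">") echo "fasta$suffix" ;;
    "@") echo "fastq$suffix" ;;
    *)   echo "unknown" ;;
  esac
}

# Example: for f in /my/path/projectname/input/*; do echo "$f: $(guess_format "$f")"; done
```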

dependencies

Make sure that all of metawatt's dependencies are in your system's path (java does not always find programs correctly when they are only in your user path). Metawatt depends on:

Prodigal
diamond (version >0.7)
aragorn
hmmer3.1
BBMap
USearch
MAFFT
FastTreeMP (You need the parallel version!)
wget (for database updating, if you're behind a proxy).
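Metawatt's own --check-dependencies option (see below) is the authoritative check. As a quick pre-check from the shell you launch java from, a sketch like the following reports which executables are found on the PATH. The executable names are assumptions (hmmer is represented by hmmsearch, BBMap by bbmap.sh, and usearch binaries are often installed under a versioned name), so adjust the list to your installation; check_path is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which executables are on the PATH.
# With no arguments it checks an assumed default list of executable names.
# Returns 1 if any executable is missing.
check_path() {
  local missing=0 tool
  [ "$#" -gt 0 ] || set -- prodigal diamond aragorn hmmsearch bbmap.sh usearch mafft FastTreeMP wget
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_path              # checks the default list
#          check_path mafft wget   # checks selected tools only
```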

Now you are ready to run Metawatt. To view the command-line parameters, type:

java -jar /path/to/Metawatt-3.2 --help

If you want to check dependencies:

java -jar /path/to/Metawatt-3.2 --check-dependencies

running metawatt

If you would like to run metawatt interactively:

java -jar /path/to/Metawatt-3.2

You can open a project from the File menu by selecting its project folder; a properly set-up project folder is marked with a red dot in the file-open dialog.

If you would like to run your newly created project interactively or explore the binning results:

java -jar /path/to/Metawatt-3.2 --explore /my/path/projectname

If you would like to run the metawatt pipeline on the command line without opening the graphical user interface:

java -jar /path/to/Metawatt-3.2 --run /my/path/projectname

Unless your project is very small, you probably need to allocate more memory and use more threads/processors:

java -Xmx8g -jar /path/to/Metawatt-3.2 --run /my/path/projectname --threads 24
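On a dedicated server you may want to match --threads to the number of available cores. This is a minimal sketch, assuming a POSIX shell with nproc (GNU coreutils) or getconf; METAWATT_JAR, MEM, DRY_RUN and run_metawatt are hypothetical names used only here:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run metawatt with one thread per available core.
# METAWATT_JAR and MEM can be overridden from the environment.
METAWATT_JAR="${METAWATT_JAR:-/path/to/Metawatt-3.2}"
MEM="${MEM:-8g}"

run_metawatt() {
  local project="$1" threads
  # nproc comes with GNU coreutils; getconf is the fallback
  threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
  # Build the command line in the positional parameters
  set -- java "-Xmx$MEM" -jar "$METAWATT_JAR" --run "$project" --threads "$threads"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # only print the command
  else
    "$@"
  fi
}

# Example: DRY_RUN=1 run_metawatt /my/path/projectname
```

With DRY_RUN set, the function only prints the command, which lets you check it before a long run. On a shared server you may prefer a fixed, smaller thread count.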

Once the run on your server has completed, you can copy the project folder (including all three subfolders: input, output and databases) to your local PC to explore the results:

/local/pc>scp -r username@my.server.com:/my/path/projectname .
/local/pc>java -Xmx8g -jar /path/to/Metawatt-3.2 --explore /local/pc/projectname

temp folder

Metawatt uses a "temp" folder to store intermediate files. By default, this is "/tmp/metawatt". Some systems (e.g. Arch Linux) use a virtual file system for /tmp that resides in memory. This can speed up the pipeline, but it can also lead to an "out of memory" error. Also, when multiple users run metawatt simultaneously on the same server, they will run into file access problems: only the first user will have access to /tmp/metawatt. To use a custom temp folder, use:

java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --temp-folder /my/temp/folder
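To avoid both problems, each user can point --temp-folder at a private folder on a disk-backed file system. This is a sketch, assuming /var/tmp is disk-backed on your system (check with df -T /var/tmp on Linux); TEMP_BASE and user_temp_folder are hypothetical names you are free to change:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create (if needed) and print a per-user temp folder.
# TEMP_BASE defaults to /var/tmp, assumed here to be disk-backed.
user_temp_folder() {
  local dir="${TEMP_BASE:-/var/tmp}/metawatt-${USER:-$(id -un)}"
  mkdir -p "$dir"
  chmod 700 "$dir"   # private to the current user
  echo "$dir"
}

# Example:
# java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --temp-folder "$(user_temp_folder)"
```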

log file

Metawatt maintains a logbook in /my/path/projectname/metawatt-logbook.txt, which you can use to monitor progress or to troubleshoot problems.

project file

As metawatt loads your project, it opens the project file "/my/path/projectname/metawatt-project.xml". In this file you can configure all module options and instruct metawatt to skip specific modules. The file can look like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<actions>
<action descr="The number of processors/cores used for computations" name="setProcessorsUsed">
<content type="positive_int">4</content>
</action>
<action descr="Block size affects memory usage (see Diamond manual)." name="setDiamondBlockSize">
<content type="double">8.000000e-01</content>
</action>
<module name="Update databases" status="Skipped"/>
<module name="Predict coverage, %GC, coding density" status="Scheduled"/>
<module name="Classify with diamond blastx" status="Scheduled"/>
<module name="Predict tRNAs" status="Scheduled"/>
<module name="Six frame Pfam" status="Scheduled"/>
<module name="Map reads to contigs" status="Scheduled"/>
<module name="Compute N4 frequencies" status="Scheduled"/>
<module name="Bin with tetranucleotides" status="Scheduled"/>
<module name="Optimize bins" status="Scheduled"/>
<module name="Polish bins" status="Scheduled"/>
<module name="Make Bin Shortlist" status="Scheduled"/>
<module name="Calculate bin phylogeny" status="Skipped"/>
<module name="Export bins" status="Scheduled"/>
</actions>
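To see at a glance which modules will run, you can extract the module names and statuses from the project file. This is a sketch with sed, assuming one module element per line as in the example above; list_modules is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "module name: status" for each module element.
# Assumes one <module name="..." status="..."/> element per line.
list_modules() {
  local project_file="${1:-metawatt-project.xml}"
  sed -n 's/.*<module name="\([^"]*\)" status="\([^"]*\)".*/\1: \2/p' "$project_file"
}

# Example: show only the modules that are not skipped
# list_modules /my/path/projectname/metawatt-project.xml | grep -v ': Skipped$'
```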

You can set the options of all modules in this file. If you save your project with the graphical user interface, this is the file that will be overwritten. Manual edits of bins are also stored in this file. See [Pipeline modules] for an overview of all modules and their options. You can also set any parameter on the command line, for example:

java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --setDiamondBlockSize 0.8

This example shows that each command-line parameter has the same name as the corresponding action in the project file, prefixed with "--".
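Because of this naming rule, the action entries of a project file can be translated mechanically into command-line flags. The awk sketch below does this for the layout shown in the example project file (the name attribute on the action line, the value on the following content line); actions_to_flags is a hypothetical helper and is not a full XML parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "--name value" for each <action> in a project file.
# Assumes the name attribute is on the <action> line and the value on a later <content> line.
actions_to_flags() {
  awk '
    /<action / {
      if (match($0, /name="[^"]*"/))
        name = substr($0, RSTART + 6, RLENGTH - 7)   # drop the leading name= and quotes
    }
    /<content/ && name != "" {
      value = $0
      sub(/.*<content[^>]*>/, "", value)   # remove the opening tag
      sub(/<\/content>.*/, "", value)      # remove the closing tag
      printf "--%s %s\n", name, value
      name = ""
    }
  ' "${1:-metawatt-project.xml}"
}

# Example: actions_to_flags /my/path/projectname/metawatt-project.xml
```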

overwriting previous results

By default, metawatt will not overwrite previously generated results. If you would like to overwrite all previous results, invoke metawatt as follows:

java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --force

If you would like to rerun specific modules, set the status of those modules to "Force redo" in the project file:

...
<module name="Map reads to contigs" status="Force redo"/>
...
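If you prefer not to edit the file by hand, sed can change a module's status for you. This is a sketch, assuming the one-element-per-line layout of the example project file and module names and statuses without characters that are special to sed (such as /, &, . or *); set_module_status is a hypothetical helper. Since saving the project from the graphical user interface overwrites this file, make the change while the project is not open there:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the status of one module in a project file.
# Usage: set_module_status PROJECT_FILE "MODULE NAME" "NEW STATUS"
set_module_status() {
  local project_file="$1" module="$2" status="$3"
  # Write to a temporary copy, then replace the original (avoids sed -i differences between systems)
  sed "s/\(<module name=\"$module\" status=\"\)[^\"]*\"/\1$status\"/" "$project_file" > "$project_file.tmp" &&
    mv "$project_file.tmp" "$project_file"
}

# Example:
# set_module_status /my/path/projectname/metawatt-project.xml "Map reads to contigs" "Force redo"
```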
