Extract the downloaded zipped archive:
unzip Metawatt-3.2.zip
The jar file is in the "dist" folder.
Version 3.2 and later require a specific organization of your data and databases into folders. First, create a folder named "databases":
/shared/path/to>mkdir databases
This folder can be shared between all users and projects. Metawatt populates it and keeps it up to date. It will contain at least a file with taxonomy data for all reference genomes (reference-taxonomy.txt), an amino acid fasta file of all predicted open reading frames (reference-genomes.faa) and a file with hidden Markov models of conserved single-copy genes used to assess bin completeness (conserved-genes.hmm). For a quick start, you can download a database folder containing bacterial, archaeal, eukaryotic and viral reference genomes, created with metawatt using default settings in March 2015, from the address below. You still need to run the "Update databases" module (metawatt runs it by default), but starting from this folder will save you two hours of runtime in your first run of metawatt.
/shared/path/to>wget http://coe30.coe.ucalgary.ca/~mstrous/metawatt-databases.tar.gz
/shared/path/to>tar xvfz metawatt-databases.tar.gz
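Before the first run, it can be worth confirming that the unpacked folder actually holds the three files described above. The bash function below is a minimal sketch, not part of metawatt; the name `check_databases` is hypothetical, and it assumes the archive unpacks into a folder called databases (check the tar output). A folder started from scratch only contains the .hmm files until "Update databases" has run, so expect it to report the other two files as missing in that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with metawatt): check that a database
# folder contains the three files metawatt keeps there, and that each one
# is non-empty. Returns 1 if any file is missing or empty.
check_databases() {
    local db_dir="$1" f missing=0
    for f in reference-taxonomy.txt reference-genomes.faa conserved-genes.hmm; do
        if [ -s "$db_dir/$f" ]; then
            echo "ok:      $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_databases /shared/path/to/databases`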
If you choose to start from scratch, metawatt ships with a database folder that contains two .hmm files, "conserved-genes.hmm" and "rrna.hmm". See [Update databases] for how metawatt creates and updates databases.
Next, create a folder for your binning project:
/my/path>mkdir projectname
Inside your project folder, create a folder for your input files (assembled contigs/scaffolds and all read files you want to use for differential-coverage binning), a folder for your output files, and a symbolic link to the database folder:
/my/path/projectname>mkdir input
/my/path/projectname>mkdir output
/my/path/projectname>ln -s /shared/path/to/databases .
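The three commands above can be wrapped in a small bash function so that every new project gets the same layout. This is a minimal sketch, not part of metawatt; the function name `make_metawatt_project` is hypothetical, and it assumes the shared database folder already exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with metawatt): create the project layout
# input/, output/ and a databases symlink for a new binning project.
# Usage: make_metawatt_project <project-dir> <shared-databases-dir>
make_metawatt_project() {
    local project_dir="$1" databases_dir="$2"

    # Refuse to continue if the shared database folder is missing.
    if [ ! -d "$databases_dir" ]; then
        echo "database folder not found: $databases_dir" >&2
        return 1
    fi

    # Turn the database path into an absolute path so the link still
    # resolves correctly from inside the project folder.
    databases_dir="$(cd "$databases_dir" && pwd)"

    # input/ holds contigs and reads, output/ receives the results.
    mkdir -p "$project_dir/input" "$project_dir/output"

    # Link (rather than copy) the shared database folder into the project;
    # -n and -f replace an existing link instead of creating one inside it.
    ln -sfn "$databases_dir" "$project_dir/databases"
}
```

Usage: `make_metawatt_project /my/path/projectname /shared/path/to/databases`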
Place one or more fasta files with assembled contigs and, optionally, one or more fastq files of sequencing reads in the input folder. Fastq files may be gzipped. Metawatt automatically determines whether a file is a fasta file or a (gzipped) fastq file, and whether fastq files contain paired reads.
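Metawatt performs this detection itself; if you just want to see what you have placed in the input folder, a rough check based on the first character of each file (">" for fasta, "@" for fastq) is often enough. The bash function below is a hypothetical sketch, not metawatt's own detection logic, and it makes no attempt to decide whether reads are paired.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of metawatt): report what kind of
# sequence file each entry in a folder appears to be, judging only by the
# first character (">" = fasta, "@" = fastq). Gzipped files are recognised
# by their magic bytes (1f 8b) and decompressed on the fly.
classify_inputs() {
    local input_dir="$1" f first packed
    for f in "$input_dir"/*; do
        [ -f "$f" ] || continue
        if [ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]; then
            first="$(gzip -cd "$f" 2>/dev/null | head -c 1)"
            packed=" (gzipped)"
        else
            first="$(head -c 1 "$f")"
            packed=""
        fi
        case "$first" in
            ">") echo "$(basename "$f"): FASTA$packed" ;;
            "@") echo "$(basename "$f"): FASTQ$packed" ;;
            *)   echo "$(basename "$f"): unrecognised" ;;
        esac
    done
}
```

Usage: `classify_inputs /my/path/projectname/input`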
Make sure that all of metawatt's dependencies are in your system's path (java does not always find programs correctly when they are only in your user path). Metawatt depends on:
Now you are ready to run Metawatt. To view command line parameters type:
java -jar /path/to/Metawatt-3.2 --help
If you want to check dependencies:
java -jar /path/to/Metawatt-3.2 --check-dependencies
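As a quick complement to --check-dependencies, you can ask the shell which programs it can see on the PATH that java will inherit when launched from that same shell. The bash function below is a hypothetical sketch; pass it the program names from metawatt's dependency list (diamond, which the project file mentions, is used only as an illustration).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of metawatt): report which of the given
# programs can be found on the current PATH. Returns 1 if any is missing.
check_on_path() {
    local prog missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog -> $(command -v "$prog")"
        else
            echo "MISSING: $prog"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_on_path java diamond`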
If you would like to run metawatt interactively:
java -jar /path/to/Metawatt-3.2
You can open a project from the File menu by selecting its project folder; a properly set-up project folder is marked with a red dot in the file-open dialog.
If you would like to run your newly created project interactively or explore the binning results:
java -jar /path/to/Metawatt-3.2 --explore /my/path/projectname
If you would like to run the metawatt pipeline on the command line without opening the graphical user interface:
java -jar /path/to/Metawatt-3.2 --run /my/path/projectname
Unless your project is very small, you will probably need to allocate more memory (the Java option -Xmx8g sets the maximum heap size to 8 GB) and use more threads/processors:
java -Xmx8g -jar /path/to/Metawatt-3.2 --run /my/path/projectname --threads 24
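If you launch runs often, a small wrapper can fill in the heap size and thread count for you. This bash sketch is not part of metawatt; the variables METAWATT_JAR and HEAP, the function name and the DRY_RUN switch are all hypothetical, and using every online core may be too aggressive on a shared server.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of metawatt). METAWATT_JAR and HEAP are
# assumptions: override them in the environment to match your installation.
METAWATT_JAR="${METAWATT_JAR:-/path/to/Metawatt-3.2}"
HEAP="${HEAP:-8g}"

# Run the pipeline with one thread per online core. If DRY_RUN is set,
# only print the command that would be executed.
run_metawatt() {
    local project_dir="$1" threads
    # getconf works on both Linux and macOS; fall back to 1 thread.
    threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ -n "${DRY_RUN:-}" ]; then
        echo java "-Xmx$HEAP" -jar "$METAWATT_JAR" --run "$project_dir" --threads "$threads"
    else
        java "-Xmx$HEAP" -jar "$METAWATT_JAR" --run "$project_dir" --threads "$threads"
    fi
}
```

Usage: `HEAP=16g run_metawatt /my/path/projectname`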
Once the run on your server has completed, you can copy the project folder (including all three subfolders: input, output and databases) to your local PC to explore the results:
/local/pc>scp -r username@my.server.com:/my/path/projectname .
/local/pc>java -Xmx8g -jar /path/to/Metawatt-3.2 --explore /local/pc/projectname
Metawatt uses a "temp" folder to store intermediate files. By default, this is "/tmp/metawatt". Some systems (e.g. Arch Linux) use a virtual file system for /tmp that resides in memory. This can speed up the pipeline, but it can also lead to an "out of memory" error. Also, when multiple users run metawatt simultaneously on the same server, they will run into file access problems: only the first user will have access to /tmp/metawatt. To use a custom temp folder, use:
java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --temp-folder /my/temp/folder
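One way to avoid collisions on a shared server is to give every user (and every project) its own temp folder. The bash sketch below assumes a disk-backed base path, here the placeholder /my/temp held in a hypothetical TEMP_BASE variable, and uses a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Base path for temp folders; /my/temp is a placeholder. Pick a location
# on disk (not an in-memory /tmp) with enough free space.
TEMP_BASE="${TEMP_BASE:-/my/temp}"

# Hypothetical helper: create and print a per-user, per-project temp folder
# such as /my/temp/<user>/<projectname>.
metawatt_temp_dir() {
    local dir="$TEMP_BASE/${USER:-$(id -un)}/$(basename "$1")"
    mkdir -p "$dir" && echo "$dir"
}
```

Usage: `java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --temp-folder "$(metawatt_temp_dir /my/path/projectname)"`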
In case of problems, or to monitor progress, check the logbook that metawatt maintains in /my/path/projectname/metawatt-logbook.txt.
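To keep an eye on a running job, tail is usually all you need. The hypothetical bash helper below prints the last 20 lines of a project's logbook, or keeps following it when given -f; the line count is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of metawatt): show the most recent entries
# of a project's logbook. Pass -f as a second argument to follow it live.
show_logbook() {
    local log="$1/metawatt-logbook.txt"
    if [ ! -f "$log" ]; then
        echo "no logbook yet: $log" >&2
        return 1
    fi
    if [ "${2:-}" = "-f" ]; then
        tail -n 20 -f "$log"
    else
        tail -n 20 "$log"
    fi
}
```

Usage: `show_logbook /my/path/projectname -f`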
As metawatt loads your project, it will open the project file "/my/path/projectname/metawatt-project.xml". In this file you can configure all module options and instruct metawatt to skip specific modules. This file can look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<actions>
<action descr="The number of processors/cores used for computations" name="setProcessorsUsed">
<content type="positive_int">4</content>
</action>
<action descr="Block size affects memory usage (see Diamond manual)." name="setDiamondBlockSize">
<content type="double">8.000000e-01</content>
</action>
<module name="Update databases" status="Skipped"/>
<module name="Predict coverage, %GC, coding density" status="Scheduled"/>
<module name="Classify with diamond blastx" status="Scheduled"/>
<module name="Predict tRNAs" status="Scheduled"/>
<module name="Six frame Pfam" status="Scheduled"/>
<module name="Map reads to contigs" status="Scheduled"/>
<module name="Compute N4 frequencies" status="Scheduled"/>
<module name="Bin with tetranucleotides" status="Scheduled"/>
<module name="Optimize bins" status="Scheduled"/>
<module name="Polish bins" status="Scheduled"/>
<module name="Make Bin Shortlist" status="Scheduled"/>
<module name="Calculate bin phylogeny" status="Skipped"/>
<module name="Export bins" status="Scheduled"/>
</actions>
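For a quick overview of which modules will run, you can pull the module names and statuses out of the project file with sed. This bash sketch is line-based, not an XML parser: it assumes each module element sits on one line with name before status, as in the example above, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "status: module name" for every <module>
# element in a metawatt project file. Assumes one element per line with
# the name attribute before the status attribute.
list_modules() {
    sed -n 's/.*<module name="\([^"]*\)" status="\([^"]*\)".*/\2: \1/p' "$1"
}
```

Usage: `list_modules /my/path/projectname/metawatt-project.xml | grep '^Scheduled'`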
You can set the options of all modules in this file. If you save your project with the graphical user interface, this file is overwritten. If you edit bins manually, all edits are also stored in this file. See [Pipeline modules] for an overview of all modules and their options. You can also set any parameter on the command line, for example:
java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --setDiamondBlockSize 0.8
As this example shows, the name of each parameter in the project file is identical to the corresponding command-line parameter.
By default, metawatt will not overwrite previously generated results. If you would like to overwrite all previous results, invoke metawatt as follows:
java -jar /path/to/Metawatt-3.2 --run /my/path/projectname --force
If you would like to rerun specific modules, set the status of those modules to "Force redo" in the project file:
...
<module name="Map reads to contigs" status="Force redo"/>
...
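If you rerun modules from a script, the same edit can be made with sed. This hypothetical bash helper assumes the one-element-per-line layout shown above and keeps a backup as metawatt-project.xml.bak; make the edit while the project is not open in the graphical user interface, since saving from the interface overwrites this file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of metawatt): set the status of one module
# in a project file, e.g. to "Force redo". The original file is kept as
# <file>.bak. Assumes one <module .../> element per line.
set_module_status() {
    local file="$1" module="$2" status="$3" escaped
    # Escape characters that are special in a sed regular expression.
    escaped="$(printf '%s' "$module" | sed 's/[][\.*^$]/\\&/g')"
    cp "$file" "$file.bak"
    # Keep everything up to status=" and replace only the status value.
    sed "s|\(<module name=\"$escaped\" status=\"\)[^\"]*\"|\1$status\"|" "$file.bak" > "$file"
}
```

Usage: `set_module_status /my/path/projectname/metawatt-project.xml "Map reads to contigs" "Force redo"`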