Welcome to the OperaMS wiki
OperaMS is a metagenomic scaffolding pipeline that takes a metagenomic scaffold graph as input and aims to output near-complete genomes of the individual microbes in your environmental sample.
It uses the following strategy: a graph partitioning tool called Sigma is used to decompose the metagenomic scaffolding problem into distinct single genome scaffolding problems that are then solved by the single genome scaffolder Opera.
1. Reads need to be mapped onto contigs/scaffolds (currently we provide a script that uses bowtie/bwa to do this):
cd path/to/OperaMS
perl bin/preprocess_reads.pl <contig-file> <read-file-1> <read-file-2> <output-file> (<mapping-tool>)
where mapping-tool is either bowtie (default) or bwa, and read-file-1 and read-file-2 contain paired-end reads in fasta or fastq format. Note that all assembled sequences should be provided in this step to ensure that reads are correctly mapped. Small contigs do not need to be removed beforehand: Opera has a "contig_size_threshold" parameter that lets users filter out small contigs (default value of 500bp) during the scaffolding step.
Note that the binaries "samtools", "bwa", "bowtie" and "bowtie-build" are assumed to be in the path. If not, the third line of preprocess_reads.pl should be edited appropriately.
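Before running the preprocessing script, it can help to confirm that the four binaries named above are actually visible on the path. The following is a minimal Python sketch of such a check and is not part of OperaMS; the function name find_missing_tools is hypothetical, and only the tool names come from this page.

```python
import shutil

# Tools that this page says preprocess_reads.pl expects to find on the PATH.
REQUIRED_TOOLS = ["samtools", "bwa", "bowtie", "bowtie-build"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be located on the current PATH.

    `which` is injectable so the check can be exercised without the
    real binaries being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If any tool is reported as missing, either add its directory to the path or edit preprocess_reads.pl as described above.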
2. Provide parameters to OperaMS:
cd path/to/OperaMS
perl runOperaMS.pl <configuration-file>
where the configuration-file provides information on the contig file, mapping files and output directory (see below for the format).
Scaffolds output by OperaMS can be found in a multi-fasta file "scaffoldSeq.fasta".
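Once a run has finished, a quick way to inspect the result is to summarise the scaffold lengths in the multi-fasta output. The sketch below is an illustration only and not an OperaMS tool: the helper names read_fasta_lengths and n50 are hypothetical, and it assumes a standard FASTA layout in which each record starts with a ">" header line.

```python
def read_fasta_lengths(lines):
    """Map each sequence name to its length, given an iterable of FASTA lines."""
    lengths = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Tiny in-memory example; for a real run, pass an open file handle instead.
    example = [">scaffold1 demo", "ACGT", "AC", ">scaffold2", "ACGTACGTAC"]
    scaffold_lengths = read_fasta_lengths(example)
    print(len(scaffold_lengths), "scaffolds, N50 =", n50(scaffold_lengths.values()))
```

For a real run, open scaffoldSeq.fasta from wherever your run wrote it and pass the file handle to read_fasta_lengths.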
The configuration file uses the following keys:
CONTIGS full path to your contigs file in .fasta format.
COVERAGEBAM mapping file used to estimate the contig coverage (we recommend using the .bam mapping file of highest coverage).
LIB full path to the mapping file specifying the location of paired-end reads on the contigs (input to OperaMS; see bin/preprocess_reads.pl). If you have multiple libraries, enter each one on a separate line using the same format.
EDGE_BUNDLESIZE_THRESHOLD scaffold edges that are supported by fewer than this number of paired reads are discarded as noise. By default, we require >= 5 paired reads to consider an edge valid.
OUTPUTDIR the directory into which all results are written.
SAMTOOLS full path to the samtools executable.
APACHECOMMONSMATH full path to the Apache Commons Math 3.0 jar (commons-math3-3.0.jar).
JVMREQ JVM memory requirement. Depending on the complexity of the dataset, you might need to reserve more memory for Sigma's graph handling. We have found that 5GB is enough for the variety of large-scale datasets we have handled.
KMER_SIZE the k-mer size used to perform the initial contig assembly (using, for example, Velvet or SOAPdenovo).
An example configuration file:
CONTIGS path/to/contigs.fa
COVERAGEBAM path/to/lib3.bam
LIB path/to/lib1.bam
LIB path/to/lib2.bam
LIB path/to/lib3.bam
EDGE_BUNDLESIZE_THRESHOLD 5 #optional
OUTPUTDIR directory
SAMTOOLS path/to/samtools
APACHECOMMONSMATH path/to/commons-math3-3.0.jar
JVMREQ 5g #optional
KMER_SIZE 31
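Because a missing or misspelled key is an easy mistake to make, it may be worth checking a configuration file before launching a run. The sketch below parses the format shown above in Python. It is not the parser used by runOperaMS.pl: the names parse_config and missing_keys are hypothetical, and it rests on two assumptions, that "#" starts a comment (as the "#optional" markers suggest) and that every key not marked optional in the example is required.

```python
# Assumption: keys not marked "#optional" in the example are required.
REQUIRED_KEYS = [
    "CONTIGS", "COVERAGEBAM", "LIB", "OUTPUTDIR",
    "SAMTOOLS", "APACHECOMMONSMATH", "KMER_SIZE",
]


def parse_config(text):
    """Parse 'KEY value' lines; LIB may repeat, and '#' starts a comment."""
    config = {"LIB": []}
    for raw in text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "LIB":
            config["LIB"].append(value)  # one entry per library
        else:
            config[key] = value
    return config


def missing_keys(config):
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    sample = "CONTIGS path/to/contigs.fa\nLIB path/to/lib1.bam\nJVMREQ 5g #optional\n"
    parsed = parse_config(sample)
    print("Libraries:", parsed["LIB"])
    print("Missing keys:", missing_keys(parsed))
```

Note that this sketch would truncate a path containing a "#" character, so such paths would need different comment handling.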
Please feel free to contact us if you find bugs, have suggestions, or need help. Use the discussion forum or the mailing list, or simply email us directly.