
OperaMS

Denis Bertrand Senthil

Welcome to the OperaMS wiki

Introduction

OperaMS is a metagenomic scaffolding pipeline that builds a metagenomic scaffold graph from assembled contigs and paired-end reads, and aims to output near-complete genomes of the individual microbes in your environmental sample.

It uses the following strategy: the graph partitioning tool Sigma decomposes the metagenomic scaffolding problem into distinct single-genome scaffolding problems, which are then solved by the single-genome scaffolder Opera.


Installation

  • Download OperaMS from our SourceForge page.
  • OperaMS also depends on two external tools:
    • the Apache Commons Math library (commons-math3-3.0.jar), available from the Apache Commons project
    • samtools, available from the samtools website

Typical Usage

Input

  1. Assembled contigs/scaffolds in multi-fasta format
  2. Paired-end reads to be used for scaffolding

How to run OperaMS

1. Map the reads onto the contigs/scaffolds (we currently provide a script that uses bowtie or bwa for this):

cd path/to/OperaMS
perl bin/preprocess_reads.pl <contig-file> <read-file-1> <read-file-2> <output-file> (<mapping-tool>)

where <mapping-tool> is either bowtie (default) or bwa, and <read-file-1> and <read-file-2> contain the paired-end reads in fasta or fastq format. Provide all assembled sequences at this step so that reads are mapped correctly; small contigs can instead be filtered out during scaffolding with Opera's "contig_size_threshold" parameter (default: 500 bp).
Note that the binaries "samtools", "bwa", "bowtie" and "bowtie-build" are assumed to be on your PATH. If they are not, edit the third line of preprocess_reads.pl accordingly.
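When several libraries need mapping, this step can be scripted. The following is a minimal Python sketch, not part of OperaMS: it checks that the binaries listed above are on the PATH with shutil.which, then builds one preprocess_reads.pl command per library using only the argument order documented above. The read and output file names are hypothetical placeholders, and the .bam extension for the outputs is an assumption based on the example configuration file.

```python
import shutil
from typing import List, Optional, Sequence

# Binaries that preprocess_reads.pl expects to find on the PATH.
REQUIRED_BINARIES = ("samtools", "bwa", "bowtie", "bowtie-build")


def missing_binaries(names: Sequence[str] = REQUIRED_BINARIES) -> List[str]:
    """Return the names that shutil.which() cannot locate on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def preprocess_command(contigs: str, reads_1: str, reads_2: str, output: str,
                       mapping_tool: Optional[str] = None) -> List[str]:
    """Build the preprocess_reads.pl argument list for one paired-end library.

    When mapping_tool is None it is left out, so the script's default (bowtie) applies.
    """
    if mapping_tool not in (None, "bowtie", "bwa"):
        raise ValueError("mapping_tool must be 'bowtie' or 'bwa'")
    cmd = ["perl", "bin/preprocess_reads.pl", contigs, reads_1, reads_2, output]
    if mapping_tool is not None:
        cmd.append(mapping_tool)
    return cmd


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    # Hypothetical file names: one (reads_1, reads_2, output) triple per library.
    libraries = [("lib1_1.fastq", "lib1_2.fastq", "lib1.bam"),
                 ("lib2_1.fastq", "lib2_2.fastq", "lib2.bam")]
    for reads_1, reads_2, output in libraries:
        cmd = preprocess_command("contigs.fa", reads_1, reads_2, output, "bwa")
        # To execute from path/to/OperaMS instead of printing: subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```

The sketch prints the commands rather than running them, so they can be reviewed (or pasted into a job script) before any mapping starts.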

2. Run OperaMS with a configuration file that provides its parameters:

cd path/to/OperaMS
perl runOperaMS.pl <configuration-file>

where the configuration-file provides information on the contig file, mapping files and output directory (see below for the format).


Output Format

OperaMS writes its scaffolds to the multi-fasta file "scaffoldSeq.fasta" in the output directory (OUTPUTDIR).
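For a quick overview of the result, a short Python sketch (not shipped with OperaMS) can report the number of scaffolds, their total length, the longest scaffold and the N50 of scaffoldSeq.fasta. It assumes only that the file is standard multi-fasta; the function and script names are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-fasta lines; sequences may span lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Scaffold count, total length, longest scaffold and N50, all in bp."""
    lengths = [len(seq) for _, seq in parse_fasta(lines)]
    return {"scaffolds": len(lengths), "total_bp": sum(lengths),
            "longest_bp": max(lengths, default=0), "n50_bp": n50(lengths)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scaffold_stats.py OUTPUTDIR/scaffoldSeq.fasta
    with open(sys.argv[1]) as handle:
        print(summarize(handle))
```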


Format of the Configuration File

  1. CONTIGS full path to your contigs file in .fasta format.

  2. COVERAGEBAM full path to the mapping file (.bam) used to estimate contig coverage (we recommend the .bam mapping file with the highest coverage).

  3. LIB full path to a mapping file giving the locations of paired-end reads on the contigs (as produced by bin/preprocess_reads.pl). If you have multiple libraries, enter each one on its own LIB line.

  4. EDGE_BUNDLESIZE_THRESHOLD Scaffold edges supported by fewer than this number of read pairs are discarded as noise. The default is 5, i.e. an edge needs at least 5 read pairs to be considered valid.

  5. OUTPUTDIR the directory into which all results are written.

  6. SAMTOOLS full path to the samtools executable.

  7. APACHECOMMONSMATH full path to commons-math3-3.0.jar.

  8. JVMREQ JVM memory requirement (e.g. 5g). Depending on the complexity of the dataset, you may need to reserve more memory for Sigma's graph handling. We have found 5 GB to be sufficient for the variety of large-scale datasets we have handled.

  9. KMER_SIZE the k-mer size used for the initial contig assembly (for example, with Velvet or SOAPdenovo).
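The EDGE_BUNDLESIZE_THRESHOLD rule in item 4 can be pictured with a toy Python sketch. This is an illustration under simplifying assumptions, not Opera's implementation: it groups read pairs whose mates map to two different contigs into hypothetical (contig_a, contig_b) edge bundles, ignoring the read orientations and insert distances that a real scaffolder also takes into account, and then keeps only edges whose bundle reaches the threshold (5 by default).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # hypothetical: an unordered pair of contig IDs, stored sorted


def bundle_sizes(read_pairs: Iterable[Tuple[str, str]]) -> Dict[Edge, int]:
    """Count read pairs whose two mates map to different contigs.

    Simplification: orientation and insert distance are ignored, so every
    linking pair between the same two contigs lands in one bundle.
    """
    counts: Counter = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return dict(counts)


def filter_edges(bundles: Dict[Edge, int], threshold: int = 5) -> Dict[Edge, int]:
    """Keep edges supported by at least `threshold` read pairs; the rest count as noise."""
    return {edge: size for edge, size in bundles.items() if size >= threshold}


if __name__ == "__main__":
    # Hypothetical mappings: the contig hit by each mate of a read pair.
    pairs = [("c1", "c2")] * 6 + [("c3", "c2")] * 4 + [("c1", "c1")] * 10
    bundles = bundle_sizes(pairs)
    print(bundles)                # {('c1', 'c2'): 6, ('c2', 'c3'): 4}
    print(filter_edges(bundles))  # {('c1', 'c2'): 6}
```

Raising the threshold trades sensitivity for fewer spurious joins; lowering it keeps weakly supported edges that may come from chimeric or mis-mapped pairs.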

An Example Configuration File:

CONTIGS path/to/contigs.fa
COVERAGEBAM path/to/lib3.bam    
LIB path/to/lib1.bam
LIB path/to/lib2.bam
LIB path/to/lib3.bam
EDGE_BUNDLESIZE_THRESHOLD   5   #optional
OUTPUTDIR   directory
SAMTOOLS    path/to/samtools
APACHECOMMONSMATH   path/to/commons-math3-3.0.jar
JVMREQ  5g  #optional
KMER_SIZE   31
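Before launching a long run, it can help to sanity-check the configuration file. The Python sketch below is a hypothetical checker, not OperaMS's own parser. It assumes one KEY value pair per line separated by whitespace, treats everything after # as a comment (as with the #optional markers above), collects repeated LIB lines into a list, and treats every key not marked #optional in the example as required. Paths that contain # would be misread.

```python
import sys
from typing import Dict, Iterable, List, Union

Config = Dict[str, Union[str, List[str]]]

# Assumption: keys not marked "#optional" in the example are required.
REQUIRED_KEYS = ("CONTIGS", "COVERAGEBAM", "LIB", "OUTPUTDIR",
                 "SAMTOOLS", "APACHECOMMONSMATH", "KMER_SIZE")
# Documented default: edges need at least 5 supporting read pairs.
DEFAULTS = {"EDGE_BUNDLESIZE_THRESHOLD": "5"}


def parse_config(lines: Iterable[str]) -> Config:
    """Parse 'KEY value' lines; repeated LIB lines are collected into a list."""
    config: Config = dict(DEFAULTS)
    libs: List[str] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments such as "#optional"
        if not line:
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        if len(parts) != 2:
            raise ValueError(f"no value given for key: {line!r}")
        key, value = parts
        if key == "LIB":
            libs.append(value)
        else:
            config[key] = value
    config["LIB"] = libs
    return config


def missing_required(config: Config) -> List[str]:
    """Return required keys that are absent or empty, in alphabetical order."""
    return sorted(key for key in REQUIRED_KEYS if not config.get(key))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_config.py <configuration-file>
    with open(sys.argv[1]) as handle:
        cfg = parse_config(handle)
    print("Missing keys:", missing_required(cfg) or "none")
```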

References

  • OperaMS was developed at the Genome Institute of Singapore and the University of Maryland, College Park.
  • OperaMS uses Opera to produce the scaffolds.
  • Contacts:
    • bertrandd@gis.a-star.edu.sg (Denis Bertrand)
    • smuthiah@umiacs.umd.edu (M. Senthil Kumar)

Please feel free to contact us if you find bugs, have suggestions or need help: use the discussion forum or the mailing list, or simply email us directly.

