FinIS wiki

Gao Song, Niranjan Nagarajan, Denis Bertrand

Welcome to the FinIS wiki!

Introduction

FinIS implements an exact solution, based on quadratic programming, for in silico assembly validation and finishing. It exploits the shotgun-sequencing information that the assembler left unused.

FinIS can be used to simultaneously verify and close gaps in the genome, taking into account repeat-based conflicts and coverage-imposed constraints. Unlike assemblers and tools that pursue a similar gap-closure goal through ad hoc heuristics, FinIS optimizes a clear objective function. It is also designed to improve the contiguity and the reliability of the assembly at the same time. For a related assembly tool, see Opera.


Updates

Release 0.3 (29-Jan-2014)

Changes from Release 0.2:

  • Enables multithreading.

It can be downloaded here.

Release 0.2 (20-Dec-2012)

Changes from Release 0.1:

  • Allows for the parsing of a SOAPdenovo assembly as well.

It can be downloaded here.

Release 0.1 (07-Sep-2012)

  • Parses the output of a Velvet assembly for input to FinIS.

It can be downloaded here.


Prerequisites

MOSEK C++ API

FinIS uses the MOSEK C++ API to solve Mixed Integer Quadratic Programs (MIQP). MOSEK can be downloaded from http://www.mosek.com and is free for academic use. A free academic license can be requested at http://mosek.com/resources/trial/. The installation manual can be found at http://mosek.com/resources/doc/.


Installation

If MOSEK is installed in <MOSEK>, its header and binary files are located in <MOSEK>/<version>/tools/platform/<platform>/ (replace <version> and <platform> with the values for your installation). To build FinIS, run

   bash make.sh <MOSEK>/<version>/tools/platform/<platform>/

in the root directory for FinIS.

An example is shown here:

   bash make.sh ./mosek/6/tools/platform/linux64x86/

Typical Usage

Input

  1. The folder containing the assembled contigs and the corresponding contig graph (the current version of FinIS can directly parse the output of Velvet and SOAPdenovo assemblies). For Velvet, there should be a "contigs.fa" file containing the assembled contigs in multi-fasta format and a "LastGraph" file containing the contig graph. For SOAPdenovo, there should be a ".contig" file containing the assembled contigs in multi-fasta format, a ".preGraphBasic" file containing the assembly parameters, a ".updated.edge" file containing the contig graph and a ".scaf" file containing details of the SOAPdenovo scaffolds (if the scaffolds file contains super-scaffolds that use SOAPdenovo scaffolds as a starting point for scaffolding).

  2. The scaffolds file containing scaffold information in the following format:

    >scaffold_name
    <contig-name> <orientation (BE/EB)> <contig-length> <gap-size> (<gap-standard-deviation>)
    .
    .
    .

The last column, which gives the standard deviation of the gap size, is optional. If it is omitted, all gaps are assumed to have the same standard deviation.
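The format above can be checked before a run with a small script. The following is only a sketch: the function name check_scaffolds_file is hypothetical, and it assumes (the format description does not say so) that fields are separated by whitespace, that contig lengths are non-negative integers, and that gap sizes and standard deviations are plain decimal numbers, with negative gap sizes allowed.

```shell
# Sketch only: sanity-check a scaffolds file in the format described above.
# Assumptions (not confirmed by the FinIS documentation): whitespace-separated
# fields, integer contig lengths, decimal gap sizes (possibly negative) and
# non-negative decimal standard deviations.
check_scaffolds_file() {
    awk '
        function bad(msg) { printf "line %d: %s\n", NR, msg > "/dev/stderr"; errors++ }

        /^[[:space:]]*$/ { next }              # skip blank lines
        /^>/ { scaffolds++; next }             # scaffold header line
        {
            contigs++
            if (scaffolds == 0)                          bad("contig line before any scaffold header")
            else if (NF != 4 && NF != 5)                 bad("expected 4 or 5 fields")
            else if ($2 !~ /^(BE|EB)$/)                  bad("orientation must be BE or EB")
            else if ($3 !~ /^[0-9]+$/)                   bad("contig length is not an integer")
            else if ($4 !~ /^-?[0-9]+(\.[0-9]+)?$/)      bad("gap size is not a number")
            else if (NF == 5 && $5 !~ /^[0-9]+(\.[0-9]+)?$/) bad("gap standard deviation is not a number")
        }
        END {
            if (!scaffolds) { print "no scaffold header found" > "/dev/stderr"; errors++ }
            printf "%d scaffolds, %d contigs\n", scaffolds, contigs
            exit (errors > 0)                  # non-zero exit status if any line failed
        }
    ' "$1"
}
```

For example, `check_scaffolds_file test_dataset/velvet/scaffolds.scaf` prints a one-line count and returns a non-zero status if any line does not match the assumed format.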

Running FinIS:

There are two ways to provide parameters to FinIS:

  1. Using the command line

       bin/FinIS <assembly-folder> <scaffolds-file> <output-folder> 
       <num_threads> [mosek_runtime]
    
       <assembly-folder>    The folder containing the assembly produced by Velvet or SOAPdenovo
       <scaffolds-file>     Scaffolds file containing scaffold information
       <output-folder>      Folder to save results
       <num_threads>        Number of OpenMP threads to run with
       [mosek_runtime]      Upper bound on the time MOSEK is allowed to spend on one task (optional)
    

    For example:

       bin/FinIS test_dataset/velvet/ test_dataset/velvet/scaffolds.scaf \
       test_dataset/velvet/results 10 360
    
  2. Using a configuration file

       bin/FinIS <config-file>
    
       <config-file>        Configuration file
    

    For example:

       bin/FinIS test_dataset/velvet/conf.config
    

    where the configuration file provides information on assembly results, scaffolds file and output directory to use (see below for the format).
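For the command-line form, it can help to check the inputs before starting a long run. The wrapper below is a sketch for a Velvet assembly, not part of FinIS: the function names and the FINIS_BIN variable are hypothetical, and creating the output folder in advance is a precaution, since the documentation does not say whether FinIS creates it.

```shell
# Sketch only: check the inputs of a Velvet-based run, then call FinIS with
# the command-line arguments documented above.

# A Velvet assembly folder should contain "contigs.fa" and "LastGraph".
check_velvet_folder() {
    local dir="$1" missing=0 f
    for f in contigs.fa LastGraph; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

run_finis() {
    local assembly="$1" scaffolds="$2" out="$3" threads="$4" runtime="${5:-}"
    # FINIS_BIN is a hypothetical override for the path to the binary.
    local finis="${FINIS_BIN:-bin/FinIS}"

    check_velvet_folder "$assembly" || return 1
    if [ ! -f "$scaffolds" ]; then
        echo "missing: $scaffolds" >&2
        return 1
    fi
    mkdir -p "$out"    # precaution; FinIS may or may not create it itself

    # mosek_runtime is optional, so pass it only when it was given.
    if [ -n "$runtime" ]; then
        "$finis" "$assembly" "$scaffolds" "$out" "$threads" "$runtime"
    else
        "$finis" "$assembly" "$scaffolds" "$out" "$threads"
    fi
}
```

With the test dataset, the call would be `run_finis test_dataset/velvet/ test_dataset/velvet/scaffolds.scaf test_dataset/velvet/results 10 360`.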


Output Format

FinIS writes the filled scaffolds to the multi-fasta file "scaffolds.filled.fasta" and a summary of the gap-filling statistics to the file "statistics".
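A quick look at the result can be scripted as follows. This is a sketch under two assumptions that the documentation does not confirm: that "scaffolds.filled.fasta" is written inside the output folder, and that gaps left unfilled still appear as runs of N in the sequences. The function name is hypothetical.

```shell
# Sketch only: count scaffolds, bases and remaining N characters in the
# filled scaffolds file of an output folder.
# Assumptions: the file lives in the output folder and unfilled gaps are
# written as N; neither is stated in the FinIS documentation.
summarise_filled_scaffolds() {
    local fasta="$1/scaffolds.filled.fasta"
    if [ ! -f "$fasta" ]; then
        echo "missing: $fasta" >&2
        return 1
    fi
    awk '
        /^>/ { records++; next }                          # one header per scaffold
        { bases += length($0); n += gsub(/[Nn]/, "") }    # gsub returns the number of N removed
        END { printf "scaffolds=%d bases=%d N=%d\n", records, bases, n }
    ' "$fasta"
}
```

For example, `summarise_filled_scaffolds test_dataset/velvet/results` prints one summary line that can be compared with the figures FinIS reports in "statistics".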


Format of Configuration File

An example configuration file can be found in "test_dataset/velvet/conf.config". The main parameters that need to be specified are:

  1. data_directory: the folder containing assembly results of Velvet or SOAPdenovo. For Velvet output, this folder should contain "contigs.fa" and "LastGraph" files. For SOAPdenovo output, this folder should contain ".contig", ".preGraphBasic" and ".updated.edge" files.
  2. scaffolds_file: the scaffold file in the format mentioned in the "Input" section.
  3. output_folder: the directory into which all results are written.
  4. num_threads: the number of threads used by FinIS.

Optional parameters that can be specified are:

  1. graph_threshold: the maximum number of variables a graph may have and still be solved using MIQP. The default is 2000. Users can change it, but larger values may increase the runtime significantly.
  2. solve_big_graph: a switch that determines whether graphs with more variables than graph_threshold are solved using other methods. If it is set to false, all gaps in such graphs remain unfilled.
  3. map_file: the file mapping reads onto contigs, used to estimate copy numbers for SOAPdenovo contigs.
  4. mosek_runtime: the limit on the running time of MOSEK.


Test Datasets

There are two test datasets provided with this distribution:

  1. test_dataset/velvet/: this folder contains an assembly produced by Velvet.
  2. test_dataset/soap/: this folder contains an assembly produced by SOAPdenovo.

Additional Information

  1. Non-unique solutions are not filtered out by the current version of FinIS.
  2. The complete pipeline to assemble a genome is as follows:

Related Tools

  1. Opera: Opera (Optimal Paired-End Read Assembler) is a scaffolding program. It can be downloaded here.

References

  • To cite FinIS please use the following citation:

Song Gao, Denis Bertrand, Niranjan Nagarajan. FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. Lecture Notes in Computer Science, 2012, Volume 7534/2012, 314-325, DOI: 10.1007/978-3-642-33122-0_25.

Please feel free to contact us if you find bugs, have suggestions or need help. Use the discussion forum or the mailing list, or simply mail us directly.