Welcome to the FinIS wiki!
FinIS implements an exact solution, based on quadratic programming, for in silico assembly validation and finishing that carefully exploits the shotgun-sequencing information left unused by the assembler.
FinIS can be used to simultaneously verify an assembly and close the gaps in it, taking into account repeat-based conflicts and coverage-imposed constraints. Unlike assemblers and other tools that pursue a similar gap-closure goal through ad hoc heuristics, FinIS optimizes a clearly defined objective function. FinIS is also designed to improve the contiguity and the reliability of the assembly at the same time. For a related assembly tool, see Opera.
Changes from Release 0.2:
It can be downloaded here.
Changes from Release 0.1:
It can be downloaded here.
It can be downloaded here.
FinIS uses the MOSEK C++ API to solve mixed-integer quadratic programs (MIQPs). MOSEK can be downloaded from http://www.mosek.com and is freely available for academic use. A free academic license can be requested at http://mosek.com/resources/trial/, and the installation manual is available at http://mosek.com/resources/doc/.
Suppose MOSEK is installed in <MOSEK>. The directory containing the MOSEK header files and binaries is then <MOSEK>/<version>/tools/platform/<platform>/ (replace <version> and <platform> with the values for your installation). To build FinIS, type
bash make.sh <MOSEK>/<version>/tools/platform/<platform>/
in the root directory for FinIS.
An example is shown here:
bash make.sh ./mosek/6/tools/platform/linux64x86/
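If you are unsure which <version> and <platform> directories your MOSEK installation provides, a small shell helper along the following lines can list the candidates. This is a sketch, not part of FinIS: the function name list_mosek_platforms is hypothetical, and it assumes nothing beyond the <MOSEK>/<version>/tools/platform/<platform>/ layout described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with FinIS): print every directory matching
# <MOSEK>/<version>/tools/platform/<platform>/ under a MOSEK installation root.
list_mosek_platforms() {
  local root=${1%/}   # strip one trailing slash so the paths join cleanly
  local dir
  for dir in "$root"/*/tools/platform/*/; do
    # If nothing matches, bash leaves the pattern unexpanded; skip that literal.
    [ -d "$dir" ] || continue
    echo "$dir"
  done
}
```

For the example installation above, list_mosek_platforms ./mosek would print ./mosek/6/tools/platform/linux64x86/, which can then be passed to make.sh.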
The assembly folder contains the assembled contigs and the corresponding contig graph (the current version of FinIS can directly parse the output of Velvet and SOAPdenovo). For Velvet, the folder should contain a "contigs.fa" file with the assembled contigs in multi-fasta format and a "LastGraph" file with the contig graph. For SOAPdenovo, it should contain a ".contig" file with the assembled contigs in multi-fasta format, a ".preGraphBasic" file with the assembly parameters, a ".updated.edge" file with the contig graph and, if the scaffolds file contains super-scaffolds built using SOAPdenovo scaffolds as a starting point, a ".scaf" file with the SOAPdenovo scaffold details.
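Before running FinIS, it can help to confirm that the assembly folder contains the files listed above. The sketch below is a hypothetical pre-flight check, not part of FinIS: the function names are illustrative, it only tests for the required file names and suffixes (the optional ".scaf" file is not checked), and it does not validate file contents.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of FinIS): guess which assembler
# produced a folder by looking for the input files FinIS expects.

# Succeeds if directory "$1" holds at least one regular file ending in "$2".
has_file_with_suffix() {
  local f
  for f in "$1"/*"$2"; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# Prints "velvet" or "soapdenovo"; returns 1 if neither file set is complete.
check_assembly_dir() {
  local dir=${1%/}
  if [ -f "$dir/contigs.fa" ] && [ -f "$dir/LastGraph" ]; then
    echo "velvet"
  elif has_file_with_suffix "$dir" ".contig" &&
       has_file_with_suffix "$dir" ".preGraphBasic" &&
       has_file_with_suffix "$dir" ".updated.edge"; then
    # The optional ".scaf" file is not checked here.
    echo "soapdenovo"
  else
    echo "no complete Velvet or SOAPdenovo file set in $dir" >&2
    return 1
  fi
}
```

For example, check_assembly_dir test_dataset/velvet/ should print "velvet" if the Velvet test dataset is unpacked in place.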
The scaffolds file contains scaffold information in the following format:
>scaffold_name
<contig-name> <orientation (BE/EB)> <contig-length> <gap-size> [<gap-standard-deviation>]
...
The last column, the standard deviation of the gap size, is optional. If it is omitted, all gaps are assumed to have the same standard deviation.
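A quick way to catch formatting mistakes before running FinIS is a small awk-based check such as the one below. It is a sketch under stated assumptions, not FinIS's own parser: the function name check_scaffolds is hypothetical, columns are assumed to be separated by whitespace (tabs or spaces), gap sizes are allowed to be negative (for example, for overlapping contigs), and the standard deviation is accepted only as a plain decimal number.

```bash
#!/usr/bin/env bash
# Hypothetical format check (not part of FinIS) for a scaffolds file:
#   >scaffold_name
#   <contig-name> <BE|EB> <contig-length> <gap-size> [<gap-std-dev>]
# Assumptions: whitespace-separated columns, gap sizes may be negative.
check_scaffolds() {
  awk '
    NF == 0 { next }                           # ignore blank lines
    /^>/    { scaffolds++; in_scaf = 1; next } # header starts a new scaffold
    {
      if (!in_scaf) { printf "line %d: contig line before any >header\n", NR; bad++; next }
      if (NF != 4 && NF != 5) { printf "line %d: expected 4 or 5 columns, got %d\n", NR, NF; bad++; next }
      if ($2 != "BE" && $2 != "EB") { printf "line %d: orientation must be BE or EB\n", NR; bad++ }
      if ($3 !~ /^[0-9]+$/) { printf "line %d: contig length must be a non-negative integer\n", NR; bad++ }
      if ($4 !~ /^-?[0-9]+$/) { printf "line %d: gap size must be an integer\n", NR; bad++ }
      if (NF == 5 && $5 !~ /^[0-9]+(\.[0-9]+)?$/) { printf "line %d: gap std-dev must be a non-negative number\n", NR; bad++ }
      contigs++
    }
    END {
      printf "%d scaffold(s), %d contig line(s), %d problem(s)\n", scaffolds, contigs, bad
      exit (bad > 0)                           # non-zero status if anything failed
    }
  ' "$1"
}
```

For instance, check_scaffolds test_dataset/velvet/scaffolds.scaf prints a one-line summary and returns a non-zero exit status if any line deviates from the layout above.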
There are two ways to provide parameters to FinIS:
Using the command line
bin/FinIS <assembly-folder> <scaffolds-file> <output-folder> <num_threads> [mosek_runtime]
<assembly-folder> The folder containing the assembly produced by Velvet or SOAPdenovo
<scaffolds-file> Scaffolds file containing scaffold information
<output-folder> Folder to save results
<num_threads> Number of OpenMP threads to run with
[mosek_runtime] (Optional) Upper bound on the time MOSEK is allowed to spend on a single task
For example:
bin/FinIS test_dataset/velvet/ test_dataset/velvet/scaffolds.scaf test_dataset/velvet/results 10 360
Using a configuration file
bin/FinIS <config-file>
<config-file> Configuration file
For example:
bin/FinIS test_dataset/velvet/conf.config
where the configuration file specifies the assembly folder, the scaffolds file and the output folder to use (see below for the format).
The filled scaffolds produced by FinIS are written to the multi-fasta file "scaffolds.filled.fasta" in the output folder, and a summary of gap-filling statistics is written to the file "statistics".
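For a quick overview of the filled scaffolds, a short awk-based sketch can count the sequences and bases in "scaffolds.filled.fasta". The function name fasta_summary is hypothetical, and reporting the number of N characters assumes that any gaps left unfilled are written as runs of N, which this page does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FinIS): summarise a multi-fasta file.
# The N count assumes unfilled gaps are written as runs of N.
fasta_summary() {
  awk '
    /^>/ { seqs++; next }           # a header line starts a new record
    {
      sub(/\r$/, "")                # tolerate Windows line endings
      bases += length($0)
      ns += gsub(/[Nn]/, "&")       # gsub returns the number of matches
    }
    END { printf "%d sequence(s), %d bp, %d N\n", seqs, bases, ns }
  ' "$1"
}
```

For the command-line example above, this would be run as fasta_summary test_dataset/velvet/results/scaffolds.filled.fasta.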
An example configuration file can be found in "test_dataset/velvet/conf.config". The main parameters that need to be specified are:
Optional parameters that can be specified are:
1. graph_threshold: the maximum number of variables a graph may have to be solved with MIQP; graphs with more variables are considered too big. The default is 2000. Users can change it, but larger values may increase the runtime significantly.
2. solve_big_graph: a switch that determines whether graphs with more variables than graph_threshold are solved using other methods. If it is set to false, all gaps in such graphs remain unfilled.
3. map_file: a file mapping reads onto contigs, used to estimate the copy number of SOAPdenovo contigs.
4. mosek_runtime: the upper bound on the running time of MOSEK.
There are two test datasets provided with this distribution:
Song Gao, Denis Bertrand, Niranjan Nagarajan. FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. Lecture Notes in Computer Science, 2012, Volume 7534/2012, 314-325, DOI: 10.1007/978-3-642-33122-0_25.
FinIS was developed at the Genome Institute of Singapore and the National University of Singapore.
Contact: gaosong@nus.edu.sg (Song GAO)
Please feel free to contact us if you find bugs, have suggestions or need help. Use the discussion forum or the mailing list, or simply email us directly.