README for FinIS (Finishing In-Silico)

Author: Song Gao
Date: 6th September 2012
Version: 0.1

Pre-requisite
----------------------------------------------------------------------------------------------------
1. MOSEK C++ API
   FinIS uses the MOSEK C++ API to solve Mixed Integer Quadratic Programs (MIQP). MOSEK can be
   downloaded from http://www.mosek.com and is free for academic use. A free academic license
   can be requested at http://mosek.com/resources/trial/. The installation manual can be found at
   http://mosek.com/resources/doc/.

Installation
----------------------------------------------------------------------------------------------------

Suppose MOSEK is installed in <MOSEK>. The header files and binary files are then located in
<MOSEK>/<version>/tools/platform/<platform>/ (replace <version> and <platform> with the values
for your installation). To build FinIS, run
	 bash make.sh <MOSEK>/<version>/tools/platform/<platform>/
in the root directory of FinIS.

An example is shown here:
   bash make.sh ./mosek/6/tools/platform/linux64x86/

Typical Usage
----------------------------------------------------------------------------------------------------

Input:
1) The folder containing the assembled contigs and the corresponding contig graph. The current
   version of FinIS can only handle Velvet output, so this folder must contain a "contigs.fa" file
   with the assembled contigs in multi-fasta format and a "LastGraph" file with the contig
   graph.
2) The scaffolds file containing scaffold information in the following format:
       >scaffold_name
       <contig-name>	<orientation (BE/EB)>	<contig-length>	<gap-size>	(<gap-standard-deviation>)
       .
       .
       .
   The last column, the standard deviation of the gap size, is optional. If it is omitted, all
   gaps are assumed to have the same standard deviation.
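
The two inputs above can be checked before FinIS is run. The following Python sketch is not part
of FinIS. It assumes that columns are separated by whitespace, that the length and gap columns
are numeric, and that the whole header line after ">" is the scaffold name. The function names
missing_velvet_files and parse_scaffolds are hypothetical.

```python
import os

# File names taken from this README; everything else is illustrative.
REQUIRED_VELVET_FILES = ("contigs.fa", "LastGraph")


def missing_velvet_files(assembly_folder):
    """Return the required Velvet files that are absent from the folder."""
    return [name for name in REQUIRED_VELVET_FILES
            if not os.path.isfile(os.path.join(assembly_folder, name))]


def parse_scaffolds(text):
    """Parse scaffold records into {scaffold_name: [contig entries]}."""
    scaffolds = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the full header text is used as the scaffold name.
            current = line[1:].strip()
            scaffolds[current] = []
            continue
        if current is None:
            raise ValueError("contig line before any '>' header: %r" % line)
        fields = line.split()
        if len(fields) not in (4, 5):
            raise ValueError("expected 4 or 5 columns: %r" % line)
        name, orientation, length, gap = fields[:4]
        if orientation not in ("BE", "EB"):
            raise ValueError("orientation must be BE or EB: %r" % line)
        scaffolds[current].append({
            "contig": name,
            "orientation": orientation,
            "length": int(length),
            "gap_size": float(gap),
            # The fifth column is optional; None marks "not specified".
            "gap_sd": float(fields[4]) if len(fields) == 5 else None,
        })
    return scaffolds
```

A line that does not match the documented layout raises ValueError, so a malformed scaffolds
file is caught before a long run starts.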

Running FinIS:
1) There are two ways to provide parameters to FinIS:

   (A) Using the command line
       bin/FinIS <assembly-folder> <scaffolds-file> <output-folder>

       <assembly-folder> 	The folder containing the assembly produced by Velvet
       <scaffolds-file> 	Scaffolds file containing scaffold information
       <output-folder> 		Folder to save results
       
       For example:
       	   bin/FinIS test_dataset test_dataset/scaffolds.scaf test_dataset/results

   (B) Using a configuration file   
       bin/FinIS <config-file>

       <config-file>	       Configuration file

       For example:
       	   bin/FinIS test_dataset/conf.config

       where the configuration file specifies the assembly folder, the scaffolds file and the
       output folder (see "Format of Configuration File" for the format).
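
When FinIS is called from a script, the two invocation modes above can be wrapped as follows.
This is a minimal Python sketch, not part of FinIS. The function names finis_command and
run_finis are hypothetical, and the default binary path "bin/FinIS" assumes the script is run
from the FinIS root directory.

```python
import subprocess


def finis_command(assembly_folder=None, scaffolds_file=None,
                  output_folder=None, config_file=None, binary="bin/FinIS"):
    """Build the argument list for one of the two documented invocations."""
    positional = (assembly_folder, scaffolds_file, output_folder)
    if config_file is not None:
        # Mode (B): a single configuration file.
        if any(p is not None for p in positional):
            raise ValueError("give either a config file or the three paths")
        return [binary, config_file]
    # Mode (A): assembly folder, scaffolds file and output folder.
    if any(p is None for p in positional):
        raise ValueError("command-line mode needs all three paths")
    return [binary, *positional]


def run_finis(**kwargs):
    """Run FinIS; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(finis_command(**kwargs), check=True)
```

For example, finis_command(config_file="test_dataset/conf.config") returns
["bin/FinIS", "test_dataset/conf.config"].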

Output Format
----------------------------------------------------------------------------------------------------

The filled scaffolds are written to the multi-fasta file "scaffolds.filled.fasta". A summary
of the gap-filling statistics is written to the file "statistics".
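
The filled scaffolds can be inspected with any multi-fasta reader. The Python sketch below is
one minimal possibility and is not part of FinIS. It assumes a standard multi-fasta layout
(a ">" header line followed by one or more sequence lines); the function names read_fasta and
scaffold_lengths are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of multi-fasta lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scaffold_lengths(path):
    """Map each scaffold header in a multi-fasta file to its sequence length."""
    with open(path) as handle:
        return {name: len(seq) for name, seq in read_fasta(handle)}
```

Calling scaffold_lengths on the path of "scaffolds.filled.fasta" gives a quick overview of
the finished scaffolds.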

Format of Configuration File
----------------------------------------------------------------------------------------------------

An example configuration file can be found in "test_dataset/conf.config". The main parameters
that need to be specified are:

   a) data_directory: the folder containing the assembly results. Since the current version of FinIS
      can only handle Velvet output, this folder should contain the "contigs.fa" and "LastGraph" files.
   b) scaffolds_file: the scaffolds file, in the format described in the "Input" section.
   c) output_folder: the directory into which all results are written.
   
Optional parameters that can be specified are:
   d) graph_threshold: the maximum number of variables a graph may have and still be solved using
      MIQP. The default is 2000. Users can change it, but a larger value may increase the runtime
      significantly.
   e) solve_big_graph: a switch that determines whether graphs with more variables than the
      threshold are solved using other methods. If it is set to false, all gaps in such graphs
      remain unfilled.
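
The interaction between the two optional parameters can be summarised by the Python sketch
below. It only illustrates the behaviour described above and is not FinIS source code. The
function name choose_strategy and the returned labels are hypothetical, and treating a graph
with exactly graph_threshold variables as small enough for MIQP is an assumption based on the
wording "larger than the threshold".

```python
def choose_strategy(num_variables, solve_big_graph, graph_threshold=2000):
    """Return how a graph would be handled under the documented parameters."""
    if num_variables <= graph_threshold:
        # Small enough: solved exactly as a Mixed Integer Quadratic Program.
        return "miqp"
    if solve_big_graph:
        # Too big for MIQP, but the switch allows another method.
        return "other_method"
    # Too big and the switch is off: the gaps in this graph stay unfilled.
    return "leave_unfilled"
```

With the default threshold, a graph with 2001 variables is handled by another method when
solve_big_graph is true and is left unfilled when it is false.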

Additional Information
----------------------------------------------------------------------------------------------------
1. Non-unique solutions are not filtered out by the current version of FinIS.
2. The complete pipeline to assemble a genome is as follows:
   a) Produce contigs using Velvet. Details of Velvet can be found at 
      	  http://www.ebi.ac.uk/~zerbino/velvet/.
   b) Produce scaffolds using a scaffolder (e.g. Opera - freely available at http://sourceforge.net/projects/operasf/).
   c) Finish genomes using FinIS by providing the folder containing Velvet output and 
      the scaffold file produced by the scaffolder.

References
----------------------------------------------------------------------------------------------------

1. To cite FinIS please use the following citation:

Song Gao, Denis Bertrand, Niranjan Nagarajan. FinIS: Improved in silico Finishing Using an Exact 
Quadratic Programming Formulation. Lecture Notes in Computer Science, 2012, Volume 7534/2012, 
314-325, DOI: 10.1007/978-3-642-33122-0_25.

2. SourceForge Page: https://sourceforge.net/projects/finis/

3. Contact: gaosong@nus.edu.sg (Song GAO) 

Source: README, updated 2012-09-07