Name | Modified | Size |
---|---|---|
README | 2012-09-07 | 5.5 kB |
finis_v0.1.tar.gz | 2012-09-07 | 31.8 MB |
README for FinIS (Finishing In-Silico)

Author: Song Gao
Date: 6th September 2012
Version: 0.1

Pre-requisite
----------------------------------------------------------------------------------------------------
1. MOSEK C++ API
   FinIS uses the MOSEK C++ API to solve Mixed Integer Quadratic Programs. MOSEK can be downloaded from http://www.mosek.com. It is freely available for academic use. The free academic license can be requested at http://mosek.com/resources/trial/. The installation manual can be found at http://mosek.com/resources/doc/.

Installation
----------------------------------------------------------------------------------------------------
Suppose the installation directory of MOSEK is <MOSEK>. The header files and binary files are then located in <MOSEK>/<version>/tools/platform/<platform>/ (replace <version> and <platform> to match your installation). To build FinIS, type

    bash make.sh <MOSEK>/<version>/tools/platform/<platform>/

in the root directory of FinIS. For example:

    bash make.sh ./mosek/6/tools/platform/linux64x86/

Typical Usage
----------------------------------------------------------------------------------------------------
Input:
1) The folder containing the assembled contigs and the corresponding contig graph. The current version of FinIS can only handle the output of Velvet. This folder should contain a "contigs.fa" file with the assembled contigs in multi-fasta format and a "LastGraph" file with the contig graph.
2) The scaffolds file, containing scaffold information in the following format:

    >scaffold_name
    <contig-name> <orientation (BE/EB)> <contig-length> <gap-size> (<gap-standard-deviation>)
    .
    .
    .

   The last column, containing the standard deviation of the gap size, is optional. If it is not specified, all gaps are assumed to have the same standard deviation.
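The two inputs described above can be sanity-checked before running FinIS. The following is a minimal Python sketch, not part of FinIS itself. It assumes that each scaffold header sits on its own line starting with ">", that columns are separated by whitespace, and that contig lengths and gap sizes are integers; none of these details are confirmed by this README. The names ScaffoldEntry, missing_velvet_files and parse_scaffolds, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

# File names the Input section says must be present in the assembly folder.
REQUIRED_VELVET_FILES = ("contigs.fa", "LastGraph")


@dataclass
class ScaffoldEntry:
    contig: str
    orientation: str                 # "BE" or "EB"
    length: int                      # contig length (assumed integer)
    gap_size: int                    # gap size (assumed integer)
    gap_sd: Optional[float] = None   # optional fifth column


def missing_velvet_files(folder: str) -> List[str]:
    """Return the required Velvet files that are absent from `folder`."""
    return [n for n in REQUIRED_VELVET_FILES if not (Path(folder) / n).is_file()]


def parse_scaffolds(text: str) -> Dict[str, List[ScaffoldEntry]]:
    """Parse scaffolds-file text into {scaffold name: [entries]}.

    Assumption: header lines start with '>' and the first token after
    '>' is the scaffold name; all other lines have 4 or 5 columns.
    """
    scaffolds: Dict[str, List[ScaffoldEntry]] = {}
    current: Optional[str] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError(f"line {lineno}: empty scaffold name")
            current = tokens[0]
            scaffolds[current] = []
            continue
        if current is None:
            raise ValueError(f"line {lineno}: contig line before any scaffold header")
        fields = line.split()
        if len(fields) not in (4, 5):
            raise ValueError(f"line {lineno}: expected 4 or 5 columns, got {len(fields)}")
        if fields[1] not in ("BE", "EB"):
            raise ValueError(f"line {lineno}: orientation must be BE or EB")
        gap_sd = float(fields[4]) if len(fields) == 5 else None
        scaffolds[current].append(
            ScaffoldEntry(fields[0], fields[1], int(fields[2]), int(fields[3]), gap_sd)
        )
    return scaffolds


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = ">scaffold_1\ncontig_1 BE 1500 120 15\ncontig_2 EB 800 0\n"
    for name, entries in parse_scaffolds(sample).items():
        print(name, entries)
    print("missing in current folder:", missing_velvet_files("."))
```

A check like this catches a missing "LastGraph" file or a malformed orientation column before the solver is started.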
Running FinIS:
There are two ways to provide parameters to FinIS:

(A) Using the command line

    bin/FinIS <assembly-folder> <scaffolds-file> <output-folder>

    <assembly-folder>   The folder containing the assembly produced by Velvet
    <scaffolds-file>    Scaffolds file containing scaffold information
    <output-folder>     Folder to save results

    For example:
    bin/FinIS test_dataset test_dataset/scaffolds.scaf test_dataset/results

(B) Using a configuration file

    bin/FinIS <config-file>

    <config-file>       Configuration file

    For example:
    bin/FinIS test_dataset/conf.config

    where the configuration file specifies the assembly results, the scaffolds file and the output directory to use (see below for the format).

Output Format
----------------------------------------------------------------------------------------------------
The filled scaffolds output by FinIS can be found in the multi-fasta file "scaffolds.filled.fasta". A summary of gap-filling statistics can be found in the file "statistics".

Format of Configuration File
----------------------------------------------------------------------------------------------------
An example configuration file can be found in "test_dataset/conf.config".

The main parameters that need to be specified are:
a) data_directory: the folder containing the assembly results. Since the current version of FinIS can only deal with Velvet output, this folder should contain the "contigs.fa" and "LastGraph" files.
b) scaffolds_file: the scaffolds file, in the format described in the "Input" section.
c) output_folder: the directory into which all results are written.

Optional parameters that can be specified are:
d) graph_threshold: the maximum number of variables a graph may have before it is considered too big to be solved using MIQP. The default is 2000. Users can change it, but a bigger value may increase the runtime significantly.
e) solve_big_graph: a switch that determines whether graphs with more variables than the threshold should be solved using other methods. If it is set to false, all the gaps in such graphs will remain unfilled.

Additional Information
----------------------------------------------------------------------------------------------------
1. Non-unique solutions are not filtered out by the current version of FinIS.
2. The complete pipeline to assemble a genome is as follows:
   a) Produce contigs using Velvet. Details of Velvet can be found at http://www.ebi.ac.uk/~zerbino/velvet/.
   b) Produce scaffolds using a scaffolder (e.g. Opera - freely available at http://sourceforge.net/projects/operasf/).
   c) Finish genomes using FinIS by providing the folder containing the Velvet output and the scaffolds file produced by the scaffolder.

References
----------------------------------------------------------------------------------------------------
1. To cite FinIS, please use the following citation:
   Song Gao, Denis Bertrand, Niranjan Nagarajan. FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. Lecture Notes in Computer Science, 2012, Volume 7534/2012, 314-325, DOI: 10.1007/978-3-642-33122-0_25.
2. SourceForge Page: https://sourceforge.net/projects/finis/
3. Contact: gaosong@nus.edu.sg (Song GAO)