#############
# About
#############

A fusion transcript is an aberrant RNA molecule comprising exonic sequence from two normally separate genes. Fusion transcripts can be formed either by transcription of a fusion gene, following a translocation event, or by trans-splicing. They have been implicated as the cause of both haematological malignancies and solid tumours, including prostate, breast and lung cancers.

FusionFinder is a Perl-based software package that can be used to find fusion transcript candidates in RNA-Seq data.

#############
# QuickStart
#############

1. Download and unzip the software from Sourceforge or http://bioinformatics.childhealthresearch.org.au/software/fusionfinder/FusionFinder_v1.2.zip
2. Install the prerequisites (Perl modules, Bowtie, Muscle)
3. Install the latest Ensembl API (version 66) http://www.ensembl.org/info/docs/api/api_installation.html
4. Download and unzip the latest Ensembl reference data http://bioinformatics.childhealthresearch.org.au/datasets/fusionfinder/human_66_bowtie.zip
5. Make a configuration file (called fusionfinder.cnf below) specifying the location of your Ensembl API and pointing at your nearest Ensembl mirror (preferably locally installed)
6. Read the manual
7. Run the software

To run the test data, issue the following command:

bin/fusionfinder.pl --reads test_data/BCRABL1_testdata_reads.fq --cref human_66_coding --ncref human_66_noncoding --config conf/fusionfinder.cnf
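
Before running the test command it can help to confirm that the files it refers to are actually in place. The following is a minimal sketch, not part of FusionFinder: check_inputs is a hypothetical helper, and it assumes you run it from the directory where FusionFinder was unzipped and that conf/fusionfinder.cnf has already been created.

```shell
#!/bin/sh
# Pre-flight check before running the FusionFinder test data.
# check_inputs is a hypothetical helper, not shipped with FusionFinder.

# Print a message for each missing file; return non-zero if any is missing.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (from the directory where FusionFinder was unzipped):
#   check_inputs bin/fusionfinder.pl \
#                test_data/BCRABL1_testdata_reads.fq \
#                conf/fusionfinder.cnf \
#     && bin/fusionfinder.pl --reads test_data/BCRABL1_testdata_reads.fq \
#          --cref human_66_coding --ncref human_66_noncoding \
#          --config conf/fusionfinder.cnf
```

The check only covers plain files. The Bowtie reference indices named by --cref and --ncref are not checked, because their on-disk file names depend on how the reference archive unpacks.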

#############
# System Requirements
#############

The basic requirements to run FusionFinder are a supported operating system and enough disk space for your input and output files. FusionFinder is written in Perl and requires Perl plus the additional modules listed below. The FusionFinder protocol also requires an aligner; we use Bowtie, which should be obtained and installed from the link below. FusionFinder relies heavily on Ensembl, so access to an Ensembl mirror is critical. We recommend installing a local copy, which speeds up processing immensely; instructions are linked below. MySQL is required if you want to install a local version of Ensembl.

#############
# Hardware requirements
#############

OS (Windows, MacOSX, Linux)
Smaller datasets will run on a 32-bit OS with 4GB of memory
Larger datasets will require a 64-bit OS and more memory

#############
# Software requirements
#############

1. Bowtie (http://bowtie-bio.sourceforge.net/index.shtml)
2. Perl Modules (search for each at CPAN - http://search.cpan.org)
   Bio::Perl
   Bio::Tools::Run::Alignment::Muscle (install C/CJ/CJFIELDS/BioPerl-Run-1.006900.tar.gz if using cpan)
   Getopt::Long
   List::MoreUtils
   Pod::Usage
   Config::IniFiles
   Sort::ArrayOfArrays
   Term::ProgressBar
   GD::Graph
   DBI
   DBD::mysql
3. Ensembl API (download the version that matches both your reference data and the Ensembl database you are connecting to - http://www.ensembl.org/info/docs/api/api_installation.html)
   In the paper we used Ensembl API version 62.
4. The multiple sequence aligner Muscle (http://www.drive5.com/muscle/)
5. MySQL (http://dev.mysql.com/)
   A MySQL client (including the devel package if installing with yum etc) is required to connect to Ensembl
   MySQL server is required if you want to install a local version of Ensembl
6. Ensembl Database
   If you do not specify an Ensembl database, FusionFinder will use the default UK Ensembl database.
   However, it is highly recommended to use the closest Ensembl resource to you in order to speed up processing.
   You can find some public Ensembl mirrors at http://www.ensembl.org/info/data/mysql.html or alternatively for ultimate performance you can easily install your own.
   Simple instructions for this can be found at http://www.ensembl.org/info/docs/webcode/install/ensembl-data.html
   Either way you need access to (at a very minimum) the Ensembl human Core database and the human Compara database.
   In the paper we used a locally installed Ensembl version 62.

   When using a mirror database, you will need to give the connection details (server hostname, username and password)
   in the FusionFinder configuration file, as described in the Configure step of the next section.
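
The list above can be checked from the command line before a first run. This is a rough sketch under stated assumptions rather than an official installer check: the helper names are hypothetical, and the executable names bowtie, muscle and mysql are assumed to be the names used on your system. Bowtie may legitimately be reported as missing if you intend to give its location in the configuration file instead of putting it on your PATH.

```shell
#!/bin/sh
# Report which FusionFinder prerequisites are visible from this shell.
# Helper names are hypothetical; binary names (bowtie, muscle, mysql) are assumed.

# Return 0 if an executable with the given name is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Return 0 if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print one ok/MISSING line per prerequisite; return non-zero if any is missing.
check_prereqs() {
    status=0
    for cmd in perl bowtie muscle mysql; do
        if have_cmd "$cmd"; then
            echo "ok       $cmd"
        else
            echo "MISSING  $cmd"
            status=1
        fi
    done
    for mod in Bio::Perl Bio::Tools::Run::Alignment::Muscle Getopt::Long \
               List::MoreUtils Pod::Usage Config::IniFiles Sort::ArrayOfArrays \
               Term::ProgressBar GD::Graph DBI DBD::mysql; do
        if have_perl_module "$mod"; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_prereqs
```

The Ensembl API is not checked here because it is located through the configuration file rather than the PATH.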

#############
# Downloading and Installing FusionFinder
#############

1. Download

You will need to download the software itself and some reference data. The reference data is described in the Datasets section.

The current version of FusionFinder is version 1.2, the second public stable release (03/04/2012). It can be downloaded from Sourceforge or http://bioinformatics.childhealthresearch.org.au/software/fusionfinder/FusionFinder_v1.2.zip

2. Install FusionFinder

Once all system requirements are fulfilled and you have downloaded FusionFinder, simply extract the contents to an appropriate local or system-accessible directory.
For example, on Linux: unzip FusionFinder_v1.2.zip

3. Configure

An example configuration file is provided below. The essential part of this file tells the scripts where your Ensembl API is. If you have installed a local Ensembl database, or wish to point to a nearby Ensembl mirror, modify the server hostname, username and password details in this file. If Bowtie is not on your PATH, you can also specify the location of the bowtie binary here.

[API]
path=/path/to/my/ensembl/api/root
[Ensembl]
host=ensemblhost.xyz
user=ensembluser
pass=ensemblpass
[Bowtie]
path=/full/path/to/Bowtie
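
One way to create this file from the command line is sketched below. The helper names write_config and config_has_sections are hypothetical and not part of FusionFinder; the section and key names are copied from the example above, and the values in the usage comment are the same placeholders.

```shell
#!/bin/sh
# Write a FusionFinder configuration file and confirm the key sections exist.
# write_config and config_has_sections are hypothetical helpers.

# write_config FILE API_PATH HOST USER PASS BOWTIE_PATH
write_config() {
    cat > "$1" <<EOF
[API]
path=$2
[Ensembl]
host=$3
user=$4
pass=$5
[Bowtie]
path=$6
EOF
}

# Return 0 if FILE contains the [API] and [Ensembl] section headers.
config_has_sections() {
    grep -q '^\[API\]$' "$1" && grep -q '^\[Ensembl\]$' "$1"
}

# Usage (placeholder values from the example above):
#   write_config fusionfinder.cnf /path/to/my/ensembl/api/root \
#       ensemblhost.xyz ensembluser ensemblpass /full/path/to/Bowtie
#   config_has_sections fusionfinder.cnf && echo "config looks complete"
```

Because the file holds a database password, you may want to restrict its permissions (for example chmod 600 fusionfinder.cnf) on shared systems.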

#############
# Manual
#############

The documentation for each script used in the workflow can be found online at http://bioinformatics.childhealthresearch.org.au/software/fusionfinder/fusionfinder-manual.php
or as a PDF in the doc directory of this distribution.

You can also access any of the documentation for each script by running the following at the command line:
perl script_name.pl --help

There are two Perl scripts used in a complete analysis.
1. fusionfinder.pl - Searches for fusion candidates.
2. make_alignments.pl - Generates multiple alignments of selected fusion candidates.

#############
# Example FusionFinder Workflow
#############

This is a simple workflow. The data file used in the examples can be found in the test_data directory of this distribution.

Step 1. Find candidate gene fusions in your read data

fusionfinder.pl --reads <fastq read file(s)> --config <config file> --cref <coding transcript reference file> --ncref <noncoding transcript reference file> --threads <number of threads to use>

eg fusionfinder.pl --reads BCRABL1_testdata_reads.fq --cref human_62_coding --ncref human_62_noncoding --mp_cutoff 1 --config fusionfinder.cnf

Step 2. Generation of multiple alignments for interesting fusion candidates

make_alignments.pl --readsfile <fusionfinder_reads_file.tsv> --g1 <G1 HGNC symbol> --g2 <G2 HGNC symbol> --config <config file>

eg make_alignments.pl --g1 BCR --g2 ABL1 --readsfile fusionfinder_reads.tsv --limit 20 --config fusionfinder.cnf
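
The two steps can be chained so that the alignment step only runs if the search step succeeds. This is a sketch under several assumptions: run_workflow is a hypothetical wrapper, both scripts are assumed to be executable and on your PATH, they are assumed to exit with a non-zero status on failure, and the location of the reads file produced by step 1 is passed in explicitly rather than guessed. Only options shown in this README are used.

```shell
#!/bin/sh
# Chain the two FusionFinder workflow steps for one gene pair.
# run_workflow is a hypothetical wrapper; it assumes both scripts are on the
# PATH and that they exit non-zero on failure.

# run_workflow READS CREF NCREF CONFIG G1 G2 READS_TSV
run_workflow() {
    # Step 1: find candidate gene fusions in the read data.
    fusionfinder.pl --reads "$1" --cref "$2" --ncref "$3" --config "$4" || return 1
    # Step 2: build multiple alignments for the chosen gene pair.
    make_alignments.pl --readsfile "$7" --g1 "$5" --g2 "$6" --config "$4"
}

# Usage:
#   run_workflow BCRABL1_testdata_reads.fq human_62_coding human_62_noncoding \
#       fusionfinder.cnf BCR ABL1 fusionfinder_reads.tsv
```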

#############
# Datasets
#############

1. Test Read Data

The full dataset used in our paper is from published work by Levin and colleagues and can be found at http://bioinformatics.childhealthresearch.org.au/datasets/fusionfinder/levin_dataset.zip
This data represents the enriched dataset referred to in their manuscript.
The total processing time for this dataset will depend on your system but will be approximately 3 hours with a local Ensembl database.

Levin dataset: ~14 million 76mer reads (compressed 1.1GB; uncompressed 2.6GB)

A smaller subset of this dataset consisting of reads from a single fusion can be found in the test_data directory of this distribution.
The total processing time for this dataset will depend on your system but will be approximately 2 minutes with a local Ensembl database.

Subset test dataset: ~85 thousand 76mer reads (compressed 6.2MB; uncompressed 17MB)
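
To confirm that a downloaded read file is complete, its read count and read length can be compared against the figures above. This is a minimal sketch with hypothetical helper names; it assumes an uncompressed FASTQ file with exactly four lines per read (no wrapped sequence lines).

```shell
#!/bin/sh
# Quick sanity checks on an uncompressed FASTQ file.
# Helper names are hypothetical; assumes exactly 4 lines per read.

# Print the number of reads in FILE.
count_fastq_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Print the length of the first read's sequence line in FILE.
first_read_length() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Usage:
#   count_fastq_reads test_data/BCRABL1_testdata_reads.fq
#   first_read_length test_data/BCRABL1_testdata_reads.fq
```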


2. Reference Data

These files contain both the coding and noncoding reference Bowtie indices used in the protocol and the corresponding fasta files, annotated from the listed Ensembl version.

The coding and noncoding transcripts Bowtie Index for Ensembl version 66 (compressed 245MB; uncompressed 328MB) can be found at http://bioinformatics.childhealthresearch.org.au/datasets/fusionfinder/human_66_bowtie.zip

Archive versions can be found at http://bioinformatics.childhealthresearch.org.au/datasets/fusionfinder/
In the paper we used data from Ensembl version 62, which can be found at http://bioinformatics.childhealthresearch.org.au/datasets/fusionfinder/human_62_bowtie.zip

Alternatively, if you have access to an older version of Ensembl that you want to use, you can generate a custom reference sequence
using the make_reftrans.pl script distributed with FusionFinder.

#############
# Contact us
#############

FusionFinder was written by Richard Francis as part of his PhD in Bioinformatics at the University of Western Australia.
Source: README, updated 2012-04-13