deFuse Wiki

Status: Beta

Brought to you by: amcpherson

DeFuse

Authors: Anonymous

There is a newer version of this page. You can find it here.

deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.

Papers that have used deFuse

deFuse has been used to discover gene fusions in tumour samples for the following papers:

MHC class II transactivator CIITA is a recurrent gene fusion partner in lymphoid cancers, Nature 2011

Contact

Please feel free to email the author if you have any questions or issues.

News

Version 0.3.6 is now available

Version 0.3.6 introduces new annotations that the adaboost classifier uses to achieve slightly higher accuracy. The filtered output is now based on the probability produced by the adaboost classifier. The deFuse 0.3.6 updates can also be used to quickly reannotate a deFuse 0.3.5 analysis; however, the repeats.txt file from the newly posted dataset is required. To reannotate:

rm output_directory/annotations.txt
./reannotate.pl -c config.txt -o output_directory

Known issues:

we are currently working on speed issues that may be a problem for some larger datasets
blat may fail on low memory machines, to be fixed soon by splitting up the reference for blat

Version 0.3.5 is now available

Version 0.3.5 is a minor update that fixes a number of small bugs and repairs reannotate.pl so that it can be used to classify the results of any deFuse 0.3.X run. This functionality was not working in 0.3.4. Run the following to reannotate any 0.3.X run and obtain the new annotations and adaboost probability score. It is not necessary to run reannotate.pl if you have already run version 0.3.5 of defuse.pl.

rm output_directory/annotations.txt
./reannotate.pl -c config.txt -o output_directory -p max_threads

Known issues:

blat may fail on low memory machines, to be fixed soon by splitting up the reference for blat
the method is somewhat sensitive to the max_insert_size parameter; set it to 3 standard deviations above the expected fragment length, but no higher
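
As a rough illustration of the max_insert_size guidance above, the value can be derived from an estimate of the library's mean fragment length and its standard deviation. The sketch below is not part of deFuse: the helper name max_insert_size_for and the example numbers (mean 200 bp, standard deviation 30 bp) are assumptions made for illustration. Truncating to a whole number, rather than rounding up, is also an assumption, chosen so the result never exceeds the 3 standard deviation limit.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with deFuse): print the fragment length
# mean plus 3 standard deviations, truncated to a whole number of bases.
# $1 = estimated mean fragment length, $2 = estimated standard deviation
max_insert_size_for() {
    awk -v mean="$1" -v sd="$2" 'BEGIN { print int(mean + 3 * sd) }'
}

# Example with assumed library statistics: 200 + 3 * 30
max_insert_size_for 200 30   # prints 290
```

The printed value is the one to use for max_insert_size. The mean and standard deviation must come from your own library preparation or alignment statistics.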

Version 0.3.4 is now available

Version 0.3.4 uses an adaboost classifier trained on 60 true positives and 61 false positives to produce a single probability score for each fusion. The R ada package is required. You can take full advantage of the adaboost classifier for results produced using other 0.3.X versions of deFuse by running the reannotate.pl script *Edit: this does not work, please update to version 0.3.5*. A full rerun using 0.3.4 should not be necessary. Once again, the dataset has not changed.

Changes:

adaboost classifier to produce a single probability score for each prediction
additional features calculated for each fusion, including boundary sequence di-nucleotide entropy and fusion boundary homology
bugfix for long read lengths and short fragment lengths

Known issues:

blat may fail on low memory machines, to be fixed soon by splitting up the reference for blat
the method is somewhat sensitive to the max_insert_size parameter; set it to 3 standard deviations above the expected fragment length, but no higher

Version 0.3.3 is now available

Version 0.3.3 reworks the split alignments so they are much quicker and do not hit the disk as much as in previous versions. If your deFuse runs are taking a long time, it's a good idea to upgrade. The dataset has not changed.

Changes:

reworked split alignments to be quicker and less disk intensive
discord_read_trim in config.txt allows better use of libraries with long read lengths and shorter fragment lengths
quality type for bowtie is configurable by setting bowtie_quals in config.txt

Known issues:

blat may fail on low memory machines, to be fixed soon by splitting up the reference for blat
the method is somewhat sensitive to the max_insert_size parameter; set it to 3 standard deviations above the expected fragment length, but no higher

Version 0.3.2 is now available

If you previously had problems with deFuse creating very large files, consuming large amounts of memory and taking a long time to run, it could be because versions 0.3.1 and 0.3.0 were not properly filtering IG rearrangements. The problems are exacerbated if your RNA-Seq data was produced from a tumour with high amounts of immune infiltration. This issue is now fixed in version 0.3.2. Note that results from 0.3.1 and 0.3.0 will not be wrong, but will be a superset of what you're interested in (assuming you are not interested in predicting IG rearrangements).

Version 0.3.2 also provides a number of other annotations that may be interesting for prioritizing fusions for validation or further experiments.

Changes:

fixed exclusion of IG rearrangements, thereby fixing overly high resource usage by deFuse
de novo breakpoint assembly is now optional via a setting in config.txt
annotation of the expression of each gene
annotation of fusions spliced at exon boundaries corrected
annotation of an interruption index added
annotation of a fusion splice index added
annotation of genomic position and strand of fusion splice added

Known issues:

blat may fail on low memory machines, to be fixed soon by splitting up the reference for blat
libraries with long read lengths and short fragment lengths (i.e. 75 bp reads and 150 bp fragments) will not work, to be fixed by chopping large reads

Version 0.3.1 is now available

Changes:

Two unnecessary and potentially large files are no longer produced

Version 0.3.0 is now available

This version represents a major change to the way split reads are calculated and has produced more validated predictions than 0.2.0. It is not backward compatible with 0.2.0.

Changes:

New split read calculation
Parallelized split read calculation
Output now includes column headers

Version 0.2.0 is now available

This is the first official release of deFuse.