deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired-end alignments to inform a split-read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters to reduce the number of false positives, and produces a fully annotated output for each predicted fusion.
deFuse has been used to discover gene fusions in tumour samples for the following papers:
Please feel free to email the author if you have any questions or issues.
Version 0.3.6 introduces new annotations that the adaboost classifier uses to achieve slightly higher accuracy. The filtered output is now based on the probability produced by the adaboost classifier. The deFuse 0.3.6 updates can be used to quickly reannotate a deFuse 0.3.5 analysis; however, the repeats.txt file from the newly posted dataset is required. To reannotate:
rm output_directory/annotations.txt
./reannotate.pl -c config.txt -o output_directory
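Because the filtered output is based on the adaboost probability, it can be useful to apply a threshold of your own to the unfiltered results. The following is only a sketch under stated assumptions: it assumes a tab-separated results file with a header row containing a column named "probability", and both the function name filter_by_probability and the cutoff value are hypothetical. Check the column names in your own output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper, not part of deFuse.
# Usage: filter_by_probability RESULTS_TSV THRESHOLD
# Assumes RESULTS_TSV is tab-separated with a header row that contains
# a column named "probability" (an assumption; verify against your output).
filter_by_probability() {
    awk -F '\t' -v thr="$2" '
        NR == 1 {
            # Locate the probability column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "probability") col = i
            if (!col) {
                print "no probability column found" > "/dev/stderr"
                exit 1
            }
            print    # keep the header row
            next
        }
        # Keep rows whose probability is at or above the threshold.
        ($col + 0) >= (thr + 0) { print }
    ' "$1"
}
```

For example, filter_by_probability results.tsv 0.5 would print the header plus every row scoring at least 0.5, where results.tsv stands in for whichever results file your run produced.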
Known issues:
Version 0.3.5 is a minor update that fixes a number of small bugs and fixes reannotate.pl so that it can actually be used to classify the results of any deFuse 0.3.X run. This functionality was not working in 0.3.4. It is not necessary to run reannotate.pl if you have already run version 0.3.5 of defuse.pl. Otherwise, run the following to reannotate any 0.3.X run and obtain the new annotations and adaboost probability score:
rm output_directory/annotations.txt
./reannotate.pl -c config.txt -o output_directory -p max_threads
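If you reannotate several runs, the two commands above can be wrapped in a small shell function. This is a minimal sketch, not part of deFuse: the function name reannotate_run and the DRY_RUN variable are hypothetical, and it uses only the -c, -o and -p options shown above. It uses rm -f rather than rm so that a missing annotations.txt is not treated as an error.

```shell
#!/bin/sh
# Hypothetical helper, not part of deFuse.
# Usage: reannotate_run CONFIG OUTPUT_DIR [MAX_THREADS]
# Set DRY_RUN=1 to print the commands instead of running them.
reannotate_run() {
    config=$1
    outdir=$2
    threads=${3:-}

    if [ -z "$config" ] || [ -z "$outdir" ]; then
        echo "usage: reannotate_run CONFIG OUTPUT_DIR [MAX_THREADS]" >&2
        return 2
    fi

    # Build the reannotate.pl command; add -p only when a thread count is given.
    set -- ./reannotate.pl -c "$config" -o "$outdir"
    if [ -n "$threads" ]; then
        set -- "$@" -p "$threads"
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "rm -f $outdir/annotations.txt"
        echo "$*"
        return 0
    fi

    # Remove the old annotations so that they are regenerated, then reannotate.
    rm -f "$outdir/annotations.txt"
    "$@"
}
```

Running it with DRY_RUN=1 first, for example DRY_RUN=1 followed by reannotate_run config.txt output_directory 4, prints the commands so you can confirm the paths before anything is deleted.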
Known issues:
Version 0.3.4 uses an adaboost classifier trained on 60 true positives and 61 false positives to produce a single probability score for each fusion. The R ada package is required. You can take full advantage of the adaboost classifier for results produced using other 0.3.X versions of deFuse by running the reannotate.pl script *Edit: this does not work, please update to version 0.3.5*. A full rerun using 0.3.4 should not be necessary. Once again, the dataset has not changed.
Changes:
Known issues:
Version 0.3.3 reworks the split alignments so that they are much quicker and do not hit the disk as much as in previous versions. If your deFuse runs are taking a long time, it's a good idea to upgrade. The dataset has not changed.
Changes:
Known issues:
If you previously had problems with deFuse creating very large files, consuming large amounts of memory and taking a long time to run, it could be because versions 0.3.1 and 0.3.0 were not properly filtering IG rearrangements. The problems are exacerbated if your RNA-Seq data was produced from a tumour with high levels of immune infiltration. This issue is now fixed in version 0.3.2. Note that results from 0.3.1 and 0.3.0 will not be wrong, but will be a superset of what you're interested in (assuming you are not interested in predicting IG rearrangements).
Version 0.3.2 also provides a number of other annotations that may be interesting for prioritizing fusions for validation or further experiments.
Changes:
Known issues:
Changes:
This version represents a major change to the way split reads are calculated and has produced more validated predictions than 0.2.0. It is not backward compatible with 0.2.0.
Changes:
This is the first official release of deFuse.