SOAPfuse Wiki

a tool for identifying fusion transcripts from paired-end RNA-Seq data

Brought to you by: jwl890427

update_log

v1.27 at 01-18-2016

update the ins-report in the alignWG sub-directory (s01).
update the method for obtaining the number of sequenced bases/reads in the workflow.
update the PSL format and its generation function, see blog.
optimize the module directory setting.
update the gene family process script.
update the naming of duplicated gene_symbols in the PSL file.
the PSL file now provides gene/transcript information to SOAPfuse, so the gtf no longer needs to be read during operation.
optimize the fusion partner number limitation in step S05.
new version (v1.53) of config file.

v1.26 at 07-29-2013

fix a small bug in step s08 during intron-mode analysis.
control the maximum memory consumed by the Junction-lib-alignment in intron-mode analysis.
resolve duplicated fusion point pairs for the same gene pair.
add some file-check-points for stable operation.
fix a bug in the expression statistics used to draw the SOAPfuse fusion figure.
add DE analysis based on the TMM method; it reads files created during SOAPfuse analysis.
optimize the construction of SOAPfuse database, please reconstruct your database files.
Add one new file, cyto_band file, download this file from UCSC:
For hg18: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/cytoBand.txt.gz
For hg19: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz
Put it in your current SOAPfuse database directory and 'gzip -d' it to get the text file "cytoBand.txt".
remove the risk of exceeding memory when transforming the initial junc-lib alignment results.
add 'Overlap-at-3-end' in the ins.report file.
fix the bug caused by a very large number of extended bases in step s08.
new version (v1.52) of config file.
Add two parameters ('DB_cytoBand' and 'PG_DE_stat'), and modify some descriptions.
To update your current config file, please use 'transfer_old_SOAPfuse_config_to_new_config.pl' in the config dir.
After the transfer, please change '$(pg_dir)/aln_bin/convert' to '$(pg_dir)/convert', because the 'convert' tool has been moved out of the 'aln_bin' directory: it is not used for alignment work, only for svg2png conversion.
slightly loosen the span.reads.amass filtrations in step s05.
optimize the pretreatment of readid from user's original read files.
fix a bug in step s07 that could cause junc-reads to map outside the partner transcript region.
update the selection of multiple transcripts fused at same position.
fix a bug in step s05 that could allow a transcript pair belonging to the same gene.
fix a bug in loading the outdir parameter.
add a check for abnormal start_codon or stop_codon.
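
The 'convert' path change described in the v1.52 config note above can also be applied with a short script instead of by hand. This is a minimal Python sketch, assuming the config is a plain text file in which the old path appears literally; the function names `fix_convert_path` and `fix_config_file` are hypothetical helpers and are not part of SOAPfuse.

```python
from pathlib import Path

# Old and new locations of the 'convert' tool, as given in the v1.26 notes.
OLD_PATH = "$(pg_dir)/aln_bin/convert"
NEW_PATH = "$(pg_dir)/convert"


def fix_convert_path(config_text: str) -> str:
    """Return the config text with the old 'convert' location replaced."""
    return config_text.replace(OLD_PATH, NEW_PATH)


def fix_config_file(path: str) -> bool:
    """Rewrite a config file in place; return True if anything changed.

    Assumption: the file is small enough to read into memory at once.
    """
    config = Path(path)
    original = config.read_text()
    updated = fix_convert_path(original)
    if updated != original:
        config.write_text(updated)
    return updated != original
```

Running the replacement a second time leaves the file unchanged, because the new path does not contain the old one.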

v1.25 at 04-12-2013

further optimize the ability to deal with original RNA-Seq data.
fix a bug in the prediction of frame-shift or in-frame-shift.
optimize the construction of junction library to avoid the bwt-index failure caused by tiny fasta ref.
add a new figure about all detected fusion events.
NOTE:
one sample, one figure. This figure is the 3D landscape of fusion events in the sample concerned. It took Jia ~4 days to complete the drawing algorithm and combine it with the SOAPfuse pipeline.
new version (v1.5) of config file.
Compared to v1.4, the v1.5 config adds no new parameters; it only changes the prefix of the parameter 'intron_len_extend_from_exon_edge' from 'PA_s07' to 'PA_all'.
Tips: find 'transfer_old_SOAPfuse_config_to_new_config.pl' in config dir, and use it.
Fix a small bug in SOAPfuse database construction.
Note:
please use the database created by the SOAPfuse-S00-Generate_SOAPfuse_database.pl script of V1.25 with SOAPfuse V1.25.
please use the database created by the SOAPfuse-S00-Generate_SOAPfuse_database.pl script of V1.24 with SOAPfuse V1.24 and earlier versions.
Otherwise, you may encounter errors.
Refine the candidate transcript pair selection based on the discordantly mapped paired-end reads.
Improve the junc-reads detection of fusion analysis in non-intron mode.
Optimize the reporting of unexpected situations or errors that SOAPfuse encountered during analysis.
Slightly rearrange the layout of files in directory 'final_fusion_genes/XXX/analysis'.

v1.24 at 03-19-2013

add the somatic mode, tumor-vs-control.
can now detect fusion junction points located in the flanking introns near exons.
optimize the alignment strategies of RNA-Seq reads.
detect fusions at the transcript level, no longer at the gene level.
optimize the final results and their format, making them easier to read.
add the prediction of frame-shift or in-frame-shift (preliminary version).
NOTE:
although it is a preliminary version, Jia has tested all well-known and classical fusions found in different tumors, and all of them were successfully predicted as in-frame-shift. Bingo!
Well, Jia is planning to optimize it in the next version, as he has found some new interesting fusion cases.
combine the fusion and expression svg figures into one figure, brand new version. We named it the 'SOAPfuse fusion figure'. It will help you a lot in your fusion research.
WARN:
the format of this new figure was created by and is only used by SOAPfuse, and we are planning to apply for a patent.
fix several embarrassing bugs, -_-!!!
optimize the ability to deal with original RNA-Seq data.
add several new filter strategies.
optimize the maximum memory consumption, keeping it steady between 7.5G and 9G.
Note:
for this, Jia tested a huge RNA-Seq dataset (>30G, wow), and the maximum memory was about 9G.
new version (v1.4) of config file.
supply one script for users to construct the whole SOAPfuse database in one step (only Homo sapiens).

v1.23

abandoned during private beta

v1.22 at 03-18-2012

change the naming standard for supporting reads:
old 'cross-read' is now 'span-read'
old 'span-read' is now 'junc-read'
add filtration on the minimum covered range of span-read.
add filtration on the single-base-high-repetitive span-read.
add the classification of predicted fusions.
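
Because 'span-read' exists in both the old and the new naming, converting labels in older text needs some care: replacing one name after the other would turn an old 'cross-read' into 'junc-read'. A minimal Python sketch of a single-pass rename follows, assuming the labels appear as literal text; `rename_read_labels` is a hypothetical helper and is not part of SOAPfuse.

```python
import re

# Old (pre-v1.22) name -> new (v1.22) name, as listed in the v1.22 notes.
RENAMES = {"cross-read": "span-read", "span-read": "junc-read"}

# One alternation pattern, so each label is matched and replaced exactly once.
_PATTERN = re.compile("|".join(re.escape(name) for name in RENAMES))


def rename_read_labels(text: str) -> str:
    """Translate old read-type names to the v1.22 names in a single pass."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], text)
```

For example, `rename_read_labels("cross-read and span-read")` returns `"span-read and junc-read"`, since the substituted text is never rescanned.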

v1.21 at 02-08-2012

add the retrim for the cross-reads found by core of SOAPfuse.
based on the mapped orientation, judge which end (5' or 3') should be kept with contiguous bases of the combined gene sequence. This makes the calculation of the pair-region more accurate.
optimize config.

v1.20 at 01-31-2012

in the former version cross-reads had to be uniquely mapped, while this version also puts trimmed reads that have repeat hits into cross-reads.
pick out trimmed reads whose number of mapped genes does not exceed the maximum (a parameter in config, default is 2).
combine two steps ('cut' and 'realign') into one step ('trim and realign'). The new pipeline contains nine steps.
the shortest length that reads will be trimmed to is set in the config by users. To ensure accurate alignment, the minimum is 30nt.
control the maximum memory used in two steps ('denovo unmap' and 'draw expression').
add the prediction of fused sequence.
draw svg figures that show the supporting reads (cross-read and span-read) alignment on fused sequence.
draw svg figures that show the expression level (absolute coverage) of genes participating in fusions.
optimize config.

v1.1 at 11-14-2011

add the cut to 50bp operation in the step 'cut and realign' (changed in the updated version).
add the filtrations for multiple modifications using blastn in two steps ('candidate' and 'denovo unmap').
add the estimation of insert size after the alignment against whole genome reference.
optimize the format conversion from bwa to soap after the realignment of unmapped reads by bwa.
add the judgement of fused orientation based on the cross-reads mapping information after getting the candidate fusion set.
modify the names of all scripts, making them uniform and easy to distinguish.
improve the default minimum number of bases (5nt) that a span (junction) read must map to on each side of the fusion.
update the filtering of homologous regions between genes by blastn aligning, make it more accurate.

v1.0 at 09-16-2011

achieve basic automatic pipeline operation (ten steps).
show satisfactory performance by comparing with previously released tools based on several datasets.