Name Modified Size Downloads / Week
fusion_test.sh 2019-02-23 38.0 kB
README 2018-11-19 7.1 kB
MapSplice-v2.2.2.tar.gz 2018-11-19 8.7 MB
tophat.post.config 2018-02-11 85 Bytes
tophat.rna.config 2018-02-11 93 Bytes
Alignment.oscript 2018-02-02 1.0 kB
FusionMap_2015-03-31.tar.gz 2018-01-23 25.4 MB
fusionmap.sh 2018-01-22 7.1 kB
configuration.tmp 2017-12-29 1.2 kB
samtools-0.1.19.tar.gz 2017-07-25 3.4 MB
ericscript-0.5.5.tar.gz 2017-07-24 569.7 kB
SOAPfuse-v1.26.tar.gz 2016-07-06 42.8 MB
mcl 2016-03-03 406.9 kB
refGene_sorted.txt 2016-03-03 10.8 MB
ensGtp.txt 2016-03-03 7.4 MB
ensGene.txt 2016-03-03 39.6 MB
blastn 2016-03-03 33.5 MB
cytoBand.txt.gz 2015-11-15 6.6 kB
hg19table.txt.gz 2015-11-15 3.7 MB
hgnc_complete_set.txt.gz 2015-11-15 3.1 MB
bowtie-1.1.1.tar.gz 2015-11-15 14.2 MB
hg19table.txt 2015-11-15 122.3 kB
chimerascan-0.4.5.tar.gz 2015-11-15 4.1 MB
Homo_sapiens.GRCh37.60.chr.gtf.gz 2015-11-15 20.3 MB
Mono.tar.gz 2015-11-15 89.8 MB
Jinja2-2.7.3.tar.gz 2015-11-15 1.3 MB
scikit-learn-0.14.1.tar.gz 2015-11-15 6.8 MB
Homo_sapiens.GRCh37.69.gtf.gz 2015-11-15 26.2 MB
fusion_test.config 2015-11-15 70 Bytes
Totals: 29 Items   342.5 MB 4
# fusion_test.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 4.5.2
# Date: Nov 2, 2018

[Introduction]

fusion_test.sh is a shell script that detects gene fusions in DNAseq or RNAseq data. It runs chimerascan, SOAPfuse, MapSplice2, FusionMap, fusioncatcher, TopHat and EricScript. The output files of chimerascan are further annotated by Jinja and Pegasus Fusion.

[Before running]

1. Prepare fusion_test.config. The file must contain exactly four words on a single line. No other words or lines are allowed.

	/path/to/programs /path/to/inputfile /path/to/outputfile thread_number

	ex1: 
	/home/user_name	/media/user_name/disk1/input /home/user_name/output 8

	ex2:
	~ ~/input ~/output 8
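
The four fields can be checked before a long run is started. The following is a minimal sketch only. It is not taken from fusion_test.sh, and the names read_fusion_config and expand_tilde, as well as the variable names, are hypothetical. It assumes the fields are separated by spaces or tabs, as in ex1. Because a "~" read from a file is not expanded by the shell, the sketch replaces a leading "~" (as in ex2) with $HOME itself.

```shell
#!/bin/sh
# Sketch only: read_fusion_config, expand_tilde and all variable names are
# hypothetical and are not taken from fusion_test.sh.

# Replace a leading "~" with $HOME; a "~" read from a file is not expanded.
expand_tilde() {
    case $1 in
        "~")   printf '%s\n' "$HOME" ;;
        "~/"*) printf '%s\n' "$HOME/${1#??}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

read_fusion_config() {
    config_file=$1
    # The README allows exactly one line in the file.
    line_count=$(grep -c '' "$config_file") || true
    if [ "$line_count" != 1 ]; then
        echo "$config_file: expected exactly one line" >&2
        return 1
    fi
    # Split the line on spaces or tabs; a fifth word would land in "extra".
    read -r programs_dir input_dir output_dir threads extra < "$config_file" || true
    if [ -z "$threads" ] || [ -n "$extra" ]; then
        echo "$config_file: expected exactly four fields" >&2
        return 1
    fi
    # thread_number must be a positive integer.
    case $threads in
        *[!0-9]*|0)
            echo "$config_file: bad thread_number '$threads'" >&2
            return 1 ;;
    esac
    programs_dir=$(expand_tilde "$programs_dir")
    input_dir=$(expand_tilde "$input_dir")
    output_dir=$(expand_tilde "$output_dir")
}
```

After a successful call such as "read_fusion_config fusion_test.config", the variables programs_dir, input_dir, output_dir and threads hold the four fields.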

2. python-dev, zlib1g-dev, python-pandas, libgdiplus, default-jre, R and the R package ada must be installed.

	sudo apt-get install python-dev
	sudo apt-get install zlib1g-dev
	sudo apt-get install python-pandas
	sudo apt-get install default-jre
	sudo apt-get install r-base
	sudo apt-get install libgdiplus
	R
	> install.packages("ada")
	> q()

3. Install chimerascan, jinja, pegasus, sklearn (only from scikit-learn-0.14.1) and mono (only from mono-2.10.9). Please refer to the authors' websites.
   If you place the folders chimerascan-0.4.5 (including ez_setup.py), Jinja2-2.7.3, scikit-learn-0.14.1 and mono-2.10.9 in /path/to/programs, the script will install them automatically the first time you run it.
   Pre-built Mono and MapSplice are provided.
   Download SOAPfuse-v1.26, Homo_sapiens.GRCh37.69.gtf.gz, cytoBand.txt.gz and hgnc_complete_set.txt to /path/to/programs. The script will build the SOAPfuse index automatically.
   Download the TopHat2 prebuilt binary (tophat-2.0.0.Linux_x86_64) and the bowtie1 index from the TopHat site.
   Copy blastn, mcl, ensGtp.txt, ensGene.txt and refGene_sorted.txt into the tophat-2.0.0.Linux_x86_64 folder. Those files can be downloaded here.
   Make the /path/to/programs/blast directory, download human_genomic*, other_genomic* and nt* from the blast database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/), and extract them under /path/to/programs/blast.
   Download bedtools and extract the directory to /path/to/programs/bedtools2. The script will use it to install bedtools the first time you run it.
   EricScript uses an old version of samtools (samtools-0.1.19). Download the prebuilt samtools here, decompress it, then rename the samtools binary and copy it to /usr/bin.

   Install oshell to /path/to/programs/oshell according to the authors' website. http://www.arrayserver.com/wiki/index.php?title=Oshell#OmicScript_for_FusionMap

   Copy Alignment.oscript to /path/to/programs/oshell.
   
   sudo cp /path/to/samtools-0.1.19/samtools /usr/bin/samtools-0.1.19
   
   Download ericscript-0.5.5.tar.gz here and decompress it to /path/to/programs.
   Download ericscript_db_homosapiens_ensembl73 from the ericscript site and extract it to /path/to/programs/ericscript_db_homosapiens_ensembl73.
   
   Copy configuration.tmp to /path/to/fusioncatcher/fusioncatcher_v0.99.5a/etc. It will be used as a template for configuration.cfg for fusioncatcher.
        

4. The following files and folders must be placed in /path/to/programs.

	chimerascan-0.4.5
	bowtie-1.1.1
	chimerascan_hg19_ucsc_index (index for chimerascan)
	Pegasus_dist.0.3.1
	SOAPfuse-v1.26
	SOAPfuse-index (index for SOAPfuse)
	MapSplice-v2.2.2
	chromFa (index for MapSplice)
	Homo_sapiens.GRCh37.60.chr.gtf (required for Pegasus Fusion)
	Homo_sapiens.GRCh37.69.gtf (required for MapSplice)
	Mono
	oshell
	oshell/Alignment.oscript
	OmicsoftFolders (index for FusionMap)
	fusioncatcher
	tophat-2.1.0.Linux_x86_64
	BowtieIndex
	blast
	ericscript-0.5.5
	ericscript_db_homosapiens_ensembl73

	Some of the files can be downloaded here.
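
Because the list is long, it can help to check it mechanically before the first run. The following is a minimal sketch and not part of fusion_test.sh. The function name missing_entries is hypothetical, and the entry names are copied from the list above. It only tests that each entry exists; it does not check versions or contents.

```shell
#!/bin/sh
# Sketch only: missing_entries is a hypothetical helper, not part of fusion_test.sh.
# It prints every entry from the list in item 4 that is absent from the
# programs directory given as the first argument.
missing_entries() {
    programs_dir=$1
    for entry in \
        chimerascan-0.4.5 \
        bowtie-1.1.1 \
        chimerascan_hg19_ucsc_index \
        Pegasus_dist.0.3.1 \
        SOAPfuse-v1.26 \
        SOAPfuse-index \
        MapSplice-v2.2.2 \
        chromFa \
        Homo_sapiens.GRCh37.60.chr.gtf \
        Homo_sapiens.GRCh37.69.gtf \
        Mono \
        oshell \
        oshell/Alignment.oscript \
        OmicsoftFolders \
        fusioncatcher \
        tophat-2.1.0.Linux_x86_64 \
        BowtieIndex \
        blast \
        ericscript-0.5.5 \
        ericscript_db_homosapiens_ensembl73
    do
        # Print the entry name only when nothing exists at that path.
        [ -e "$programs_dir/$entry" ] || printf '%s\n' "$entry"
    done
    return 0
}
```

For example, "missing_entries /path/to/programs" prints nothing when every entry is in place.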

5. The original configuration files, ./SOAPfuse-v1.26/config.txt and ./oshell/Alignment.oscript, are used as templates. Do not change the content of these two files. To use different parameters, create config2.txt and Alignment2.oscript and select them with the options -sf2 and -fm2.

6. You may write additional options to tophat.rna.config, tophat.genome.config and tophat.post.config and save them to /path/to/programs/tophat-2.1.0.Linux_x86_64. Each file should contain a single line; only the first line of each file is read.
	   
7. The seq files must be paired-end and named samplename_1.suffix and samplename_2.suffix. The suffix must be one of fastq, fq, fastq.gz or fq.gz.

8. For compatibility with SOAPfuse, the seq files must be placed in the following paths:

	/path/to/inputfile/samplename/Lib/samplename_1.suffix
	/path/to/inputfile/samplename/Lib/samplename_2.suffix
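
The naming and layout rules in items 7 and 8 can be applied with a small helper. The following is a minimal sketch, not part of fusion_test.sh. The function name stage_sample and its fourth argument (the directory that currently holds the two read files) are hypothetical. It moves the files rather than copying them.

```shell
#!/bin/sh
# Sketch only: stage_sample is a hypothetical helper, not part of fusion_test.sh.
# It moves samplename_1.suffix and samplename_2.suffix into
# /path/to/inputfile/samplename/Lib/ as required by items 7 and 8.
stage_sample() {
    input_dir=$1   # /path/to/inputfile from fusion_test.config
    sample=$2      # samplename
    suffix=$3      # fastq, fq, fastq.gz or fq.gz
    src_dir=$4     # directory that currently holds the two read files

    # Only the four suffixes named in item 7 are accepted.
    case $suffix in
        fastq|fq|fastq.gz|fq.gz) ;;
        *) echo "unsupported suffix: $suffix" >&2; return 1 ;;
    esac

    # Both mates must exist before anything is moved.
    for mate in 1 2; do
        [ -f "$src_dir/${sample}_${mate}.$suffix" ] || {
            echo "missing ${sample}_${mate}.$suffix in $src_dir" >&2
            return 1
        }
    done

    mkdir -p "$input_dir/$sample/Lib"
    for mate in 1 2; do
        mv "$src_dir/${sample}_${mate}.$suffix" "$input_dir/$sample/Lib/"
    done
}
```

For example, "stage_sample ~/input test1 fastq.gz ~/Downloads" would leave the pair in ~/input/test1/Lib/.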

[Running]

Syntax:  sh fusion_test.sh samplename suffix -options
	
	options:
	-sf2: use the second configuration file for SOAPfuse
	-fm2: use the second configuration file for FusionMap
	-rna: RNAseq (default)
	-dna: DNAseq for FusionMap
	-transcriptome: use the transcriptome index in TopHat (default)
	-genome: use the genome index in TopHat
	-s: shut down after finishing
	-kt: keep temporary files
	-skc: skip chimerascan
	-sks: skip SOAPfuse
	-skm: skip MapSplice
	-skf: skip FusionMap
	-skt: skip fusioncatcher
	-skh: skip TopHat
	-ske: skip EricScript

	ex1:
	  sh fusion_test.sh test1 fastq.gz -genome
	ex2:
	  sh fusion_test.sh test2 fastq -skc -sf2 -skm -s
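
When several samples share one suffix and one set of options, the command lines can be generated from the input directory. The following is a minimal sketch, not part of fusion_test.sh. The function name print_batch_commands is hypothetical. It assumes every sample sits in /path/to/inputfile/samplename/Lib as described in [Before running], and it only prints the commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch only: print_batch_commands is a hypothetical helper.
# It prints one "sh fusion_test.sh samplename suffix -options" line for each
# sample directory under the input directory that contains a Lib folder.
print_batch_commands() {
    input_dir=$1
    suffix=$2
    shift 2
    opts=$*        # remaining arguments are passed through as options
    for sample_dir in "$input_dir"/*/; do
        # Skip anything that does not follow the samplename/Lib layout.
        [ -d "${sample_dir}Lib" ] || continue
        sample=$(basename "$sample_dir")
        printf '%s\n' "sh fusion_test.sh $sample $suffix${opts:+ $opts}"
    done
    return 0
}
```

For example, "print_batch_commands ~/input fastq.gz -genome -kt" prints one line per sample, and the output can be saved to a file and run with sh once it has been checked.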

[How to build chimerascan index]
1. Download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
2. Uncompress chromFa.tar.gz.
3. cat chr?.fa chr??.fa > ~/hg19.fa
4. python /usr/local/bin/chimerascan_index.py  --bowtie-path=~/bowtie-1.1.1 ~/hg19.fa ~/hg19table.txt ~/chimerascan_hg19_ucsc_index
   (hg19table.txt can be downloaded here.)

[How to build SOAPfuse index]
perl SOAPfuse-S00-Generate_SOAPfuse_database.pl -wg ~/hg19/hg19.fa -gtf ~/Homo_sapiens.GRCh37.69.gtf.gz -cbd ~/cytoBand.txt.gz -gf ~/hgnc_complete_set.txt -sd ~/SOAPfuse-v1.26/ -dd ~/SOAPfuse-index/
(Homo_sapiens.GRCh37.69.gtf.gz, cytoBand.txt.gz and hgnc_complete_set.txt can be downloaded here.)

[How to build MapSplice index]
1. Uncompress chromFa.tar.gz into ~/chromFa.
2. Keep chr1.fa to chr22.fa, chrM.fa, chrX.fa and chrY.fa. Delete the others.
3. The program will build the index automatically.
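
Step 2 can be done with a short loop instead of by hand. The following is a minimal sketch, not part of fusion_test.sh. The function name prune_chromfa is hypothetical. It deletes every .fa file in the given directory whose name is not chr1 to chr22, chrM, chrX or chrY, so it should be run on a copy if the other sequences are still needed.

```shell
#!/bin/sh
# Sketch only: prune_chromfa is a hypothetical helper, not part of fusion_test.sh.
# It keeps chr1.fa to chr22.fa, chrM.fa, chrX.fa and chrY.fa in the given
# directory and deletes every other .fa file.
prune_chromfa() {
    chrom_dir=$1
    for fasta in "$chrom_dir"/*.fa; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$fasta" ] || continue
        name=$(basename "$fasta" .fa)
        case $name in
            chr[1-9]|chr1[0-9]|chr2[0-2]|chrM|chrX|chrY)
                ;;                  # keep
            *)
                rm -- "$fasta" ;;   # delete random, hap, Un and similar files
        esac
    done
    return 0
}
```

For example, "prune_chromfa ~/chromFa" leaves the 25 files named in step 2.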

[How to build FusionMap index]
Make the following empty folders in /path/to/programs.
	OmicsoftFolders
	---OmicsoftSGE
		---Fusion
		---ReferenceLibrary
Run the script with the -kt option the first time. The program will download the index files to /path/to/outputfile/filename/FusionMap/OmicsoftFolders. When it has finished, copy /path/to/outputfile/filename/FusionMap/OmicsoftFolders back to /path/to/programs/OmicsoftFolders.
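
The empty folders can be created in one step. The following is a minimal sketch, not part of fusion_test.sh. The function name make_fusionmap_folders is hypothetical, and the nesting is an assumption based on reading the tree above as OmicsoftSGE inside OmicsoftFolders, with Fusion and ReferenceLibrary inside OmicsoftSGE.

```shell
#!/bin/sh
# Sketch only: make_fusionmap_folders is a hypothetical helper.
# Assumed nesting (a reading of the tree in the README):
#   OmicsoftFolders/OmicsoftSGE/Fusion
#   OmicsoftFolders/OmicsoftSGE/ReferenceLibrary
make_fusionmap_folders() {
    programs_dir=$1
    # mkdir -p creates the parent folders as needed and is safe to repeat.
    mkdir -p "$programs_dir/OmicsoftFolders/OmicsoftSGE/Fusion" \
             "$programs_dir/OmicsoftFolders/OmicsoftSGE/ReferenceLibrary"
}
```

For example, "make_fusionmap_folders /path/to/programs" prepares the folders before the first run with -kt.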

Note: MapSplice and FusionCatcher unzip the input files into two temporary files, which are deleted after the procedure. To avoid this, unzip the input files before running the script and use the unzipped files for all procedures.

Source: README, updated 2018-11-19