Name                      Modified     Size        Downloads / Week
rna_test.sh               2024-06-03   65.8 kB
README                    2024-05-24   9.9 kB
stringtie.config          2018-03-20   3 Bytes
cufflinks.config          2018-03-19   25 Bytes
star-fusion2.config       2018-03-08   179 Bytes
star-fusion.config        2018-03-01   190 Bytes
gene_cytoband.txt         2015-12-13   1.3 MB
genesymbol.table.txt      2015-12-13   3.6 MB
cufflinks-2.2.1.tar.gz    2015-11-15   11.6 MB
bam-readcount.tar.gz      2015-11-15   13.0 MB
target_intervals.bed.gz   2015-11-15   5.4 MB
rna_test.config           2015-11-15   71 Bytes
Totals: 12 Items                       35.0 MB     0

# rna_test.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 15.1.1
# Date: May 24, 2024

[Introduction]

rna_test.sh is a shell script that runs the GATK best-practice workflow for variant calling in RNA-seq data. It uses STAR for alignment, HaplotypeCaller to call variants, and ANNOVAR to annotate them. It also runs Cufflinks (FPKM) and bam-readcount. It runs STAR-Fusion if requested, and Arriba if Arriba is installed.

[Before running]

1. Prepare rna_test.config. The file must contain exactly four words on a single line. No other words or lines are allowed.

	/path/to/programs /path/to/inputfile /path/to/outputfile thread_number

	ex1:
	/home/user_name /media/user_name/disk1/input /home/user_name/output 8

	ex2:
	~ ~/input ~/output 8
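
The four-word rule can be checked before a long run starts. The following is a minimal sketch, not part of rna_test.sh: check_config is a hypothetical helper name, and the requirement that thread_number be a positive integer is an assumption drawn from the examples above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rna_test.sh): validate rna_test.config.
# Prints the four fields, one per line, or fails with a message on stderr.
check_config() {
    cfg=$1
    [ -f "$cfg" ] || { echo "missing: $cfg" >&2; return 1; }
    # Count lines that contain any non-blank character; exactly one is expected.
    nlines=$(grep -c '[^[:space:]]' "$cfg" || true)
    [ "$nlines" -eq 1 ] || { echo "expected 1 line, found $nlines" >&2; return 1; }
    # Count whitespace-separated words; exactly four are expected.
    nwords=$(wc -w < "$cfg" | tr -d ' ')
    [ "$nwords" -eq 4 ] || { echo "expected 4 words, found $nwords" >&2; return 1; }
    # Assumption: the fourth word (thread_number) is a positive integer.
    threads=$(awk 'NF { print $4 }' "$cfg")
    case $threads in
        ''|*[!0-9]*|0) echo "thread_number must be a positive integer" >&2; return 1 ;;
    esac
    awk 'NF { for (i = 1; i <= 4; i++) print $i }' "$cfg"
}
```

Calling check_config rna_test.config prints the programs path, input path, output path, and thread number on separate lines when the file is well formed.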

2. Java, samtools, and the following Python 2.7 packages must be installed.

	sudo apt-get install openjdk-8-jdk
	sudo apt-get install samtools
	sudo apt-get install python2.7-dev
	sudo apt-get install python-numpy
	sudo apt-get install python-matplotlib
	sudo apt-get install python-pysam
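
To confirm that the command-line tools are reachable before running the pipeline, a small check along these lines may help. It is a sketch only: missing_tools is a hypothetical helper, and the tool names passed to it are whatever you expect to find on PATH.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call, using tool names mentioned in this README:
#   missing_tools java samtools
```

An empty result means every listed command was found. Further names from later steps (STAR, cufflinks, bam-readcount) can be added to the call once those tools have been copied into place.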

3. Download the pre-compiled static STAR binary from STAR's website (version > 2.5.4). Copy STAR to /usr/bin.

4. Download the prebuilt StringTie from its website. Copy it to /path/to/programs/stringtie-1.3.4d.Linux_x86_64.

5. If HTSeq-0.6.1p1 is present in /path/to/programs, the script installs HTSeq the first time it runs. Alternatively, install HTSeq beforehand.
   
        sudo apt-get install build-essential python2.7-dev python-numpy python-matplotlib python-pysam python-htseq

6. The following files and folders must be placed in /path/to/programs.

	ctat_genome_lib_build_dir
	STAR-Fusion-v1.2.0
	picard-tools/picard.jar
	picard-tools/GenomeAnalysisTK.jar
	picard-tools/fgbio.jar (for umi mode)
	annovar
	hg19 (see below)
	dbsnp_138.hg19.vcf
	target_intervals.bed
	cufflink.mask.gtf
	dbSNPnew
	cosmic
	clinvar
	gencode.v19.annotation.gtf
	genesymbol.table.txt
	gene_cytoband.txt
	stringtie-1.3.4d.Linux_x86_64
	hg19.coding.gtf
	    (gencode.v19.annotation.gtf can be downloaded from http://www.gencodegenes.org/releases/19.html)
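
Because the list is long, a quick presence check can save a failed run. The sketch below is not part of rna_test.sh; missing_entries is a hypothetical helper. It spells the annotation file gencode.v19.annotation.gtf, as in the awk commands later in this README, and leaves out fgbio.jar because that file is only needed for umi mode.

```shell
#!/bin/sh
# Hypothetical helper: print every entry from the list above that is
# absent from the programs directory given as $1.
missing_entries() {
    dir=$1
    for entry in \
        ctat_genome_lib_build_dir STAR-Fusion-v1.2.0 \
        picard-tools/picard.jar picard-tools/GenomeAnalysisTK.jar \
        annovar hg19 dbsnp_138.hg19.vcf target_intervals.bed \
        cufflink.mask.gtf dbSNPnew cosmic clinvar \
        gencode.v19.annotation.gtf genesymbol.table.txt gene_cytoband.txt \
        stringtie-1.3.4d.Linux_x86_64 hg19.coding.gtf
    do
        # -e accepts both files and folders.
        [ -e "$dir/$entry" ] || echo "$entry"
    done
}
```

Running missing_entries /path/to/programs prints nothing when all entries are in place.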

    fgbio.jar can be installed with conda. Copy the file to /path/to/programs.

    For ANNOVAR, download the following databases.
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
	perl annotate_variation.pl -buildver hg19 -downdb cytoBand humandb/
	perl annotate_variation.pl -buildver hg19 -downdb genomicSuperDups humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar esp6500siv2_all humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2014oct humandb/
	perl annotate_variation.pl -buildver hg19 -downdb phastConsElements46way humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20150330 humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar cosmic70 humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp30a humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar exac03nontcga humandb/

	Download the left-normalized avsnp file (hg19_avsnp150.txt) from the ANNOVAR site.

	Use the scripts dbsnpnew.sh, cosmicnew.sh, and clinvarnew.sh to make the dbSNP, cosmic, and clinvar files.

	These files are tab-delimited. The format should look like this:

		rs990360547	chr10-100000035C-T
		rs946241909	chr10-100000101T-C
		rs923734382	chr10-100000126A-G
		rs894795669	chr10-100000179C-T
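
A quick format check can catch lines that are space-delimited or truncated. The sketch below is an assumption-heavy illustration: bad_lookup_lines is a hypothetical helper, and its pattern for the second column (chromosome, dash, position, reference allele, dash, alternate allele) is inferred only from the four sample rows above, so it may need loosening for indels or other identifiers.

```shell
#!/bin/sh
# Hypothetical checker: print the line numbers that do not match the
# two-column, tab-delimited layout shown in the sample rows.
# Assumption: column 2 looks like chr<name>-<position><REF>-<ALT>.
bad_lookup_lines() {
    awk -F '\t' '
        NF != 2 ||
        $1 == "" ||
        $2 !~ /^chr[^-]+-[0-9]+[ACGTN]+-[ACGTN]+$/ { print NR }
    ' "$1"
}
```

Running bad_lookup_lines on one of the generated files prints nothing when every line matches the assumed layout.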

    Some of the files can be downloaded from this project's file list.

    Copy bam-readcount and cufflinks to /usr/local/bin. You can build bam-readcount yourself or download the prebuilt one (bam-readcount.tar.gz) from this project's file list. Cufflinks can be downloaded from its website.

	sudo cp /path/to/bam-readcount /usr/local/bin
	sudo cp /path/to/cufflinks /usr/local/bin

    Download STAR-Fusion Release v1.2.0 from its website and install it by running sudo make in its directory.
    The following Perl modules may be required.

	perl -MCPAN -e shell
	install DB_File
	install URI::Escape
	install Set::IntervalTree
	install Carp::Assert
	install JSON::XS
	install PerlIO::gzip

    Create mask file for Cufflinks using the following command:

        cat /path/to/programs/gencode.v19.annotation.gtf | awk ' !/protein_coding/ || /rRNA/ || /tRNA/ || /5S/ || /chrM/ {print}' > /path/to/programs/cufflink.mask.gtf

    Create hg19.coding.gtf using the following command:

        cat /path/to/programs/gencode.v19.annotation.gtf | awk '/protein_coding/' | sed '/rRNA/d' | sed '/tRNA/d' | sed '/5S/d' > /path/to/programs/hg19.coding.gtf
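
The two filters are plain text matches on each GTF line, so their effect can be tried on a few lines before processing the full annotation. The sketch below wraps the same patterns in two functions that read a GTF on standard input; gtf_mask and gtf_coding are hypothetical names used only here.

```shell
#!/bin/sh
# Same patterns as the two commands above, as stdin filters.
# gtf_mask keeps lines for the Cufflinks mask file.
gtf_mask() {
    awk '!/protein_coding/ || /rRNA/ || /tRNA/ || /5S/ || /chrM/'
}

# gtf_coding keeps lines for hg19.coding.gtf.
gtf_coding() {
    awk '/protein_coding/' | sed '/rRNA/d' | sed '/tRNA/d' | sed '/5S/d'
}
```

Note that the two outputs are not exact complements: a protein_coding line on chrM passes both filters, because only the mask filter tests for chrM.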

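    Split target_intervals.bed by chromosome. The output directory must exist before awk can write into it.

        mkdir -p /path/to/programs/target_intervals
        awk -F '\t' '{print $0 >> "/path/to/programs/target_intervals/" $1 ".bed"}' /path/to/programs/target_intervals.bed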
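
Two details of the split are easy to miss: awk cannot create the output directory, and because it appends with >>, running the command twice duplicates every interval. The sketch below wraps the same awk logic in a function that handles both; split_bed_by_chrom is a hypothetical helper, not something rna_test.sh provides.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk split.
# $1 = BED file, $2 = output directory.
split_bed_by_chrom() {
    bed=$1
    outdir=$2
    mkdir -p "$outdir"          # awk cannot create the directory itself
    rm -f "$outdir"/*.bed       # awk appends (>>), so clear earlier output first
    awk -F '\t' -v dir="$outdir" '{ print $0 >> (dir "/" $1 ".bed") }' "$bed"
}
```

For example, split_bed_by_chrom /path/to/programs/target_intervals.bed /path/to/programs/target_intervals writes one BED file per value in the first column, and can be rerun safely.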

7. Write tool options into the following files. Only the first line of each file is read.

	/path/to/programs/STAR-Fusion-v1.2.0/star-fusion.config
	    STAR options that produce Chimeric.out.junction (needed only if you want to run STAR-Fusion)
	/path/to/programs/STAR-Fusion-v1.2.0/star-fusion2.config
	    additional options for STAR-Fusion
	/path/to/programs/cufflinks-2.2.1/cufflinks.config
	    additional cufflinks options
	/path/to/programs/stringtie-1.3.4d.Linux_x86_64/stringtie.config
	    additional stringtie options
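
Since only the first line counts, options placed on a second line are silently ignored. The sketch below shows one way to preview what would be picked up from a config file; first_line is a hypothetical helper, and how rna_test.sh itself reads these files is not shown in this README.

```shell
#!/bin/sh
# Hypothetical helper: print only the first line of a config file.
# Prints nothing and returns non-zero if the file does not exist.
first_line() {
    [ -f "$1" ] && head -n 1 "$1"
}

# Example call:
#   first_line /path/to/programs/cufflinks-2.2.1/cufflinks.config
```

If the printed line lacks an option you expected, move that option up onto the first line of the file.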

8. Install Arriba to /path/to/programs/arriba.

9. The RNA-seq files must be paired-end and named samplename_1.suffix and samplename_2.suffix. The suffix must be one of fastq, fq, fastq.gz, or fq.gz.

10. To be compatible with SOAPfuse, the RNA-seq files must be placed in the following paths:

	/path/to/inputfile/samplename/Lib/samplename_1.suffix
	/path/to/inputfile/samplename/Lib/samplename_2.suffix
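
The naming and layout rules in steps 9 and 10 can be checked per sample before a run. The following is a minimal sketch; expected_fastq_paths and missing_fastq are hypothetical helpers that simply restate those rules.

```shell
#!/bin/sh
# Hypothetical helper: print the two FASTQ paths expected for a sample.
# $1 = /path/to/inputfile, $2 = samplename, $3 = suffix.
expected_fastq_paths() {
    input_root=$1
    sample=$2
    suffix=$3
    case $suffix in
        fastq|fq|fastq.gz|fq.gz) ;;
        *) echo "unsupported suffix: $suffix" >&2; return 1 ;;
    esac
    for mate in 1 2; do
        echo "$input_root/$sample/Lib/${sample}_$mate.$suffix"
    done
}

# Hypothetical helper: print whichever of those files do not exist.
missing_fastq() {
    paths=$(expected_fastq_paths "$@") || return 1
    printf '%s\n' "$paths" | while IFS= read -r path; do
        [ -f "$path" ] || echo "$path"
    done
}
```

For example, missing_fastq /path/to/inputfile test1 fastq.gz prints nothing when both mate files are in place.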

[Running]

Syntax:  sh rna_test.sh samplename suffix -options

	options:
	-sb: keep sorted BAM
	-sam: keep SAM
	-dt: delete temporary files
	-nk: do not keep temporary files during processing
	-sk#: skip procedure # [1 to 6]
	-s: shut down after finishing
	-skv: skip variant calling
	-skc: skip Cufflinks
	-skh: skip HTSeq count
	-sks: skip StringTie
	-skb: skip BQSR
	-ska: skip Arriba
	-dcov: use -dcov in SplitNCigarReads


	ex1:
	  sh rna_test.sh test1 fastq.gz -skv -nk -sb
	ex2:
	  sh rna_test.sh test2 fastq -sk1 -sk2 -sk3 -sk4 -dcov -dt -s
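
When several samples share the same suffix and options, the command lines can be generated rather than typed. The sketch below only prints commands that follow the syntax above; batch_commands is a hypothetical helper, and nothing is executed unless its output is piped to sh.

```shell
#!/bin/sh
# Hypothetical helper: print one rna_test.sh command per sample name
# read from standard input (one name per line, blank lines skipped).
# $1 = suffix; all remaining arguments are passed through as options.
batch_commands() {
    suffix=$1
    shift
    while IFS= read -r sample; do
        [ -z "$sample" ] || echo "sh rna_test.sh $sample $suffix $*"
    done
}

# Example call:
#   printf 'test1\ntest2\n' | batch_commands fastq.gz -skv -nk -sb
```

Leave -s out of a batch, or add it only to the last command, since it shuts the machine down when a run finishes.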

Note:
1. The script is designed to resume from the last completed process. After an interruption, however, delete the last incomplete temporary file before rerunning; otherwise the results may be incomplete.
2. STAR 1-step alignment is disabled because it offers no real gain in efficiency.
3. Cufflinks and StringTie use the first STAR alignment files. To avoid repeating the alignment, keep the temporary files (do not use -dt or -nk) until Cufflinks or StringTie has finished.
4. If memory runs out in step 6 [PrintReads], disable BQSR with -skb.
5. If the script gets stuck at SplitNCigarReads (step 4 [split reads]), use -dcov to limit the maximum coverage to 100000.

[How to build hg19 index]

# bwa, picard-tools/picard.jar, samtools must be installed.

# download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz 

# uncompress the chromFa.tar.gz into ~/hg19

cd ~/hg19

# concatenate the individual chromosome .fa files into hg19.fa in the following order:

cat chrM.fa chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chr1_gl000191_random.fa chr1_gl000192_random.fa chr4_ctg9_hap1.fa chr4_gl000193_random.fa chr4_gl000194_random.fa chr6_apd_hap1.fa chr6_cox_hap2.fa chr6_dbb_hap3.fa chr6_mann_hap4.fa chr6_mcf_hap5.fa chr6_qbl_hap6.fa chr6_ssto_hap7.fa chr7_gl000195_random.fa chr8_gl000196_random.fa chr8_gl000197_random.fa chr9_gl000198_random.fa chr9_gl000199_random.fa chr9_gl000200_random.fa chr9_gl000201_random.fa chr11_gl000202_random.fa chr17_ctg5_hap1.fa chr17_gl000203_random.fa chr17_gl000204_random.fa chr17_gl000205_random.fa chr17_gl000206_random.fa chr18_gl000207_random.fa chr19_gl000208_random.fa chr19_gl000209_random.fa chr21_gl000210_random.fa chrUn_gl000211.fa chrUn_gl000212.fa chrUn_gl000213.fa chrUn_gl000214.fa chrUn_gl000215.fa chrUn_gl000216.fa chrUn_gl000217.fa chrUn_gl000218.fa chrUn_gl000219.fa chrUn_gl000220.fa chrUn_gl000221.fa chrUn_gl000222.fa chrUn_gl000223.fa chrUn_gl000224.fa chrUn_gl000225.fa chrUn_gl000226.fa chrUn_gl000227.fa chrUn_gl000228.fa chrUn_gl000229.fa chrUn_gl000230.fa chrUn_gl000231.fa chrUn_gl000232.fa chrUn_gl000233.fa chrUn_gl000234.fa chrUn_gl000235.fa chrUn_gl000236.fa chrUn_gl000237.fa chrUn_gl000238.fa chrUn_gl000239.fa chrUn_gl000240.fa chrUn_gl000241.fa chrUn_gl000242.fa chrUn_gl000243.fa chrUn_gl000244.fa chrUn_gl000245.fa chrUn_gl000246.fa chrUn_gl000247.fa chrUn_gl000248.fa chrUn_gl000249.fa > hg19.fa
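
The sequence order in hg19.fa can be confirmed after concatenation by listing the FASTA headers. The sketch below uses a hypothetical helper name, fasta_names.

```shell
#!/bin/sh
# Hypothetical helper: print the sequence names of a FASTA file,
# in file order, without the leading ">".
fasta_names() {
    grep '^>' "$1" | sed 's/^>//'
}

# Example call:
#   fasta_names hg19.fa | head -n 3
```

Assuming each per-chromosome file from the UCSC download carries a single header named after the file, the listing should begin with chrM, chr1, chr2 and follow the order of the cat command above.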

# build index with following commands:

bwa index -a bwtsw -p hg19 hg19.fa
java -jar /path/to/programs/picard-tools-1.134/picard.jar CreateSequenceDictionary R=/path/to/programs/hg19/hg19.fa O=/path/to/programs/hg19/hg19.dict
# create an individual dict for each chromosome
java -jar ~/picard-tools-1.134/picard.jar CreateSequenceDictionary R=~/hg19/chr1.fa O=~/hg19/chr1.dict # etc.
samtools faidx /path/to/programs/hg19/hg19.fa 
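
The per-chromosome dictionary step is shown for chr1 only. A loop can produce the remaining commands; the sketch below prints one CreateSequenceDictionary command per chr*.fa file, using the same R= and O= arguments as above. dict_commands is a hypothetical helper, and it assumes paths without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print one CreateSequenceDictionary command for
# every chr*.fa file in a directory. Nothing is executed here.
# $1 = path to picard.jar, $2 = directory holding the chr*.fa files.
dict_commands() {
    jar=$1
    dir=$2
    for fa in "$dir"/chr*.fa; do
        [ -e "$fa" ] || continue    # no match: the glob pattern stays literal
        echo "java -jar $jar CreateSequenceDictionary R=$fa O=${fa%.fa}.dict"
    done
}

# Example call (review the output, then pipe it to sh to run it):
#   dict_commands "$HOME/picard-tools-1.134/picard.jar" "$HOME/hg19"
```

Printing the commands first makes it easy to check the file list before any dictionaries are written.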
# Download the index file for STAR from its website (GRCh37_gencode_v19_CTAT_lib_Nov012017.plug-n-play.tar.gz). Extract it to /path/to/programs/ctat_genome_lib_build_dir. Then overwrite the ref_genome.fa.star.idx subdirectory and relink the reference files with the following commands (replace N with the number of threads).
STAR --runMode genomeGenerate --genomeDir /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa.star.idx --genomeFastaFiles /path/to/programs/hg19/hg19.fa --runThreadN N
ln -sf /path/to/programs/hg19/hg19.fa /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa
ln -sf /path/to/programs/hg19/hg19.fa.fai /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa.fai

Source: README, updated 2024-05-24