Name | Modified | Size |
---|---|---|
rna_test.sh | 2024-06-03 | 65.8 kB |
README | 2024-05-24 | 9.9 kB |
stringtie.config | 2018-03-20 | 3 Bytes |
cufflinks.config | 2018-03-19 | 25 Bytes |
star-fusion2.config | 2018-03-08 | 179 Bytes |
star-fusion.config | 2018-03-01 | 190 Bytes |
gene_cytoband.txt | 2015-12-13 | 1.3 MB |
genesymbol.table.txt | 2015-12-13 | 3.6 MB |
cufflinks-2.2.1.tar.gz | 2015-11-15 | 11.6 MB |
bam-readcount.tar.gz | 2015-11-15 | 13.0 MB |
target_intervals.bed.gz | 2015-11-15 | 5.4 MB |
rna_test.config | 2015-11-15 | 71 Bytes |
Totals: 12 items, 35.0 MB
# rna_test.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 15.1.1
# Date: May 24, 2024

[Introduction]
rna_test.sh is a shell script that runs the GATK best-practices workflow for variant calling in RNA-seq data. It uses STAR for alignment, HaplotypeCaller to call variants, and ANNOVAR for annotation. It also computes FPKM with Cufflinks and runs bam-readcount. It runs STAR-Fusion if configured (see step 7), and Arriba if Arriba is installed (see step 8).

[Before running]
1. Prepare rna_test.config. The file contains exactly four words on one line; no other words or lines are allowed.
/path/to/programs /path/to/inputfile /path/to/outputfile thread_number
ex1: /home/user_name /media/user_name/disk1/input /home/user_name/output 8
ex2: ~ ~/input ~/output 8

2. A Java runtime and samtools must be installed, together with the Python 2.7 packages below:
sudo apt-get install openjdk-8-jdk
sudo apt-get install samtools
sudo apt-get install python2.7-dev
sudo apt-get install python-numpy
sudo apt-get install python-matplotlib
sudo apt-get install python-pysam

3. Download the precompiled static STAR binary (version > 2.5.4) from STAR's website and copy it to /usr/bin.

4. Download the prebuilt StringTie from its website and copy it to /path/to/programs/stringtie-1.3.4d.Linux_x86_64.

5. If HTSeq-0.6.1p1 is present in /path/to/programs, the script installs HTSeq the first time it runs. Alternatively, install HTSeq beforehand:
sudo apt-get install build-essential python2.7-dev python-numpy python-matplotlib python-pysam python-htseq

6. The following files and folders must be placed in /path/to/programs:
ctat_genome_lib_build_dir
STAR-Fusion-v1.2.0
picard-tools/picard.jar
picard-tools/GenomeAnalysisTK.jar
picard-tools/fgbio.jar (for UMI mode)
annovar
hg19 (see below)
dbsnp_138.hg19.vcf
target_intervals.bed
cufflink.mask.gtf
dbSNPnew
cosmic
clinvar
genecode.v19.annotation.gtf
genesymbol.table.txt
gene_cytoband.txt
stringtie-1.3.4d.Linux_x86_64
hg19.coding.gtf

(genecode.v19.annotation.gtf can be downloaded from http://www.gencodegenes.org/releases/19.html)

fgbio.jar can be installed with conda. Copy fgbio.jar to /path/to/programs/picard-tools/, as listed above.

For ANNOVAR, download the following databases:
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb cytoBand humandb/
perl annotate_variation.pl -buildver hg19 -downdb genomicSuperDups humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar esp6500siv2_all humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2014oct humandb/
perl annotate_variation.pl -buildver hg19 -downdb phastConsElements46way humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20150330 humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar cosmic70 humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp30a humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar exac03nontcga humandb/

Download the left-normalized avsnp150 database (hg19_avsnp150.txt) from the ANNOVAR site.
Use the scripts dbsnpnew.sh, cosmicnew.sh, and clinvarnew.sh to build the dbSNPnew, cosmic, and clinvar files. These files are tab-delimited. The format should look like this:
rs990360547	chr10-100000035C-T
rs946241909	chr10-100000101T-C
rs923734382	chr10-100000126A-G
rs894795669	chr10-100000179C-T
Some of these files can be downloaded here.

Copy bam-readcount and cufflinks to /usr/local/bin. You can build bam-readcount yourself or download the prebuilt binary here. Cufflinks can be downloaded from its website.
sudo cp /path/to/bam-readcount /usr/local/bin
sudo cp /path/to/cufflinks /usr/local/bin

Download STAR-Fusion release v1.2.0 from its website and build it by running sudo make in its directory. The following Perl modules may be required (the install commands are typed at the cpan> prompt):
perl -MCPAN -e shell
install DB_File
install URI::Escape
install Set::IntervalTree
install Carp::Assert
install JSON::XS
install PerlIO::gzip

Create the mask file for Cufflinks with the following command:
awk '!/protein_coding/ || /rRNA/ || /tRNA/ || /5S/ || /chrM/ {print}' /path/to/programs/gencode.v19.annotation.gtf > /path/to/programs/cufflink.mask.gtf

Create hg19.coding.gtf with the following command:
awk '/protein_coding/' /path/to/programs/gencode.v19.annotation.gtf | sed -e '/rRNA/d' -e '/tRNA/d' -e '/5S/d' > /path/to/programs/hg19.coding.gtf

Split target_intervals.bed by chromosome (one .bed file per chromosome; the output directory must exist, and rerunning overwrites instead of appending):
mkdir -p /path/to/programs/target_intervals
awk -F '\t' '{print $0 > ("/path/to/programs/target_intervals/" $1 ".bed")}' /path/to/programs/target_intervals.bed

7. To run STAR-Fusion, write the STAR command-line options that produce Chimeric.out.junction into /path/to/programs/STAR-Fusion-v1.2.0/star-fusion.config. Additional STAR-Fusion options go into /path/to/programs/STAR-Fusion-v1.2.0/star-fusion2.config. Write additional Cufflinks options to /path/to/programs/cufflinks-2.2.1/cufflinks.config and additional StringTie options to /path/to/programs/stringtie-1.3.4d.Linux_x86_64/stringtie.config. Only the first line of each of these files is read.

8. Install Arriba in /path/to/programs/arriba.

9. The RNA-seq files must be paired-end and named samplename_1.suffix and samplename_2.suffix, where suffix is one of fastq, fq, fastq.gz, or fq.gz.

10. For compatibility with SOAPfuse, the RNA-seq files must be placed in the following paths:
/path/to/inputfile/samplename/Lib/samplename_1.suffix
/path/to/inputfile/samplename/Lib/samplename_2.suffix

[RUNNING]
Syntax: sh rna_test.sh samplename suffix -options
options:
-sb: keep sorted BAM
-sam: keep SAM
-dt: delete temporary files
-nk: do not keep temporary files during processing
-sk#: skip procedure # [1 to 6]
-s: shut down after finishing
-skv: skip variant calling
-skc: skip Cufflinks
-skh: skip HTSeq count
-sks: skip StringTie
-skb: skip BQSR
-ska: skip Arriba
-dcov: use -dcov in SplitNCigarReads

ex1: sh rna_test.sh test1 fastq.gz -skv -nk -sb
ex2: sh rna_test.sh test2 fastq -sk1 -sk2 -sk3 -sk4 -dcov -dt -s

Note:
1. The script is designed to resume from the last completed process. However, after an interruption, delete the last incomplete temporary file; otherwise the results may be incomplete.
2. STAR 1-step alignment is disabled because it brought no real gain in efficiency.
3. Cufflinks and StringTie use the files from the first STAR alignment. To avoid repeating the alignment, keep temporary files (do not use -dt or -nk) until Cufflinks or StringTie has finished.
4. If Step 6 [PrintReads] runs out of memory, disable BQSR with -skb.
5. If the run gets stuck at SplitNCigarReads (step 4 [split reads]), use -dcov to limit the maximum coverage to 100000.

[How to build hg19 index]
# bwa, picard-tools/picard.jar, and samtools must be installed.
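A quick preflight check can catch a missing tool before the long index builds start. The bash sketch below is illustrative only: check_prereqs is a hypothetical helper (not part of rna_test.sh), its default tool list (bwa, samtools, java) is taken from the line above, and the picard.jar location is passed in because the paths used in this README differ (picard-tools/ vs. picard-tools-1.134/).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rna_test.sh.
# Usage: check_prereqs PICARD_JAR [COMMAND...]
# Checks that each COMMAND (default: bwa samtools java) is on PATH and that
# PICARD_JAR is a regular file. Every problem is reported on stderr.
# Returns 1 if anything is missing, 0 otherwise.
check_prereqs() {
  local picard_jar="$1"
  shift
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(bwa samtools java)
  fi

  local missing=0 tool
  for tool in "${tools[@]}"; do
    # command -v succeeds for anything the shell can run (binary, builtin, function).
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing command: $tool" >&2
      missing=1
    fi
  done

  if [ ! -f "$picard_jar" ]; then
    echo "missing file: $picard_jar" >&2
    missing=1
  fi
  return "$missing"
}
```

For example: check_prereqs /path/to/programs/picard-tools/picard.jar || exit 1. Extra command names can be appended to check more tools, e.g. check_prereqs /path/to/programs/picard-tools/picard.jar bwa samtools java STAR.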
# download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
# uncompress chromFa.tar.gz into ~/hg19
cd ~/hg19
# concatenate the individual chromosome FASTA files into hg19.fa in the following order:
cat chrM.fa chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa \
chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa \
chr21.fa chr22.fa chrX.fa chrY.fa \
chr1_gl000191_random.fa chr1_gl000192_random.fa chr4_ctg9_hap1.fa chr4_gl000193_random.fa \
chr4_gl000194_random.fa chr6_apd_hap1.fa chr6_cox_hap2.fa chr6_dbb_hap3.fa chr6_mann_hap4.fa \
chr6_mcf_hap5.fa chr6_qbl_hap6.fa chr6_ssto_hap7.fa chr7_gl000195_random.fa \
chr8_gl000196_random.fa chr8_gl000197_random.fa chr9_gl000198_random.fa chr9_gl000199_random.fa \
chr9_gl000200_random.fa chr9_gl000201_random.fa chr11_gl000202_random.fa chr17_ctg5_hap1.fa \
chr17_gl000203_random.fa chr17_gl000204_random.fa chr17_gl000205_random.fa chr17_gl000206_random.fa \
chr18_gl000207_random.fa chr19_gl000208_random.fa chr19_gl000209_random.fa chr21_gl000210_random.fa \
chrUn_gl000211.fa chrUn_gl000212.fa chrUn_gl000213.fa chrUn_gl000214.fa chrUn_gl000215.fa \
chrUn_gl000216.fa chrUn_gl000217.fa chrUn_gl000218.fa chrUn_gl000219.fa chrUn_gl000220.fa \
chrUn_gl000221.fa chrUn_gl000222.fa chrUn_gl000223.fa chrUn_gl000224.fa chrUn_gl000225.fa \
chrUn_gl000226.fa chrUn_gl000227.fa chrUn_gl000228.fa chrUn_gl000229.fa chrUn_gl000230.fa \
chrUn_gl000231.fa chrUn_gl000232.fa chrUn_gl000233.fa chrUn_gl000234.fa chrUn_gl000235.fa \
chrUn_gl000236.fa chrUn_gl000237.fa chrUn_gl000238.fa chrUn_gl000239.fa chrUn_gl000240.fa \
chrUn_gl000241.fa chrUn_gl000242.fa chrUn_gl000243.fa chrUn_gl000244.fa chrUn_gl000245.fa \
chrUn_gl000246.fa chrUn_gl000247.fa chrUn_gl000248.fa chrUn_gl000249.fa > hg19.fa
# build the indexes with the following commands:
bwa index -a bwtsw -p hg19 hg19.fa
java -jar /path/to/programs/picard-tools-1.134/picard.jar CreateSequenceDictionary R=/path/to/programs/hg19/hg19.fa O=/path/to/programs/hg19/hg19.dict
# create an individual dict for each chromosome
java -jar ~/picard-tools-1.134/picard.jar CreateSequenceDictionary R=~/hg19/chr1.fa O=~/hg19/chr1.dict
# etc.
samtools faidx /path/to/programs/hg19/hg19.fa

Download the index file for STAR (GRCh37_gencode_v19_CTAT_lib_Nov012017.plug-n-play.tar.gz) from its website and unpack it to /path/to/programs/ctat_genome_lib_build_dir. Then overwrite the following subdirectory with an index built from the hg19.fa above, and link hg19.fa and its .fai index into the library:
# N = number of threads
STAR --runMode genomeGenerate --genomeDir /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa.star.idx --genomeFastaFiles /path/to/programs/hg19/hg19.fa --runThreadN N
ln -sf /path/to/programs/hg19/hg19.fa /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa
ln -sf /path/to/programs/hg19/hg19.fa.fai /path/to/programs/ctat_genome_lib_build_dir/ref_genome.fa.fai
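The per-chromosome CreateSequenceDictionary command in the hg19 index section is spelled out only for chr1 and then marked "# etc.". A bash loop can cover the remaining files. This is a sketch under assumptions: make_chr_dicts is a hypothetical helper (not part of rna_test.sh), it writes each chrN.dict next to chrN.fa as in the chr1 example, and the chr*.fa glob also matches the *_random, *_hap, and chrUn_* files, so those get dictionaries too. With DRY_RUN=1 it only prints the java commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rna_test.sh.
# Usage: make_chr_dicts HG19_DIR PICARD_JAR
# Runs Picard CreateSequenceDictionary once per HG19_DIR/chr*.fa, writing
# chrN.dict next to chrN.fa (same naming as the chr1 example).
# Chromosomes that already have a .dict are skipped.
# With DRY_RUN=1 the commands are printed instead of executed.
make_chr_dicts() {
  local hg19_dir="$1" picard_jar="$2"
  local restore_glob fastas fa dict

  # Expand the glob with nullglob so a directory without chr*.fa yields no
  # iterations, then restore the caller's nullglob setting.
  restore_glob=$(shopt -p nullglob) || true
  shopt -s nullglob
  fastas=("$hg19_dir"/chr*.fa)
  eval "$restore_glob"

  for fa in "${fastas[@]}"; do
    dict="${fa%.fa}.dict"            # chr1.fa -> chr1.dict
    if [ -e "$dict" ]; then
      continue                       # already built, nothing to do
    fi
    if [ -n "${DRY_RUN:-}" ]; then
      echo java -jar "$picard_jar" CreateSequenceDictionary R="$fa" O="$dict"
    else
      java -jar "$picard_jar" CreateSequenceDictionary R="$fa" O="$dict" || return 1
    fi
  done
}
```

For example, to mirror the chr1 command above for every chromosome file: make_chr_dicts ~/hg19 ~/picard-tools-1.134/picard.jar. Skipping existing .dict files keeps reruns after an interruption cheap.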