# exome_test.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 12.3.1
# Date: Feb 8, 2021

[Introduction]

exome_test.sh is a shell script that runs the GATK Best Practices workflow and VarScan for variant calling on exome sequencing (exomeseq) data. It uses bwa for alignment; HaplotypeCaller (optionally with UnifiedGenotyper) and VarScan to call variants; and ANNOVAR to annotate them. It also runs DepthOfCoverage and bam-readcount.

If sample.umi.fastq or sample.umi.fastq.gz is inside the input folder, the program will switch to UMI mode in [STEP 3 Mark duplicates].

[Before running]

1. Prepare exome_test.config. The file must contain exactly four whitespace-separated fields on a single line. No other words or lines are allowed.
	/path/to/programs /path/to/inputfile /path/to/outputfile thread_number

	ex: 
	/home/user_name	/media/user_name/disk1/input /home/user_name/output 8
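
The one-line, four-field rule above can be checked before a run. The following is a minimal sketch, not part of exome_test.sh: check_config is a hypothetical helper, and the requirement that the fourth field be a whole number is an assumption based on it being a thread count.

```shell
# Hypothetical helper: succeed only if the file is exactly one line
# with four fields and the fourth field is a whole number (assumed).
check_config() {
    awk 'NF != 4 || $4 !~ /^[0-9]+$/ { bad = 1 }
         END { exit (NR == 1 && !bad) ? 0 : 1 }' "$1"
}

# Write the example config from above (this overwrites any existing
# exome_test.config in the current directory), then validate it.
printf '%s %s %s %s\n' /home/user_name /media/user_name/disk1/input /home/user_name/output 8 > exome_test.config
check_config exome_test.config && echo "config OK"
```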

2. bwa, Java (version 1.8, e.g. openjdk-8-jdk) and samtools must be installed.

	sudo apt-get install bwa
	sudo apt-get install openjdk-8-jdk
	sudo apt-get install samtools
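
To confirm that the three tools are on the PATH after installation, a check along these lines may help. It is only a sketch: missing_tools is a hypothetical helper and is not part of exome_test.sh.

```shell
# Hypothetical helper: print each command that is not on PATH;
# return non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

missing_tools bwa java samtools || echo "install the tools listed above before running"

# If java is present, show its version line (java -version writes to stderr).
command -v java >/dev/null 2>&1 && java -version 2>&1 | head -n 1
```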

3. The following files and folders must be placed in /path/to/programs.

	picard-tools/picard.jar
	picard-tools/GenomeAnalysisTK.jar
	picard-tools/fgbio.jar (for UMI mode)
	VarScan.v2.3.9.jar
	annovar
	hg19 (see below)
	dbsnp_138.hg19.vcf
	target_intervals
	hg19.refSeq.sorted.txt
	dbSNPnew
	cosmic
	clinvar
	gene_cytoband.txt
   
    For ANNOVAR, download the following databases.
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
	perl annotate_variation.pl -buildver hg19 -downdb cytoBand humandb/
	perl annotate_variation.pl -buildver hg19 -downdb genomicSuperDups humandb/ 
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar esp6500siv2_all humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2014oct humandb/
	perl annotate_variation.pl -buildver hg19 -downdb phastConsElements46way humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20150330 humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar cosmic70 humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp30a humandb/
	perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar exac03nontcga humandb/
   
    fgbio can be installed with conda. Copy the jar file into /path/to/programs (as picard-tools/fgbio.jar, per the list above).

    Download the left-normalized avsnp file (hg19_avsnp150.txt) from the ANNOVAR site.

    Use the scripts dbsnpnew.sh, cosmicnew.sh and clinvarnew.sh to make the dbSNPnew, cosmic and clinvar files.

    These files are tab-delimited. The format should look like this:

		rs990360547	chr10-100000035C-T
		rs946241909	chr10-100000101T-C
		rs923734382	chr10-100000126A-G
		rs894795669	chr10-100000179C-T
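
A quick way to catch a file that was saved with spaces instead of tabs is to count the tab-separated fields on every line. This is a sketch under the assumption that each line holds exactly two fields, as in the sample above; check_two_columns and sample_lookup.txt are hypothetical names.

```shell
# Hypothetical helper: fail, reporting the first offending line number,
# unless every line has exactly two non-empty tab-separated fields.
check_two_columns() {
    awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { print "bad line " NR; exit 1 }' "$1"
}

# Demonstration on two lines copied from the sample above.
printf 'rs990360547\tchr10-100000035C-T\nrs946241909\tchr10-100000101T-C\n' > sample_lookup.txt
check_two_columns sample_lookup.txt && echo "format OK"
```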

    Copy bam-readcount to /usr/local/bin. You can build bam-readcount yourself or download the prebuilt one (bam-readcount.tar.gz) from the project files.
 
    	sudo cp /path/to/bam-readcount /usr/local/bin

    Split target_intervals.bed by chromosome.

        awk -F '\t' '{print $0 >> "/path/to/programs/target_intervals/" $1 ".bed"}' /path/to/programs/target_intervals.bed 
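
The awk command above assumes that target_intervals.bed is already decompressed and that the target_intervals folder exists; because it appends with ">>", running it twice duplicates every interval. The sketch below wraps the same split with those points handled. split_intervals is a hypothetical helper, and it writes with ">" so that each per-chromosome file is truncated on the first write of a run.

```shell
# Hypothetical wrapper around the split above.
# $1 = programs directory holding target_intervals.bed or target_intervals.bed.gz
split_intervals() {
    programs=$1
    # Decompress the downloaded file if only the .gz is present.
    if [ ! -f "$programs/target_intervals.bed" ] && [ -f "$programs/target_intervals.bed.gz" ]; then
        gunzip -c "$programs/target_intervals.bed.gz" > "$programs/target_intervals.bed"
    fi
    # awk does not create directories, so make the output folder first.
    mkdir -p "$programs/target_intervals"
    # One output file per value of column 1 (the chromosome name).
    awk -F '\t' -v out="$programs/target_intervals/" \
        '{ print $0 > (out $1 ".bed") }' "$programs/target_intervals.bed"
}

# Usage: split_intervals /path/to/programs
```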

4. The exomeseq files must be paired-end and named samplename_1.suffix and samplename_2.suffix. The suffix must be one of fastq, fq, fastq.gz or fq.gz.

5. In order to be compatible with SOAPfuse, the exomeseq files must be placed in the following paths:

	/path/to/inputfile/samplename/Lib/samplename_1.suffix
	/path/to/inputfile/samplename/Lib/samplename_2.suffix
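
The naming and folder rules in items 4 and 5 can be applied with a small helper like the one below. It is a sketch only: place_sample is a hypothetical function, and it assumes both FASTQ files currently sit together in one source directory.

```shell
# Hypothetical helper: move a sample's paired files into the expected layout.
# $1 = directory currently holding the files
# $2 = /path/to/inputfile, $3 = samplename, $4 = suffix (e.g. fastq.gz)
place_sample() {
    src=$1; input=$2; sample=$3; suffix=$4
    # Refuse to do anything unless both mates are present.
    for mate in 1 2; do
        if [ ! -f "$src/${sample}_${mate}.$suffix" ]; then
            echo "missing: ${sample}_${mate}.$suffix"
            return 1
        fi
    done
    mkdir -p "$input/$sample/Lib"
    mv "$src/${sample}_1.$suffix" "$src/${sample}_2.$suffix" "$input/$sample/Lib/"
}

# Usage: place_sample /path/to/downloads /path/to/inputfile test1 fastq.gz
```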

[RUNNING]

Syntax:  sh exome_test.sh samplename suffix -options

	options:
	-pu: make a pileup file in VarScan
	-dv: delete varscan.annovar
	-ht: use HaplotypeCaller (default)
	-ug: use UnifiedGenotyper in addition to HaplotypeCaller
	-ni: no intervals restriction (do not use target_intervals.bed)
	-vb: -B in VarScan (less stringent)
	-ks: keep the SAM file
	-s: shut down after finishing
	-dt: delete temporary files after finishing
	-nk: do not keep temporary files during processing
	-sk#: skip procedure # [1 to 8]
	-sk18: skip procedures 1-8
	-skd: skip DepthOfCoverage
	-spb: keep split BAMs for further use (e.g. itd_test.sh)

	ex1:
	  sh exome_test.sh test1 fastq.gz -pu -dv -nk -ni
	ex2:
	  sh exome_test.sh test2 fastq -sk1 -sk2 -sk3 -sk4 -ug -dt -s

Note: When rerun, the script is designed to resume from the last process it reached. However, after an interruption, please delete the last incomplete temporary file before rerunning; otherwise the results may be incomplete.

[How to build hg19 index]

# bwa, picard-tools/picard.jar, samtools must be installed.

# download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz 

# uncompress the chromFa.tar.gz into ~/hg19

cd ~/hg19

# cat individual chromosome.fa into hg19.fa in the following order:

cat chrM.fa chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chr1_gl000191_random.fa chr1_gl000192_random.fa chr4_ctg9_hap1.fa chr4_gl000193_random.fa chr4_gl000194_random.fa chr6_apd_hap1.fa chr6_cox_hap2.fa chr6_dbb_hap3.fa chr6_mann_hap4.fa chr6_mcf_hap5.fa chr6_qbl_hap6.fa chr6_ssto_hap7.fa chr7_gl000195_random.fa chr8_gl000196_random.fa chr8_gl000197_random.fa chr9_gl000198_random.fa chr9_gl000199_random.fa chr9_gl000200_random.fa chr9_gl000201_random.fa chr11_gl000202_random.fa chr17_ctg5_hap1.fa chr17_gl000203_random.fa chr17_gl000204_random.fa chr17_gl000205_random.fa chr17_gl000206_random.fa chr18_gl000207_random.fa chr19_gl000208_random.fa chr19_gl000209_random.fa chr21_gl000210_random.fa chrUn_gl000211.fa chrUn_gl000212.fa chrUn_gl000213.fa chrUn_gl000214.fa chrUn_gl000215.fa chrUn_gl000216.fa chrUn_gl000217.fa chrUn_gl000218.fa chrUn_gl000219.fa chrUn_gl000220.fa chrUn_gl000221.fa chrUn_gl000222.fa chrUn_gl000223.fa chrUn_gl000224.fa chrUn_gl000225.fa chrUn_gl000226.fa chrUn_gl000227.fa chrUn_gl000228.fa chrUn_gl000229.fa chrUn_gl000230.fa chrUn_gl000231.fa chrUn_gl000232.fa chrUn_gl000233.fa chrUn_gl000234.fa chrUn_gl000235.fa chrUn_gl000236.fa chrUn_gl000237.fa chrUn_gl000238.fa chrUn_gl000239.fa chrUn_gl000240.fa chrUn_gl000241.fa chrUn_gl000242.fa chrUn_gl000243.fa chrUn_gl000244.fa chrUn_gl000245.fa chrUn_gl000246.fa chrUn_gl000247.fa chrUn_gl000248.fa chrUn_gl000249.fa > hg19.fa

# build the index with the following commands:

bwa index -a bwtsw -p hg19 hg19.fa
java -jar ~/picard-tools/picard.jar CreateSequenceDictionary R="$HOME/hg19/hg19.fa" O="$HOME/hg19/hg19.dict"
# create an individual dict for each chromosome
java -jar ~/picard-tools/picard.jar CreateSequenceDictionary R="$HOME/hg19/chr1.fa" O="$HOME/hg19/chr1.dict" # etc.
samtools faidx ~/hg19/hg19.fa
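
The per-chromosome step marked "# etc." above means repeating the same Picard command for every chr*.fa file. One way to avoid typing each command is to generate them, as in the sketch below. print_dict_commands is a hypothetical helper; it only prints the commands so they can be reviewed first, and it assumes the paths contain no spaces.

```shell
# Hypothetical helper: print one CreateSequenceDictionary command per chr*.fa.
# $1 = directory with the chromosome FASTA files, $2 = path to picard.jar
print_dict_commands() {
    for fa in "$1"/chr*.fa; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$fa" ] || continue
        printf 'java -jar %s CreateSequenceDictionary R=%s O=%s\n' \
            "$2" "$fa" "${fa%.fa}.dict"
    done
}

# Review the output, then pipe it to sh to run the commands:
# print_dict_commands "$HOME/hg19" "$HOME/picard-tools/picard.jar" | sh
```

Because the pattern is chr*.fa, the combined hg19.fa is not matched, so its dictionary still comes from the separate command above.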
Source: README, updated 2021-02-17