Name | Modified | Size | Downloads / Week |
---|---|---|---|
exome_test.sh | 2021-04-19 | 132.6 kB | |
README | 2021-02-17 | 6.7 kB | |
cosmicnew.sh | 2021-02-17 | 2.8 kB | |
dbsnpnew.sh | 2021-02-08 | 2.4 kB | |
clinvarnew.sh | 2021-02-08 | 905 Bytes | |
gene_cytoband.txt | 2015-12-13 | 1.3 MB | |
bam-readcount.tar.gz | 2015-11-15 | 13.0 MB | |
hg19.refSeq.sorted.txt.gz | 2015-11-15 | 3.3 MB | |
target_intervals.bed.gz | 2015-11-15 | 5.4 MB | |
exome_test.config | 2015-11-15 | 70 Bytes | |
Totals: 10 Items | | 23.1 MB | 0 |
# exome_test.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 12.3.1
# Date: Feb 8, 2021

[Introduction]
exome_test.sh is a shell script that runs the GATK best-practice workflow and VarScan for variant calling on exome sequencing data. It uses bwa for alignment, HaplotypeCaller (and, optionally, UnifiedGenotyper) and VarScan to call variants, and ANNOVAR to annotate them. It also runs DepthOfCoverage and bam-readcount. If sample.umi.fastq or sample.umi.fastq.gz is present in the input folder, the program switches to UMI mode in [STEP 3 Mark duplicates].

[Before running]
1. Prepare exome_test.config. The file contains four words on one line. No other words or lines are allowed.
   /path/to/programs /path/to/inputfile /path/to/outputfile thread_number
   ex:
   /home/user_name /media/user_name/disk1/input /home/user_name/output 8

2. bwa, a Java runtime (version 1.8) and samtools must be installed.
   sudo apt-get install bwa
   sudo apt-get install openjdk-8-jdk
   sudo apt-get install samtools

3. The following files and folders must be placed in /path/to/programs.
   picard-tools/picard.jar
   picard-tools/GenomeAnalysisTK.jar
   picard-tools/fgbio.jar (for UMI mode)
   VarScan.v2.3.9.jar
   annovar
   hg19 (see below)
   dbsnp_138.hg19.vcf
   target_intervals
   hg19.refSeq.sorted.txt
   dbSNPnew
   cosmic
   clinvar
   gene_cytoband.txt

   For annovar, download the following filters.
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
   perl annotate_variation.pl -buildver hg19 -downdb cytoBand humandb/
   perl annotate_variation.pl -buildver hg19 -downdb genomicSuperDups humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar esp6500siv2_all humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2014oct humandb/
   perl annotate_variation.pl -buildver hg19 -downdb phastConsElements46way humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20150330 humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar cosmic70 humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp30a humandb/
   perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar exac03nontcga humandb/

   fgbio.jar can be installed with conda. Copy the jar file to /path/to/programs.

   Download the left-normalized avsnp file (hg19_avsnp150.txt) from the ANNOVAR site. Use the scripts dbsnpnew.sh, cosmicnew.sh and clinvarnew.sh to make the dbSNP, cosmic and clinvar files. These files are tab-delimited. The format should look like this:
   rs990360547	chr10-100000035C-T
   rs946241909	chr10-100000101T-C
   rs923734382	chr10-100000126A-G
   rs894795669	chr10-100000179C-T

   Copy bam-readcount to /usr/local/bin. You can build bam-readcount yourself or download the prebuilt one here (bam-readcount.tar.gz).
   sudo cp /path/to/bam-readcount /usr/local/bin

   Split target_intervals.bed by chromosome. The output folder must exist before awk writes to it.
   mkdir -p /path/to/programs/target_intervals
   awk -F '\t' '{print $0 >> ("/path/to/programs/target_intervals/" $1 ".bed")}' /path/to/programs/target_intervals.bed

4. The exome sequencing files must be paired-end and named samplename_1.suffix and samplename_2.suffix. The suffix must be one of fastq, fq, fastq.gz or fq.gz.

5. To be compatible with SOAPfuse, the exome sequencing files must be placed in the following paths:
   /path/to/inputfile/samplename/Lib/samplename_1.suffix
   /path/to/inputfile/samplename/Lib/samplename_2.suffix

[RUNNING]
Syntax: sh exome_test.sh samplename suffix -options
options:
   -pu: make pileup file in VarScan
   -dv: delete varscan.annovar
   -ht: use HaplotypeCaller (default)
   -ug: use UnifiedGenotyper in addition to HaplotypeCaller
   -ni: no intervals restriction (do not use target_intervals.bed)
   -vb: -B in VarScan (less stringent)
   -ks: keep SAM
   -s: shut down after finishing
   -dt: delete temporary files after finishing
   -nk: do not keep temporary files during processing
   -sk#: skip procedure # [1 to 8]
   -sk18: skip procedures 1 to 8
   -skd: skip DepthOfCoverage
   -spb: keep split BAMs for further use (e.g. itd_test.sh)
ex1: sh exome_test.sh test1 fastq.gz -pu -dv -nk -ni
ex2: sh exome_test.sh test2 fastq -sk1 -sk2 -sk3 -sk4 -ug -dt -s

Note: The script is designed to resume from the last process it reached. After an interruption, however, delete the last incomplete temporary file before rerunning; otherwise the results may be incomplete.

[How to build hg19 index]
# bwa, picard-tools/picard.jar, samtools must be installed.
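The prerequisites named above can be checked before starting the build. The following is only a sketch: missing_tools is a hypothetical helper that is not part of exome_test.sh, and the PICARD_JAR default is an assumed location taken from the paths used later in this section.

```shell
#!/bin/sh
# Sketch: report which prerequisite commands are not on PATH.
# missing_tools is a hypothetical helper, not part of exome_test.sh.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# PICARD_JAR is an assumed location; adjust it to your installation.
PICARD_JAR="${PICARD_JAR:-$HOME/picard-tools/picard.jar}"

missing=$(missing_tools bwa samtools java)
# $missing is left unquoted on purpose so the names print on one line.
[ -z "$missing" ] || echo "not found on PATH:" $missing >&2
[ -f "$PICARD_JAR" ] || echo "picard.jar not found at $PICARD_JAR" >&2
```

Nothing is printed when all three commands and the jar file are found.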
# download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
# uncompress chromFa.tar.gz into ~/hg19
cd ~/hg19
# cat the individual chromosome .fa files into hg19.fa in the following order:
cat chrM.fa chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chr1_gl000191_random.fa chr1_gl000192_random.fa chr4_ctg9_hap1.fa chr4_gl000193_random.fa chr4_gl000194_random.fa chr6_apd_hap1.fa chr6_cox_hap2.fa chr6_dbb_hap3.fa chr6_mann_hap4.fa chr6_mcf_hap5.fa chr6_qbl_hap6.fa chr6_ssto_hap7.fa chr7_gl000195_random.fa chr8_gl000196_random.fa chr8_gl000197_random.fa chr9_gl000198_random.fa chr9_gl000199_random.fa chr9_gl000200_random.fa chr9_gl000201_random.fa chr11_gl000202_random.fa chr17_ctg5_hap1.fa chr17_gl000203_random.fa chr17_gl000204_random.fa chr17_gl000205_random.fa chr17_gl000206_random.fa chr18_gl000207_random.fa chr19_gl000208_random.fa chr19_gl000209_random.fa chr21_gl000210_random.fa chrUn_gl000211.fa chrUn_gl000212.fa chrUn_gl000213.fa chrUn_gl000214.fa chrUn_gl000215.fa chrUn_gl000216.fa chrUn_gl000217.fa chrUn_gl000218.fa chrUn_gl000219.fa chrUn_gl000220.fa chrUn_gl000221.fa chrUn_gl000222.fa chrUn_gl000223.fa chrUn_gl000224.fa chrUn_gl000225.fa chrUn_gl000226.fa chrUn_gl000227.fa chrUn_gl000228.fa chrUn_gl000229.fa chrUn_gl000230.fa chrUn_gl000231.fa chrUn_gl000232.fa chrUn_gl000233.fa chrUn_gl000234.fa chrUn_gl000235.fa chrUn_gl000236.fa chrUn_gl000237.fa chrUn_gl000238.fa chrUn_gl000239.fa chrUn_gl000240.fa chrUn_gl000241.fa chrUn_gl000242.fa chrUn_gl000243.fa chrUn_gl000244.fa chrUn_gl000245.fa chrUn_gl000246.fa chrUn_gl000247.fa chrUn_gl000248.fa chrUn_gl000249.fa > hg19.fa
# build the index with the following commands:
bwa index -a bwtsw -p hg19 hg19.fa
java -jar ~/picard-tools/picard.jar CreateSequenceDictionary R=~/hg19/hg19.fa O=~/hg19/hg19.dict
# create an individual dict for each chr
java -jar ~/picard-tools/picard.jar CreateSequenceDictionary R=~/hg19/chr1.fa O=~/hg19/chr1.dict
# etc.
samtools faidx ~/hg19/hg19.fa
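
The per-chromosome dictionary step marked "etc." above could be run as a loop rather than typed once per file. This is a minimal sketch, not the author's procedure. HG19_DIR, PICARD_JAR, DRY_RUN and make_chr_dicts are hypothetical names introduced here, and the defaults assume the ~/hg19 and ~/picard-tools locations used in this section. The loop covers every chr*.fa in the folder, including the random and haplotype contigs.

```shell
#!/bin/sh
# Sketch: create one sequence dictionary per chromosome FASTA.
# HG19_DIR and PICARD_JAR are assumed locations; DRY_RUN is a hypothetical switch.
HG19_DIR="${HG19_DIR:-$HOME/hg19}"
PICARD_JAR="${PICARD_JAR:-$HOME/picard-tools/picard.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

make_chr_dicts() {
    for fa in "$HG19_DIR"/chr*.fa; do
        [ -e "$fa" ] || continue       # no chr*.fa present: the glob stays literal
        dict="${fa%.fa}.dict"
        [ -e "$dict" ] && continue     # skip dictionaries that already exist
        if [ "$DRY_RUN" = 1 ]; then
            echo java -jar "$PICARD_JAR" CreateSequenceDictionary "R=$fa" "O=$dict"
        else
            java -jar "$PICARD_JAR" CreateSequenceDictionary "R=$fa" "O=$dict"
        fi
    done
}

make_chr_dicts
```

With the default DRY_RUN=1 the script only prints the commands it would run; set DRY_RUN=0 to execute them. hg19.fa itself is not matched by the chr*.fa pattern, so its dictionary is still created by the separate command above.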