| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| exome_filter.sh | 2021-02-09 | 9.7 kB | |
| README | 2018-01-15 | 2.0 kB | |
| ExAC.id.gz | 2017-09-11 | 7.1 MB | |
| Totals: 3 Items | | 7.1 MB | 0 |
# exome_filter.sh
# Author: Chin-Chen Pan
# Director, General and Surgical Pathology
# Professor, attending pathologist
# Department of Pathology and Laboratory Medicine
# Taipei Veterans General Hospital
# TAIWAN
# Version 2.1.1
# Date: Jan 15, 2017
[Introduction]
A script to filter exome-seq variants against 1000G, ExAC and dbSNP, with thresholds for minimal coverage and T/N ratio.
The script uses the files produced by exome_test.sh.
The filter process includes:
Variants are accepted if called by both HaplotypeCaller and VarScan.
Variants called by only one of the two callers are accepted only if they exceed the minimal coverage and the minimal T/N ratio.
All accepted variants are then filtered to keep those absent from the 1000G, ExAC and dbSNP lists.
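The caller rule above can be sketched in awk. This is only an illustration: the input layout (id, number of callers, coverage, T/N ratio, tab-separated) and the function name filter_calls are assumptions, not the actual intermediate files or code of exome_filter.sh.

```shell
# Hypothetical sketch of the caller/coverage/ratio rule; not the actual script.
# Input (stdin): tab-separated lines of  id  n_callers  coverage  tn_ratio
# Usage: filter_calls MIN_COVERAGE MIN_TN_RATIO
filter_calls() {
    awk -F '\t' -v min_cov="$1" -v min_ratio="$2" '
        # Called by both callers: accept without further checks.
        $2 == 2 { print $1; next }
        # Called by one caller: must be strictly above both thresholds.
        ($3 + 0 > min_cov + 0) && ($4 + 0 > min_ratio + 0) { print $1 }
    '
}
```

For example, piping four made-up variants through "filter_calls 20 0.2" keeps the one reported by both callers and the single-caller one with coverage 30 and ratio 0.35, and drops those that fail either threshold.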
[Before running]
1. Prepare exome_test.config. The file must contain exactly four whitespace-separated fields on a single line. No other text or lines are allowed.
/path/to/programs /path/to/inputfile /path/to/outputfile thread_number
ex1:
/home/user_name /media/user_name/disk1/input /home/user_name/output 8
ex2:
~ ~/input ~/output 8
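A config file of this shape could be read and checked as in the sketch below. The function and variable names (read_config, programs_dir, input_path, output_path, threads) are hypothetical, and the handling of a leading ~ is an assumption about how the second example is meant to work, not a description of the real script.

```shell
# Hypothetical sketch: read and validate a one-line, four-field config file.
# Usage: read_config FILE
# Sets: programs_dir, input_path, output_path, threads (illustrative names).

# Replace a leading ~ with $HOME; plain "read" does not expand it.
expand_tilde() {
    case $1 in
        "~")   printf '%s\n' "$HOME" ;;
        "~/"*) printf '%s\n' "$HOME/${1#??}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

read_config() {
    cfg=$1
    # The file must consist of exactly one line.
    if [ "$(awk 'END { print NR }' "$cfg")" -ne 1 ]; then
        echo "error: $cfg must contain exactly one line" >&2
        return 1
    fi
    # read returns non-zero when the line has no trailing newline,
    # but the variables are still filled in, so the status is ignored.
    read -r programs_dir input_path output_path threads extra < "$cfg" || true
    if [ -z "$threads" ] || [ -n "$extra" ]; then
        echo "error: $cfg must contain exactly four fields" >&2
        return 1
    fi
    # The fourth field must be a plain integer.
    case $threads in
        *[!0-9]*) echo "error: thread_number must be an integer" >&2; return 1 ;;
    esac
    programs_dir=$(expand_tilde "$programs_dir")
    input_path=$(expand_tilde "$input_path")
    output_path=$(expand_tilde "$output_path")
}
```

Calling "read_config exome_test.config" on the second example would leave programs_dir equal to $HOME and threads equal to 8. A file with three fields, five fields, or a second line is rejected.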
2. The following file must be placed in /path/to/programs:
ExAC.id
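ExAC.id is a file listing ExAC variants with MAF > 0.01% (the first command below keeps entries whose allele frequency exceeds 0.0001). It can be produced either from the ANNOVAR database file (first command) or from ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz downloaded from the ExAC site (second command). The ANNOVAR route is preferred.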
cat /path/to/annovar/humandb/hg19_exac03nontcga.txt | sed '1d' | awk -F '\t' '$6>0.0001 {print $1 "-" $2 $4 "-" $5 "\t" $6}' | sed '/e-/d' | sed 's/^/chr/g' | sort -t "`echo -e "\t"`" -u -k1,1 > ExAC.id
zcat ./ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz | sed '/^#/!s/^/chr/g' | sed 's/chrMT/chrM/g' | sed '/AF=.....e-05/d' | awk -F '\t' ' {for (i=1; i<=split($5, a, ","); ++i) print $1 "-" $2 $4 "-" a[i] }' | sort -u > ./ExAC.id
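Both commands build keys of the form chr<CHROM>-<POS><REF>-<ALT>, with no separator between position and reference allele. A minimal sketch of how such a list could be used to drop listed variants follows. The four-column input layout (CHROM with the chr prefix, POS, REF, ALT) and the function name drop_listed are assumptions for illustration, not the way exome_filter.sh necessarily does it.

```shell
# Hypothetical sketch: drop variants whose key appears in an ExAC.id-style file.
# Usage: drop_listed ID_FILE < variants.tsv
# Input (stdin): tab-separated CHROM POS REF ALT, CHROM already prefixed with "chr".
drop_listed() {
    awk -F '\t' -v ids="$1" '
        BEGIN {
            # Load the first tab-separated column of every line in the id file.
            while ((getline line < ids) > 0) {
                split(line, field, "\t")
                listed[field[1]] = 1
            }
            close(ids)
        }
        {
            # Same key layout as the commands above: chr-posREF-ALT.
            key = $1 "-" $2 $3 "-" $4
            if (!(key in listed)) print
        }
    '
}
```

Only the first column of the id file is used, so the sketch works with either variant of ExAC.id (with or without the frequency column).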
[Running]
Syntax: sh exome_filter.sh sample_name(required) minimal coverage(required) minimal T/N ratio(required)
options:
-s: shut down after finishing
-kt: keep temporary files
ex:
sh exome_filter.sh T1 20 0.2 -kt
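The command line above could be parsed along these lines. This is a sketch under the assumption that the three required values come first and the options follow, as in the example. The function and variable names are hypothetical and do not come from the script itself.

```shell
# Hypothetical sketch of argument handling; not taken from exome_filter.sh.
# Usage: parse_args SAMPLE_NAME MIN_COVERAGE MIN_TN_RATIO [-s] [-kt]
parse_args() {
    if [ "$#" -lt 3 ]; then
        echo "usage: sh exome_filter.sh sample_name minimal_coverage minimal_TN_ratio [-s] [-kt]" >&2
        return 1
    fi
    sample_name=$1
    min_coverage=$2
    min_tn_ratio=$3
    shift 3
    shutdown_after=no   # -s:  shut down after finishing
    keep_temp=no        # -kt: keep temporary files
    for opt in "$@"; do
        case $opt in
            -s)  shutdown_after=yes ;;
            -kt) keep_temp=yes ;;
            *)   echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
}
```

With the example invocation, "parse_args T1 20 0.2 -kt" sets sample_name to T1, min_coverage to 20, min_tn_ratio to 0.2, keep_temp to yes and leaves shutdown_after at no.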