########################################
# HOW TO GET STARTED WITH RE-ANNOTATOR #
########################################
A. Things to do before using Re-Annotator for the first time
I) Install External Programs
The following programs must be installed on your system before running Re-Annotator:
a) Perl
b) BWA (http://sourceforge.net/projects/bio-bwa/files/)
c) SAMtools (https://sourceforge.net/projects/samtools/files/)
d) Annovar (http://www.openbioinformatics.org/annovar/)
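A quick way to confirm that the programs are reachable is to look them up on the PATH. The sketch below assumes the executables are called perl, bwa, samtools and annotate_variation.pl; Annovar is often run from its own directory, so adjust that name or extend the PATH as needed. The function name missing_tools is illustrative and not part of Re-Annotator.

```sh
# Print the name of every given program that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call (executable names are assumptions, see above):
# missing_tools perl bwa samtools annotate_variation.pl
```

An empty result means that every listed program was found.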
II) Get External Data
a) Reference Genome Sequence
- Download the reference genome, e.g., hg19
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
- unpack the archive (it is a gzipped tar file, so use tar -xzf rather than gzip alone)
- make sure that each chromosome is contained in a single file
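The unpacking step can be sketched as follows. This assumes that the archive has already been downloaded and that it unpacks to one .fa file per chromosome; the function names unpack_genome and count_sequences are illustrative, and the target directory is taken from the example paths used later in this guide.

```sh
# Unpack a gzipped tarball of chromosome FASTA files into a target directory.
# $1 = archive (e.g. chromFa.tar.gz), $2 = target directory
unpack_genome() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Count the FASTA headers in one file; a per-chromosome file should report 1.
count_sequences() {
    grep -c '^>' "$1"
}

# Example (paths are assumptions based on the commands in this guide):
# unpack_genome chromFa.tar.gz ~/ReAnnotator/hg19/
# for f in ~/ReAnnotator/hg19/*.fa; do echo "$f: $(count_sequences "$f")"; done
```

A file that reports more than one header holds several sequences and may need to be split.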
b) Gene Database
- Download information on gene locations, e.g., RefSeq
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
- unzip (using gzip)
c) Make sure that the desired annotation databases are available in Annovar, for instance:
- RefGene
./annotate_variation.pl -downdb refGene -buildver hg19 humandb/
- SNP database, e.g., snp129 (available from Annovar)
./annotate_variation.pl -downdb snp129 -buildver hg19 -webfrom annovar humandb/
III) Generate the mRNA Reference Sequence
a) Execute the exome building script to build the mRNA reference sequence
(BuildExomeReference.pl or new_exomeBuilding.pl, depending on which name your copy uses)
example:
$> ./BuildExomeReference.pl -i ~/ReAnnotator/refGene.txt -o ~/ReAnnotator/exomeRef/hg19exome -r ~/ReAnnotator/hg19/
b) Use "bwa index" to generate the BWA index files for
i) the exome reference sequence
ii) the whole genome reference sequence
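The indexing step can be sketched as shown below. The file names are assumptions taken from the ReAnnotator.sh example in section B; older BWA releases need "-a bwtsw" for a genome of this size, while newer ones select the algorithm automatically. The helper has_bwa_index is illustrative and only checks that the five files written by "bwa index" are present.

```sh
# Return success if all five BWA index files exist next to the reference.
# $1 = path to the reference FASTA that was passed to "bwa index"
has_bwa_index() {
    for ext in amb ann bwt pac sa; do
        [ -f "$1.$ext" ] || return 1
    done
    return 0
}

# Example (file names are assumptions based on the ReAnnotator.sh example):
# bwa index ~/ReAnnotator/exomeRef/hg19exome_inclUTR.fasta
# bwa index -a bwtsw ~/ReAnnotator/hg19/hg19genome.fasta.gz
# has_bwa_index ~/ReAnnotator/hg19/hg19genome.fasta.gz || echo "genome index incomplete"
```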
IV) Complete the config.sh
* provide the exact locations of the external programs in the file config.sh
* config.sh is also the place to change the default settings
- # of CPUs
- # of mismatches
- genome version, e.g., hg19, hg18, mm9, ...
- genedb (refGene, Ensembl, ...)
- snpdb (snp135, snp136, ...)
NOTE: Step III) has to be repeated every time the Gene Database is updated!
NOTE: Update config.sh to match the needs of your re-annotation run.
B. Things to do at every run
I) Check that config.sh is set up correctly for the genome and exome you are about to use
- regenerate the mRNA reference if you are using an updated database version or new genome release
II) Convert the probe file into a FASTA file
For Illumina probe files, the script parse_fastaFromOriginIlmnAnno.pl can be used.
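For probe lists from other sources, a conversion along the following lines may be sufficient. This is a minimal sketch that assumes a tab-separated file without a header line, with the probe ID in column 1 and the probe sequence in column 2; the function name tsv_to_fasta and the file names are placeholders and not part of Re-Annotator.

```sh
# Convert a two-column, tab-separated probe table (ID, sequence) to FASTA.
# $1 = input table; the FASTA records are written to standard output.
tsv_to_fasta() {
    awk -F '\t' 'NF >= 2 && $1 != "" { printf ">%s\n%s\n", $1, $2 }' "$1"
}

# Example (file names are placeholders):
# tsv_to_fasta my_probes.tsv > my_probes.fasta
```

Lines with fewer than two columns or an empty ID are skipped.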
III) Execute ./ReAnnotator.sh with the corresponding parameters
example:
./ReAnnotator.sh my_illumina_probes.fasta ~/ReAnnotator/exomeRef/hg19exome_inclUTR.fasta ~/ReAnnotator/hg19/hg19genome.fasta.gz ~/ReAnnotator/refGene.txt ~/ReAnnotator/outputs/ ~/ReAnnotator/tmp/
Before running the script, make sure that all of the directories exist.
The script calls the following scripts in sequence:
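The directory check can be sketched as follows, assuming that the last two arguments of the example call are the output and temporary directories. The helper names prepare_run_dirs and missing_inputs are illustrative and not part of Re-Annotator.

```sh
# Create the output and temporary directories if they do not exist yet.
# $1 = output directory, $2 = temporary directory
prepare_run_dirs() {
    mkdir -p "$1" "$2"
}

# Print every given input file that is missing or empty.
missing_inputs() {
    for f in "$@"; do
        [ -s "$f" ] || printf '%s\n' "$f"
    done
}

# Example (paths are taken from the example call above):
# prepare_run_dirs ~/ReAnnotator/outputs/ ~/ReAnnotator/tmp/
# missing_inputs my_illumina_probes.fasta ~/ReAnnotator/refGene.txt
```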
a) run_realignment.sh
b) run_coordinateConversion.sh
c) run_genome_snp_annotation.sh