Name | Modified | Size |
---|---|---|
translateVCF.tar.bz2 | 2012-06-12 | 18.6 kB |
readme.txt | 2012-05-01 | 3.9 kB |
Totals: 2 items | | 22.5 kB |
translateVCF.pl

INSTALL:

Simply unpack the tarball to a directory of your choosing. Keep the three items in the same directory, as all three are required by the program. The program has been tested on Linux and Mac OS X but does not currently work on Windows (if anyone really needs this then it should be possible to rewrite it for Windows). You may need to install some additional Perl modules before it will run; this should be easily achievable using CPAN.

INFO:

translateVCF.pl is a script to annotate VCF files with the functional consequences of variants with regard to RefSeq, Ensembl or UCSC known gene transcripts. It outputs the VCF input with an additional column at the start of each record giving the consequence of the variant. The consequence may be one of the following:

-intergenic
-intronic
-non-coding
-splice_consensus (i.e. not in an invariant splice site but in a consensus splice site: the 3 bases preceding an exon, the first base of an exon, the last 3 bases of an exon or 3-6 bases after an exon)
-splicing (invariant donor or acceptor splice position)
-synonymous
-missense
-nonsense
-stop_loss
-non-frameshift_deletion
-frameshift_deletion
-non-frameshift_insertion
-frameshift_insertion

The first column gives information for each transcript overlapping the variant in the approximate format:

consequence:gene_symbol:transcript_id:dna_position:protein_position (if relevant)

e.g. missense:FGGY:NM_001113411:c.129T>G,p.N43K
e.g. non-coding:L1TD1:NM_019079:3'UTR
e.g. intergenic
e.g. intronic/splice_consensus:EFCAB7:NM_032437:c.804+4T>C

Entries for different transcripts are separated with two colons (i.e. "::").

translateVCF.pl should address most people's needs, but both SortGenomicCoordinates.pm and extractCDS.pl contain their own documentation, accessible using the perldoc command.
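For downstream processing, the annotation column described above can be split back into its fields. The following is a minimal Python sketch and is not part of the translateVCF distribution; the names TranscriptAnnotation and parse_annotation are hypothetical. It assumes that "/" joins multiple consequences for one transcript (as in the intronic/splice_consensus example) and, because the format line shows a colon before the protein position while the missense example shows a comma, it accepts either separator. Since the format is only approximate, real output may contain cases this sketch does not handle.

```python
import re
from typing import List, NamedTuple, Optional


class TranscriptAnnotation(NamedTuple):
    """One per-transcript entry from the annotation column (hypothetical type)."""
    consequences: List[str]
    gene_symbol: Optional[str]
    transcript_id: Optional[str]
    dna_position: Optional[str]
    protein_position: Optional[str]


def parse_annotation(column: str) -> List[TranscriptAnnotation]:
    """Split the first column of an annotated record into per-transcript entries."""
    annotations = []
    # Entries for different transcripts are separated by "::".
    for entry in column.strip().split("::"):
        # At most four fields: consequence, gene symbol, transcript ID, position(s).
        fields = entry.split(":", 3)
        # Assumption: "/" joins several consequences for the same transcript.
        consequences = fields[0].split("/")
        gene_symbol = fields[1] if len(fields) > 1 else None
        transcript_id = fields[2] if len(fields) > 2 else None
        dna_position = None
        protein_position = None
        if len(fields) > 3:
            # Assumption: DNA and protein positions are separated by "," or ":".
            parts = re.split(r"[,:]", fields[3], maxsplit=1)
            dna_position = parts[0]
            protein_position = parts[1] if len(parts) > 1 else None
        annotations.append(
            TranscriptAnnotation(
                consequences, gene_symbol, transcript_id,
                dna_position, protein_position,
            )
        )
    return annotations
```

Entries such as "intergenic" that carry no gene or transcript information come back with the remaining fields set to None.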
If you wish to create a CDS reference file for ENCODE annotations for use with translateVCF.pl, use the extractCDS.pl script as instructed, with the --refgene flag in place, on either the wgEncodeGencodeBasicV11_Cds.txt.gz or the wgEncodeGencodeCompV11.txt.gz file downloaded from hgdownload.cse.ucsc.edu.

RUNNING:

View instructions on how to use translateVCF.pl by entering either 'perldoc translateVCF.pl' or 'translateVCF.pl -m' on the command line.

In brief, you will need a reference CDS file, which can be created either with the standalone 'extractCDS.pl' script that came in the tarball or by using the --DOWNLOAD_NEW flag when invoking translateVCF.pl. translateVCF.pl will create the reference files in, or assume them to be in, the directory [build_name]/knowngene, [build_version]/refgene or [build_version]/ensgene for UCSC known gene, RefSeq or Ensembl annotations respectively.

If you use the --DOWNLOAD_NEW flag and have not specified a fasta directory for chromosomes, translateVCF.pl will create a directory to download them into at [build_name]/fasta_chromosomes. You can compress or delete these files to save space after extracting the CDS sequences.

If you already possess the relevant gene table and chromosome fasta files, or wish to download them to a location of your choosing, you may wish to symlink to them in the folder you extracted translateVCF.tar.bz2 to. Alternatively, you can specify the location of these files each time you need them with the relevant command line argument.

CREDIT:

Written by David A. Parry
University of Leeds

Distributed free of charge under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Should anyone find this program useful in their research I would appreciate the appropriate credit/acknowledgement! I'm not a programmer by trade but have taken time to teach myself enough to help me solve research problems I encounter. If I've saved you time and effort I'd be grateful for any recognition of my efforts.