Deploying VIGOR3
----------------
1. un-tar VIGOR3.tgz. This creates a directory named VIGOR3 containing
   the VIGOR software and reference databases.
   $ tar xzvf VIGOR3.tgz -C /mypath

2. define a scratch space for VIGOR
   $ # path used here is an example, any directory will do
   $ mkdir /mypath/VIGOR3/tempspace
   $ chmod 777 /mypath/VIGOR3/tempspace

3. define a symbolic link for the scratch space
   $ # symbolic link requires FULL path
   $ cd /mypath/VIGOR3
   $ chmod 777 prod3
   $ ln -s /mypath/VIGOR3/tempspace prod3/vigorscratch

4. define symbolic links for external programs
   $ cd /mypath/VIGOR3
   $ ln -s /usr/local/bin/perl prod3/perl
   $ ln -s /usr/local/bin/blastall prod3/blastall
   $ ln -s /usr/local/bin/bl2seq prod3/bl2seq
   $ ln -s /usr/local/bin/formatdb prod3/formatdb
   $ ln -s /usr/local/bin/fastacmd prod3/fastacmd
   $ ln -s /usr/local/bin/clustalw prod3/clustalw2
   $ ln -s /usr/local/bin/muscle prod3/muscle
   $ ln -s /usr/local/bin/cd-hit prod3/cd-hit
   $ chmod 555 prod3

notes:
1. the dbutils directory under prod3 contains utility programs used to
   support the creation of reference databases for VIGOR
2. muscle and cd-hit are used by programs in "dbutils"; they are not
   required by VIGOR.
3. the adhoc directory under prod3 contains a handful of adhoc programs
   created during the project; these programs use many of VIGOR's
   library functions but are not part of VIGOR
4. three additional programs are contained in the prod3 directory
   a. rna_finder - used by the JCVI pipeline to annotate non-coding
      genes
   b. tblUTR - used by the JCVI pipeline to extend gene boundaries to
      include the UTRs
   c. hmm3Evidence - used by the JCVI pipeline to supply HMM3 evidence
      supporting the functional annotation of the gene

Running VIGOR3
--------------
Example:
   $ VIGOR3.pl -d yfv -i samples/westnile.fasta -o test/westnile
   (sample fasta and output files can be found in the samples directory)

Usage:
   -- allow VIGOR to choose the reference database
   $ VIGOR3.pl -i inputfasta -o outputprefix

   -- tell VIGOR which reference database to use
   $ VIGOR3.pl -d refdb -i inputfasta -o outputprefix

Command Line Options:
   -a  auto-select the reference database, equivalent to "-d any",
       default behavior unless overridden by -d or -G,
       (-A is a synonym for this option)
   -d  <ref db>, specify the reference database to be used,
       (-D is a synonym for this option)
   -e  <evalue>, override the default evalue used to identify potential
       genes, the default is usually 1E-5, but varies by reference
       database
   -c  <pct ref>, minimum coverage of reference product (0-100) required
       to report a gene, by default coverage is ignored
   -C  complete (linear) genome (do not treat edges as gaps)
   -0  (zero) complete circular genome (allows gene to span origin)
   -f  <0, 1, or 2>, frameshift sensitivity, 0=low 1=normal 2=high
       (defaults to 1)
   -i  <input fasta>, path to fasta with genomic sequences to be
       annotated, (-I is a synonym for this option)
   -l  do NOT use locus_tags in TBL file output (incompatible with -L)
   -L  USE locus_tags in TBL file output (incompatible with -l)
   -o  <output prefix>, prefix for output files, e.g. if the output
       prefix is /mydir/anno VIGOR will create output files
       /mydir/anno.tbl, /mydir/anno.stats, etc.,
       (-O is a synonym for this option)
   -P  <parameter=value~~...~~parameter=value>, override default values
       of VIGOR parameters
   -j  turn off JCVI rules, JCVI rules treat gaps and ambiguity codes
       conservatively, use this option to relax these constraints and
       produce a more speculative annotation
   -m  ignore reference match requirements
       (coverage/identity/similarity), sometimes useful to evaluate raw
       contigs and rough draft sequences
   -s  <gene size>, minimum size (aa) of product required to report a
       gene, by default size is ignored

Outputs:
   outputprefix.rpt    - summary of program results
   outputprefix.stats  - run statistics (per genome sequence) in
                         tab-delimited format
   outputprefix.cds    - fasta file of predicted CDSs
   outputprefix.pep    - fasta file of predicted proteins
   outputprefix.tbl    - predicted features in GenBank tbl format
   outputprefix.aln    - alignment of predicted protein to reference,
                         and reference protein to genome
   outputprefix.fs     - subset of aln report for those genes with
                         potential sequencing issues
   outputprefix.at     - potential sequencing issues in tab-delimited
                         format

Reference Datasets:
   Name        Description                                  (Synonyms)
   any         any virus                                    (vda)
   cov_abcdx   Alpha/Beta/Gamma/Delta/Unclassified Cov*
   veev        Alphaviruses (VEEV/EEEV)                     (alpha, eeev)
   bunya       Bunyaviridae
   hanta       Bunyaviridae Hantavirus                      (hantavirus)
   obunya      Bunyaviridae Orthobunyavirus
   bunya_misc  Bunyaviridae miscellaneous
   gcv         Coronavirus                                  (cov)
   gcv_g1a     Coronavirus Group 1A                         (cov_g1a)
   gcv_g1b     Coronavirus Group 1B                         (cov_g1b)
   gcv_g2a     Coronavirus Group 2A                         (cov_g2a)
   gcv_g2b     Coronavirus Group 2B (SARS)                  (cov_g2b, sars)
   gcv_g2cd    Coronavirus Group 2C & 2D                    (cov_g2c, cov_g2d)
   gcv_g3      Coronavirus Group 3                          (cov_g3)
   filo        Filoviridae (Ebola/Marburg)                  (ebola, marburg)
   giv         Flu                                          (flu)
   giv_a       Flu A                                        (flu_a)
   giv_b       Flu B                                        (flu_b)
   giv_c       Flu C                                        (flu_c)
   hrv         Human Rhinovirus/Enterovirus                 (entero, rhino)
   hadv        Human adenovirus
   hadv_a      Human adenovirus A
   hadv_b      Human adenovirus B
   hadv_c      Human adenovirus C
   hadv_d      Human adenovirus D
   hadv_e      Human adenovirus E
   hadv_f      Human adenovirus F
   hadv_g      Human adenovirus G
   hhv         Human herpesvirus+                           (hsv)
   hhv1        Human herpesvirus 1+                         (hsv1)
   hhv2        Human herpesvirus 2+
   hhv3        Human herpesvirus 3 (Varicellovirus)+        (var)
   hhv4        Human herpesvirus 4+
   hhv5        Human herpesvirus 5+
   msl         Measles / Morbillivirus                      (measles)
   mpv         Metapneumovirus (MPV)
   mmp         Mumps / Rubulavirus                          (mumps)
   norv        Norovirus                                    (noro)
   norv_1      Norovirus I                                  (noro1)
   norv_2      Norovirus II                                 (noro2)
   norv_misc   Norovirus miscellaneous
   norv_mur    Norovirus murine
   rabies      Rabies
   rsv         Respiratory syncytial virus (RSV)
   respiro     Respirovirus                                 (resp)
   hpiv_1      Respirovirus HPIV-1                          (hpiv1)
   hpiv_3      Respirovirus HPIV-3                          (hpiv3)
   sendai      Respirovirus Sendai
   rtv         Rotavirus                                    (rota)
   rtv_a       Rotavirus A                                  (rota_a)
   rtv_b       Rotavirus B                                  (rota_b)
   rtv_c       Rotavirus C                                  (rota_c)
   rtv_f       Rotavirus F                                  (rota_f)
   rtv_g       Rotavirus G                                  (rota_g)
   rbl         Rubella                                      (rubella)
   sapo        Sapovirus
   yfv         Yellow Fever / Japanese encephalitis (JEV)   (jev)

   * non-standard grouping, must be invoked directly, not included in
     "any virus" via -A or as a subset of other -D specifications
   + these datasets have not been curated
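
Deployment steps 2 through 4 can be collected into one shell function, sketched below. This is only an illustration, not part of the VIGOR distribution: the function name setup_vigor_links and its two arguments are hypothetical, and it assumes all external programs live in a single directory (the /usr/local/bin paths in step 4 may differ on your system). The link names and targets follow step 4, including the clustalw2 link that points at clustalw.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with VIGOR): performs deployment steps 2-4.
# usage: setup_vigor_links VIGOR_ROOT BIN_DIR
setup_vigor_links() {
    vigor_root=$1   # full path of the unpacked VIGOR3 directory
    bin_dir=$2      # assumption: one directory holds all external programs
    prod="$vigor_root/prod3"

    # step 3 notes that the symbolic link requires a FULL path
    case $vigor_root in
        /*) ;;
        *) echo "error: $vigor_root is not a full path" >&2; return 1 ;;
    esac
    [ -d "$prod" ] || { echo "error: $prod not found" >&2; return 1; }

    # step 2: scratch space
    mkdir -p "$vigor_root/tempspace"
    chmod 777 "$vigor_root/tempspace"

    # step 3: open prod3 for writing and link the scratch space
    chmod 777 "$prod"
    rm -f "$prod/vigorscratch"
    ln -s "$vigor_root/tempspace" "$prod/vigorscratch"

    # step 4: each pair is linkname:program, as in the README
    status=0
    for pair in perl:perl blastall:blastall bl2seq:bl2seq \
                formatdb:formatdb fastacmd:fastacmd clustalw2:clustalw \
                muscle:muscle cd-hit:cd-hit; do
        link=${pair%%:*}
        prog=${pair#*:}
        if [ -x "$bin_dir/$prog" ]; then
            rm -f "$prod/$link"
            ln -s "$bin_dir/$prog" "$prod/$link"
        else
            echo "warning: $bin_dir/$prog not found, prod3/$link skipped" >&2
            status=1
        fi
    done

    # close prod3 again, as at the end of step 4
    chmod 555 "$prod"
    return $status
}
```

After sourcing the function, a call such as "setup_vigor_links /mypath/VIGOR3 /usr/local/bin" would repeat the manual steps. A non-zero return means at least one program was missing; since muscle and cd-hit are only needed by dbutils, a warning for those two may be acceptable.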

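To annotate several fasta files with the same reference database, the usage line "VIGOR3.pl -d refdb -i inputfasta -o outputprefix" can be wrapped in a loop. The sketch below is an illustration under stated assumptions: the function name run_vigor_batch and the VIGOR_CMD variable are hypothetical, and it assumes VIGOR3.pl returns a non-zero exit status on failure, which this README does not confirm. Only the -d, -i and -o options documented above are used.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with VIGOR): one VIGOR run per fasta file.
# usage: run_vigor_batch REFDB OUTDIR FASTA...
run_vigor_batch() {
    refdb=$1        # reference database name, e.g. yfv
    outdir=$2       # directory that will hold the output prefixes
    shift 2
    mkdir -p "$outdir"

    failed=0
    for fasta in "$@"; do
        base=$(basename "$fasta")
        prefix="$outdir/${base%.*}"   # samples/westnile.fasta -> OUTDIR/westnile

        # VIGOR_CMD is a hypothetical override; the default is VIGOR3.pl on PATH.
        # Assumption: a failed run is signalled by a non-zero exit status.
        if ! "${VIGOR_CMD:-VIGOR3.pl}" -d "$refdb" -i "$fasta" -o "$prefix"; then
            echo "warning: VIGOR failed on $fasta" >&2
            failed=$((failed + 1))
            continue
        fi

        # The tbl file is one of the outputs listed under "Outputs"
        [ -e "$prefix.tbl" ] || echo "note: $prefix.tbl was not produced" >&2
    done

    [ "$failed" -eq 0 ]
}
```

For example, "run_vigor_batch yfv test samples/*.fasta" would write test/westnile.tbl, test/westnile.stats, and so on for each input. Dropping "-d refdb" from the command inside the loop would let VIGOR choose the reference database itself.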