Full instructions are in the file doc/README, inside doc.tar.gz.
NOTE: autoAnnotate.dbi calls wu-blastp. A version of wu-blastp is available at
http://www.advbiocomp.com/blast/obsolete/
However, versions from that site have not been tested with autoAnnotate.
Neither has autoAnnotate been tested with NCBI blastp.
QUICKSTART
To install JCVI autoAnnotate v2.4 (May 17, 2010) on Linux:
1. Install the required Perl modules (listed below) from CPAN (www.cpan.org).
To install DBD::File on Debian (it is distributed as part of DBI):
apt-get install libdbi-perl
To install modules with an installed cpan client, start the CPAN shell with
perl -MCPAN -e shell
or
cpan
then, at its prompt, type
install DBD::File
install DBD::SQLite
and so on for the other modules listed below.
If an install fails because some of the module's tests fail, try, e.g.,
force install DBD::File
If these methods fail, download the packages from CPAN, untar them,
find the .pm files inside the packages (probably inside a lib directory),
then move them into src/lib, preserving their directory structure
(e.g., for DBD::File, File.pm goes into src/lib/DBD/File.pm).
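The layout rule above (each "::" in a module name becomes a directory level under src/lib, and the last part gets a .pm suffix) can be sketched as a small shell function. The name module_path is hypothetical, not part of autoAnnotate, and the function assumes you run it from the AutoAnnotate root directory.

```shell
# module_path NAME: print where a hand-copied module's .pm file belongs.
# Hypothetical helper; paths are relative to the AutoAnnotate root (assumption).
module_path() {
    # Replace every "::" with "/", e.g. DBD::File -> DBD/File
    rel=$(printf '%s\n' "$1" | sed 's|::|/|g')
    printf 'src/lib/%s.pm\n' "$rel"
}

# Example (the source path is a placeholder for wherever you untarred the package):
# dest=$(module_path DBD::File)
# mkdir -p "$(dirname "$dest")" && cp path/to/unpacked/lib/DBD/File.pm "$dest"
```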
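These are the Perl modules to install (Carp, Cwd, English, File::Basename,
FindBin, Getopt::Long, List::Util, Pod::Usage and Storable normally ship
with Perl itself):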
Carp
CDB_File
Config::IniFiles
Cwd
DBD::File
DBD::SQLite
DBI
English
File::Basename
File::Which
FindBin
Getopt::Long
List::Util
Log::Log4perl
Pod::Usage
Storable
Text::CSV_XS
threads (optional)
Thread::Queue (optional)
Tree::Trie
These Perl modules are included in the src/lib directory, so you
shouldn't need to install them; be aware, however, that the bundled
versions may differ from versions already installed on your system:
Algorithm::Diff
Blast::BlastHitDataType
DBD::CSV
String::Diff
2. Download and install tmhmm 2.0.
(You can skip this step, but the automatic self-test in step 4 will fail.)
According to http://www.cbs.dtu.dk/services/TMHMM/:
"Would you prefer to run TMHMM at your own site? TMHMM 2.0 is available as a
stand-alone software package, with the same functionality as the service
above. Ready-to-ship packages exist for the most common UNIX platforms. There
is a download page for academic users; other users are requested to contact
CBS Software Package Manager at software@cbs.dtu.dk."
Manual: http://www.csc.fi/english/research/sciences/bioscience/programs/tmhmm/tmhmm_manual
Installation instructions: http://www.cbs.dtu.dk/services/doc/tmhmm-2.0c.readme
3. Download and install signalp 3.0 from http://www.cbs.dtu.dk/services/SignalP/.
(You can skip this step, but the automatic self-test in step 4 will fail.)
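Before running the self-test in step 4, you can check whether the tools from steps 2 and 3 are on your PATH with command -v, which prints a program's location or fails if it is absent. The executable names tmhmm and signalp are assumptions based on the standard TMHMM 2.0 and SignalP 3.0 packages, and report_tools is a hypothetical helper; autoAnnotate itself may locate these programs differently.

```shell
# report_tools NAME...: say whether each program is on PATH.
# The names passed below are assumed executable names, not confirmed ones.
report_tools() {
    for t in "$@"; do
        if loc=$(command -v "$t"); then
            printf '%s: found at %s\n' "$t" "$loc"
        else
            printf '%s: NOT found on PATH\n' "$t"
        fi
    done
}

report_tools tmhmm signalp
```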
4. Download the software from https://sourceforge.net/projects/prokfunautoanno/.
Then
tar xvzf src.tar.gz
./unpack.bash
This will run autoAnnotate.dbi on the organism ann_test1, produce an
output file, and compare it (using diff) with out/sqlite-ann_test1-100519sp.dat.
There should be no differences. See doc/README if there are.
The file src/anno-sqlite.bash can be used like this:
./anno-sqlite.bash ann_test1 100520 gram-
to run AutoAnnotate on the genome identified as "ann_test1" (whose data
must be in the data/genomes directory and in either data/db/SQLite
or data/db/CSV). The anno-sqlite.bash script calls
autoAnnotate.dbi with some useful arguments.
You can also run autoAnnotate.dbi directly, with default arguments, like this:
perl autoAnnotate.dbi -D ann_test1
In that case, it will not run SignalP, and will guess whether the organism
is gram+ or gram- using an unvalidated method that looks for homology
to genes found primarily in gram+ or gram- bacteria.
If you have not installed tmhmm, you can run AutoAnnotate like this:
perl autoAnnotate.dbi -D ann_test1 -notmhmm
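If you script these runs, the choice between the two invocations above can be automated by adding -notmhmm only when no tmhmm program is on the PATH. This is a hypothetical sketch: the function name anno_args and the executable name tmhmm are assumptions, and it uses only the -D and -notmhmm options shown above.

```shell
# anno_args GENOME: print autoAnnotate.dbi arguments for GENOME,
# adding -notmhmm when no "tmhmm" program (assumed name) is on PATH.
anno_args() {
    args="-D $1"
    if ! command -v tmhmm >/dev/null 2>&1; then
        args="$args -notmhmm"
    fi
    printf '%s\n' "$args"
}

# Usage (the result is left unquoted on purpose so it splits into separate
# arguments; this assumes the genome name contains no spaces):
# perl autoAnnotate.dbi $(anno_args ann_test1)
```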
IMPROVEMENTS
5. To load annotations from all of UniProt (Swiss-Prot + TrEMBL) instead of
Swiss-Prot alone, or to update to a new UniProt release,
cd .. (to the AutoAnnotate root directory)
mkdir data/uniprot
Go to ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/
and download uniprot_sprot.dat.gz and uniprot_trembl.dat.gz into data/uniprot.
Then
gunzip data/uniprot/*
cd src
perl makeUniprot -tr -spf ../data/uniprot/uniprot_sprot.dat -trf ../data/uniprot/uniprot_trembl.dat
This will produce a 5 GB file
data/db/SQLite/uniprot
containing some of the information from UniProt (Swiss-Prot + TrEMBL).
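Because makeUniprot reads two very large files, it can be worth confirming that both decompressed files are present and non-empty before starting it. A minimal pre-flight check, run from the src directory as in the commands above; the function name uniprot_ready is hypothetical, not part of autoAnnotate.

```shell
# uniprot_ready DIR: succeed only if both UniProt flat files exist in DIR
# and are non-empty; otherwise name the missing file(s) on stderr.
uniprot_ready() {
    ok=0
    for f in uniprot_sprot.dat uniprot_trembl.dat; do
        if [ ! -s "$1/$f" ]; then
            printf 'missing or empty: %s\n' "$1/$f" >&2
            ok=1
        fi
    done
    return "$ok"
}

# Usage, from the src directory:
# uniprot_ready ../data/uniprot &&
#     perl makeUniprot -tr -spf ../data/uniprot/uniprot_sprot.dat \
#         -trf ../data/uniprot/uniprot_trembl.dat
```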
6. To run your own genomes, you'll probably want to run hmmer2 and BLAST
on a large compute grid. If you run autoAnnotate without first generating
the hmmer and BLAST output files, and set -makedata, it will call blast and
hmmpfam itself (you will need to install both first). This will take weeks
or months on a single machine. NOTE: autoAnnotate.dbi calls wu-blastp,
which is no longer distributed; see the note at the top of this file for
an obsolete version. autoAnnotate hasn't been tested with NCBI blastp.
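One common way to spread this work over a grid is to split the protein FASTA file into chunks and run each chunk as a separate job. The sketch below uses a hypothetical split_fasta helper written in awk that deals sequences round-robin into N files. It does not cover job submission or how autoAnnotate expects the resulting hmmer and BLAST output to be named and merged; check doc/README for that. For very large N, some awk versions may run out of open file handles.

```shell
# split_fasta IN N PREFIX: distribute the sequences of FASTA file IN
# round-robin into files PREFIX.1 ... PREFIX.N (hypothetical helper).
split_fasta() {
    awk -v n="$2" -v prefix="$3" '
        /^>/  { chunk = (count++ % n) + 1 }    # each header starts a new record
        chunk { print > (prefix "." chunk) }   # skip any text before the first header
    ' "$1"
}

# Example: split_fasta proteins.fasta 100 chunk
# then submit one BLAST/hmmpfam job per chunk.N file.
```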
7. SourceForge has a January 2010 release of PANDA.
If we release a newer version, you can download it from JCVI
at ftp://ftp.jcvi.org/pub/data/panda (log in as user 'anonymous'),
then gunzip and untar it into data/panda:
cd data
mkdir panda
cd panda
(download allgroupWithTables090810.tar.gz and indices-allgroup090810.tar.gz into this directory)
tar xvzf all*.tar.gz
tar xvzf ind*.tar.gz
8. Download bioname, if you would like to use it (via applyBioName.pl),
from https://sourceforge.net/projects/microbiomeutil/files/
into the src directory, then
tar xvf bioname-06072009.tar.gz
bioname attempts to clean up protein names.
9. Download and install coils (usable by specifying -coils on the
autoAnnotate command line) from
ftp://ftp.ebi.ac.uk/pub/software/unix/coils-2.2/.
Documentation is at http://www.ch.embnet.org/software/coils/COILS_doc.html.
Once downloaded, compile coils with these commands:
tar xvf ncoils.tar.gz
cd coils
cc -O2 -I. -o ncoils ncoils.c read_matrix.c -lm
mv ncoils <somewhere on your executable path>
cd ..
Note that you are not moving the top-level directory you downloaded, but only
the coils directory unpacked from ncoils.tar.gz, which contains the source code
and (what matters to us) the data files:
mv coils <path>/anno/data/coils
Run coils in AutoAnnotate by adding the command-line parameter -coils
when calling autoAnnotate.dbi.
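After these steps, a quick sanity check is that ncoils resolves on your PATH and that the data directory is populated. This hypothetical check (the function name coils_installed is an assumption) takes the AutoAnnotate root directory, written <path>/anno above, as its argument; it does not verify which data files autoAnnotate actually needs.

```shell
# coils_installed ROOT: succeed if ncoils is on PATH and ROOT/data/coils
# exists and contains at least one entry (hypothetical helper).
coils_installed() {
    command -v ncoils >/dev/null 2>&1 || { echo "ncoils not on PATH" >&2; return 1; }
    [ -d "$1/data/coils" ] || { echo "no directory $1/data/coils" >&2; return 1; }
    # ls -A lists entries except . and ..; empty output means an empty directory.
    [ -n "$(ls -A "$1/data/coils")" ] || { echo "$1/data/coils is empty" >&2; return 1; }
}

# Usage: coils_installed <path>/anno && echo "coils looks installed"
```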
10. The file makeDBs.pl can be used to update the data files.
Read that file for more information. Make a backup of the data/db/SQLite
directory before trying to use it.
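A minimal way to take that backup, assuming you run it from the AutoAnnotate root directory, is to copy the directory to a timestamped sibling. The backup_dir function is hypothetical; cp -Rp copies the whole tree and preserves modification times and permissions where it can.

```shell
# backup_dir DIR: copy DIR to DIR.bak.YYYYMMDD-HHMMSS and print the new path
# (hypothetical helper; refuses to overwrite an existing backup).
backup_dir() {
    src=${1%/}                       # drop a trailing slash, if any
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    if [ -e "$dest" ]; then
        echo "backup $dest already exists" >&2
        return 1
    fi
    cp -Rp "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage, before running makeDBs.pl:
# backup_dir data/db/SQLite
```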
Phil Goetz
pgoetz @ jcvi.org