Prok Functional AutoAnnotate - Browse /2.4 at SourceForge.net


Name	Modified	Size	Downloads / Week
doc.tar.gz	2010-05-21	113.3 kB	0
md5s	2010-05-21	433 Bytes	0
src.tar.gz	2010-05-21	930.0 kB	0
uniprot.tar.gz	2010-05-21	1.6 GB	0
data.tar.gz	2010-05-21	310.0 MB	0
README-FIRST-sf	2010-05-20	6.0 kB	0
indices.tar.gz	2010-05-20	754.7 MB	0
allgroup.tar.gz	2010-05-20	3.2 GB	0
sqlite.tar.gz	2010-05-20	722.4 MB	0
Totals: 9 Items		6.6 GB	0

Full instructions are in the file doc/README, in doc.tar.gz .
NOTE: autoAnnotate.dbi calls wu-blastp. A version of wu-blastp is available at
http://www.advbiocomp.com/blast/obsolete/
However, versions from that site have not been tested with autoAnnotate.
Neither has autoAnnotate been tested with NCBI blastp.

QUICKSTART

To install JCVI autoAnnotate v2.4, May 17 2010, on Linux:

1. Install Perl modules from www.cpan.org.

To install DBD::File on Debian (DBD::File ships as part of the DBI
distribution, which Debian packages as libdbi-perl):

apt-get install libdbi-perl

To install DBD::File using an installed cpan, start the CPAN shell with either

perl -MCPAN -e shell

or

cpan

then, at the cpan prompt, type

install DBD::File
install DBD::SQLite

etc.

If the install fails because some tests fail, try, e.g.,

force install DBD::File

If these methods fail, download the packages from CPAN, untar them,
find the .pm files inside the packages (probably inside a lib directory),
then move them into src/lib, maintaining their directory structure
(e.g., for DBD::File, File.pm goes into src/lib/DBD/File.pm).
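
The mapping from module name to location under src/lib can be sketched in
shell. This is only an illustration of the rule described above: the function
names pm_dest and install_pm and the LIB_DIR variable are hypothetical and are
not part of autoAnnotate.

```shell
# pm_dest MODULE: print where MODULE's .pm file belongs under src/lib.
# LIB_DIR is a hypothetical override; it defaults to src/lib.
pm_dest() {
  printf '%s/%s.pm\n' "${LIB_DIR:-src/lib}" "$(printf '%s' "$1" | sed 's|::|/|g')"
}

# install_pm MODULE FILE: copy FILE to that location, creating directories.
install_pm() {
  dest=$(pm_dest "$1")
  mkdir -p "$(dirname "$dest")" && cp "$2" "$dest"
}
```

For example, "pm_dest DBD::File" prints src/lib/DBD/File.pm, matching the
example given above.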

These are the Perl modules to install:

Carp
CDB_File
Config::IniFiles
Cwd
DBD::File
DBD::SQLite
DBI
English
File::Basename
File::Which
FindBin
Getopt::Long
List::Util
Log::Log4perl
Pod::Usage
Storable
Text::CSV_XS
threads (optional)
Thread::Queue (optional)
Tree::Trie

These Perl modules are included in the src/lib directory, so you
shouldn't need to install them; but be aware that the bundled copies
may differ in version from any copies already installed on your system:

Algorithm::Diff
Blast::BlastHitDataType
DBD::CSV
String::Diff

2. Download and install tmhmm 2.0.
(You can skip this step, but the automatic self-test in step 4 will fail.)

According to http://www.cbs.dtu.dk/services/TMHMM/:
"Would you prefer to run TMHMM at your own site? TMHMM 2.0 is available as a
stand-alone software package, with the same functionality as the service
above. Ready-to-ship packages exist for the most common UNIX platforms. There
is a download page for academic users; other users are requested to contact
CBS Software Package Manager at software@cbs.dtu.dk."

Manual: http://www.csc.fi/english/research/sciences/bioscience/programs/tmhmm/tmhmm_manual
Installation instructions: http://www.cbs.dtu.dk/services/doc/tmhmm-2.0c.readme

3. Download and install signalp 3.0 from http://www.cbs.dtu.dk/services/SignalP/.
(You can skip this step, but the automatic self-test in step 4 will fail.)

4. Download the software from https://sourceforge.net/projects/prokfunautoanno/.
Then

tar xvzf src.tar.gz
./unpack.bash

This will run autoAnnotate.dbi on the organism ann_test1, produce an
output file, and diff it with out/sqlite-ann_test1-100519sp.dat .
There should be no differences. See doc/README if there are.
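
If you want to repeat the comparison by hand, the check amounts to a
byte-for-byte comparison of two files. A minimal sketch follows; the function
name compare_output is hypothetical, and the name of the file your own run
produces is not given here, so pass whatever path it was written to.

```shell
# compare_output ACTUAL EXPECTED: report whether the two files are identical.
compare_output() {
  if cmp -s "$1" "$2"; then
    echo "OK: no differences"
  else
    echo "DIFFERENT: see doc/README"
    return 1
  fi
}

# Usage (first argument is a placeholder for your own output file):
#   compare_output <your output file> out/sqlite-ann_test1-100519sp.dat
```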

The file src/anno-sqlite.bash can be used like this:

./anno-sqlite.bash ann_test1 100520 gram-

to run AutoAnnotate on the genome identified as "ann_test1" (in
the data/genomes directory, and in either data/db/SQLite
or data/db/CSV). The anno-sqlite.bash script calls
autoAnnotate.dbi with some useful arguments.
You can run it with default arguments like this:

perl autoAnnotate.dbi -D ann_test1

In that case, it will not run SignalP, and will guess whether the organism
is gram+ or gram- using an unvalidated method that looks for homology
to genes found primarily in gram+ or gram- bacteria.
If you have not installed tmhmm, you can run AutoAnnotate like this:

perl autoAnnotate.dbi -D ann_test1 -notmhmm
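
The choice between the two command lines above can be made automatically.
The sketch below assumes the TMHMM executable is called tmhmm and is on your
PATH (an assumption; adjust the name for your installation). The function
name autoannotate_cmd is hypothetical. It only prints the command, it does
not run it.

```shell
# autoannotate_cmd GENOME [TOOL]: print the autoAnnotate command line,
# adding -notmhmm when TOOL (default: tmhmm, an assumed name) is not on PATH.
autoannotate_cmd() {
  genome=$1
  tool=${2:-tmhmm}
  if command -v "$tool" >/dev/null 2>&1; then
    echo "perl autoAnnotate.dbi -D $genome"
  else
    echo "perl autoAnnotate.dbi -D $genome -notmhmm"
  fi
}

autoannotate_cmd ann_test1
```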

IMPROVEMENTS

5. To use UniProt instead of Swiss-Prot for loading annotations,
or to update to a new UniProt release:

cd .. (to the AutoAnnotate root directory)
mkdir data/uniprot

Go to ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/
and download uniprot_sprot.dat.gz and uniprot_trembl.dat.gz into data/uniprot.
Then

gunzip data/uniprot/*
cd src
perl makeUniprot -tr -spf ../data/uniprot/uniprot_sprot.dat -trf ../data/uniprot/uniprot_trembl.dat

This will produce a 5 GB file

data/db/SQLite/uniprot

containing some of the information from UniProt (Swiss-Prot + TrEMBL).
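
The download part of this step could be scripted roughly as follows. This is
a sketch that assumes wget is installed; the FETCH_UNIPROT variable and the
function name uniprot_urls are hypothetical. Nothing is downloaded unless
FETCH_UNIPROT=1 is set, since the files are very large. Run it from the
AutoAnnotate root directory.

```shell
UNIPROT_BASE="ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete"

# uniprot_urls: print the URLs of the two files named in this step.
uniprot_urls() {
  for f in uniprot_sprot.dat.gz uniprot_trembl.dat.gz; do
    echo "$UNIPROT_BASE/$f"
  done
}

# Download into data/uniprot only when explicitly requested.
if [ "${FETCH_UNIPROT:-0}" = 1 ]; then
  mkdir -p data/uniprot
  uniprot_urls | while read -r url; do
    wget -P data/uniprot "$url"
  done
fi
```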

6. To run your own genomes, you'll probably want to figure out
some way to run hmmer2 and BLAST on a large compute grid. If you
run autoAnnotate without generating the hmmer and BLAST output files
(and set -makedata), it will call blast and hmmpfam itself
(which you will need to install first). This will take weeks or
months on a single machine. NOTE: autoAnnotate.dbi calls wu-blastp;
see the note at the top of this file about where obsolete versions
may be found. It has not been tested with NCBI blastp.

7. SourceForge has a January 2010 release of PANDA.
If we release a new version, you can download panda from JCVI
at ftp://ftp.jcvi.org/pub/data/panda (login as user 'anonymous'),
then gunzip and untar it into data/panda.

cd data
mkdir panda
cd panda
(get allgroupWithTables090810.tar.gz and indices-allgroup090810.tar.gz)
tar xvzf all*.tar.gz
tar xvzf ind*.tar.gz

8. Download bioname, if you would like to use it (via applyBioName.pl),
from https://sourceforge.net/projects/microbiomeutil/files/
into the src directory, then

tar xvzf bioname-06072009.tar.gz

bioname attempts to clean up protein names.

9. Download and install coils (usable by specifying -coils on the
autoAnnotate command line) from
ftp://ftp.ebi.ac.uk/pub/software/unix/coils-2.2/.
Documentation at http://www.ch.embnet.org/software/coils/COILS_doc.html.

Once downloaded, compile coils with these commands:

tar xvzf ncoils.tar.gz
cd coils
cc -O2 -I. -o ncoils ncoils.c read_matrix.c -lm
mv ncoils <somewhere on your executable path>
cd ..

Now note that you are not moving the top-level coils directory, but only the
directory containing the source code and (what matters to us) the data files:

mv coils <path>/anno/data/coils

Run coils in AutoAnnotate by adding the command-line option -coils
when calling autoAnnotate.dbi .

10. The file makeDBs.pl can be used to update the data files.
Read that file for more information. Make a backup of the data/db/SQLite
directory before trying to use it.
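
One way to make that backup is a timestamped copy alongside the original.
This is a sketch; the function name backup_dir and the .bak-<timestamp>
naming are hypothetical choices, not something makeDBs.pl expects. Run it
from the AutoAnnotate root directory.

```shell
# backup_dir DIR: copy DIR to DIR.bak-<timestamp> and print the new name.
backup_dir() {
  src=${1%/}
  dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
  cp -R -p "$src" "$dest" && echo "$dest"
}

# Back up the SQLite data directory if it exists.
if [ -d data/db/SQLite ]; then
  backup_dir data/db/SQLite
fi
```

Make sure you have enough free disk space for the copy before running it.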

Phil Goetz
pgoetz @ jcvi.org

Source: README-FIRST-sf, updated 2010-05-20
