
Prok Functional AutoAnnotate, release 2.4: downloadable files

	Name              Modified     Size
	doc.tar.gz        2010-05-21   113.3 kB
	md5s              2010-05-21   433 bytes
	src.tar.gz        2010-05-21   930.0 kB
	uniprot.tar.gz    2010-05-21   1.6 GB
	data.tar.gz       2010-05-21   310.0 MB
	README-FIRST-sf   2010-05-20   6.0 kB
	indices.tar.gz    2010-05-20   754.7 MB
	allgroup.tar.gz   2010-05-20   3.2 GB
	sqlite.tar.gz     2010-05-20   722.4 MB
	Total: 9 items, 6.6 GB

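The listing includes an md5s file, so the archives can be checked before unpacking. This is a minimal sketch, assuming md5s uses the standard GNU md5sum line format ("<hash>  <filename>"), which has not been confirmed; verify_downloads is a hypothetical helper name. It skips entries for archives you did not download (for example, the 1.6 GB uniprot.tar.gz).

```bash
#!/usr/bin/env bash
# Hypothetical helper: check downloaded archives against the md5s file.
# ASSUMPTION: md5s lines look like "<hash>  <filename>" (GNU md5sum format).
verify_downloads() {
    local dir=${1:-.}   # directory holding the archives and md5s
    (
        cd "$dir" || exit 1
        # Pass on only the entries whose archive is present, so skipped
        # downloads are not reported as failures; md5sum reads them from stdin.
        while read -r hash name; do
            [ -f "$name" ] && printf '%s  %s\n' "$hash" "$name"
        done < md5s | md5sum -c -
    )
}
```

Usage, from the directory where you saved the downloads:

	verify_downloads .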
Full instructions are in the file doc/README, in doc.tar.gz.
NOTE: autoAnnotate.dbi calls wu-blastp.  A version of wu-blastp is available at
http://www.advbiocomp.com/blast/obsolete/
However, versions from that site have not been tested with autoAnnotate.
Neither has autoAnnotate been tested with NCBI blastp.
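Because the blastp flavor matters, it can help to see which BLAST executables are on your PATH before a run. A minimal sketch; report_blastp is a hypothetical name, and it only reports what it finds. Note that a plain `blastp` may be either WU-BLAST or NCBI BLAST, and finding one does not make it a tested configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which blastp executables are on PATH.
report_blastp() {
    local prog found=0
    for prog in wu-blastp blastp; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog: $(command -v "$prog")"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no blastp found on PATH" >&2
        return 1
    fi
}
```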



QUICKSTART

To install JCVI autoAnnotate v2.4, May 17 2010, on Linux:


1. Install Perl modules from www.cpan.org.

To install DBD::File on Debian (DBD::File is distributed as part of DBI):

	apt-get install libdbi-perl

To install DBD::File using an installed cpan:

	perl -MCPAN -e shell

or

	cpan

then, at the CPAN shell prompt, type

	install DBD::File
	install DBD::SQLite

and so on for each module listed below.

If an install fails because some of the module's tests fail, try, e.g.,

	force install DBD::File

If these methods fail, download the packages from CPAN, untar them,
find the .pm files inside the packages (probably inside a lib directory),
then move them into src/lib, maintaining their directory structure
(e.g., for DBD::File, File.pm goes into src/lib/DBD/File.pm).
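The manual fallback can be scripted. This is a minimal sketch, assuming the unpacked CPAN distribution keeps its modules under a lib/ directory (most do, but check); copy_pm_files is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy every .pm file under <dist>/lib into <dest>,
# keeping the directory structure (lib/DBD/File.pm -> <dest>/DBD/File.pm).
copy_pm_files() {
    local dist=$1 dest=${2:-src/lib}
    if [ ! -d "$dist/lib" ]; then
        echo "no lib/ directory in $dist; copy the .pm files by hand" >&2
        return 1
    fi
    # List paths relative to lib/, then recreate each one under $dest.
    (cd "$dist/lib" && find . -name '*.pm') | while IFS= read -r f; do
        mkdir -p "$dest/$(dirname "$f")"
        cp "$dist/lib/$f" "$dest/$f"
    done
}
```

Usage, from the AutoAnnotate root directory, where <unpacked-dist> is the untarred CPAN package:

	copy_pm_files <unpacked-dist> src/lib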

These are the Perl modules to install:

	Carp
	CDB_File
	Config::IniFiles
	Cwd
	DBD::File
	DBD::SQLite
	DBI
	English
	File::Basename
	File::Which
	FindBin
	Getopt::Long
	List::Util
	Log::Log4perl
	Pod::Usage
	Storable
	Text::CSV_XS
	threads       (optional)
	Thread::Queue (optional)
	Tree::Trie
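To see which of the modules above Perl can already load, a quick loop is enough. A minimal sketch: it adds src/lib to Perl's search path on the assumption that autoAnnotate loads modules from there (as the manual-install instructions imply); PERL_LIB_DIR and check_perl_modules are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: try to load each named module and report the result.
# ASSUMPTION: modules copied into src/lib are visible to autoAnnotate.
check_perl_modules() {
    local lib=${PERL_LIB_DIR:-src/lib} m
    for m in "$@"; do
        if perl -I"$lib" -M"$m" -e 1 >/dev/null 2>&1; then
            echo "ok      $m"
        else
            echo "MISSING $m"
        fi
    done
}
```

Usage, from the AutoAnnotate root directory (the optional threads modules may report MISSING on a Perl built without thread support):

	check_perl_modules Carp CDB_File Config::IniFiles Cwd DBD::File DBD::SQLite DBI English File::Basename File::Which FindBin Getopt::Long List::Util Log::Log4perl Pod::Usage Storable Text::CSV_XS threads Thread::Queue Tree::Trie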

These Perl modules are included in the src/lib directory, so you
shouldn't need to install them; but be aware that the bundled copies
may not be the same versions as those on CPAN or already on your system:

	Algorithm::Diff
	Blast::BlastHitDataType
	DBD::CSV
	String::Diff


2. Download and install tmhmm 2.0.
(You can skip this step, but the automatic self-test in step 4 will fail.)

According to http://www.cbs.dtu.dk/services/TMHMM/:
"Would you prefer to run TMHMM at your own site? TMHMM 2.0 is available as a
stand-alone software package, with the same functionality as the service
above. Ready-to-ship packages exist for the most common UNIX platforms. There
is a download page for academic users; other users are requested to contact
CBS Software Package Manager at software@cbs.dtu.dk."

Manual: http://www.csc.fi/english/research/sciences/bioscience/programs/tmhmm/tmhmm_manual
Installation instructions: http://www.cbs.dtu.dk/services/doc/tmhmm-2.0c.readme


3. Download and install signalp 3.0 from http://www.cbs.dtu.dk/services/SignalP/.
(You can skip this step, but the automatic self-test in step 4 will fail.)
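Before running the self-test, a quick check shows whether tmhmm and signalp are reachable. A sketch only: it assumes the executables are named tmhmm and signalp, as in the CBS stand-alone packages, and that autoAnnotate finds them through PATH, which is not confirmed here; check_predictors is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether tmhmm and signalp are on PATH.
# Returns non-zero if either one is missing.
check_predictors() {
    local prog missing=0
    for prog in tmhmm signalp; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found   $prog"
        else
            echo "missing $prog (the step 4 self-test will fail)"
            missing=1
        fi
    done
    return "$missing"
}
```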


4. Download the software from https://sourceforge.net/projects/prokfunautoanno/.
Then

	tar xvzf src.tar.gz
	./unpack.bash

This will run autoAnnotate.dbi on the organism ann_test1, produce an
output file, and diff it with out/sqlite-ann_test1-100519sp.dat.
There should be no differences.  See doc/README if there are.

The file src/anno-sqlite.bash can be used like this:

   ./anno-sqlite.bash ann_test1 100520 gram-

to run AutoAnnotate on the genome identified as "ann_test1" (in
the data/genomes directory, and in either data/db/SQLite
or data/db/CSV).  The anno-sqlite.bash script calls
autoAnnotate.dbi with some useful arguments.
You can also run autoAnnotate.dbi directly, with default arguments, like this:

   perl autoAnnotate.dbi -D ann_test1

In that case, it will not run SignalP, and will guess whether the organism
is gram+ or gram- using an unvalidated method that looks for homology
to genes found primarily in gram+ or gram- bacteria.
If you have not installed tmhmm, you can run AutoAnnotate like this:

   perl autoAnnotate.dbi -D ann_test1 -notmhmm
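The choice between the two invocations above can be automated. A minimal sketch: anno_cmd is a hypothetical name, it only prints the command so you can inspect it first, and it assumes tmhmm is found through PATH. The flags -D and -notmhmm are the ones shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an autoAnnotate.dbi command for a genome,
# adding -notmhmm when no tmhmm executable is on PATH.
anno_cmd() {
    local genome=$1 cmd
    cmd="perl autoAnnotate.dbi -D $genome"
    if ! command -v tmhmm >/dev/null 2>&1; then
        cmd="$cmd -notmhmm"
    fi
    echo "$cmd"
}
```

Usage, from the directory containing autoAnnotate.dbi:

	anno_cmd ann_test1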


IMPROVEMENTS


5. To load annotations from all of UniProt (Swiss-Prot + TrEMBL) instead of
Swiss-Prot alone, or to update to a new UniProt release:

	cd ..   (to the AutoAnnotate root directory)
	mkdir data/uniprot

Go to ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/
and download uniprot_sprot.dat.gz and uniprot_trembl.dat.gz into data/uniprot.
Then

	gunzip data/uniprot/*
	cd src
	perl makeUniprot -tr -spf ../data/uniprot/uniprot_sprot.dat -trf ../data/uniprot/uniprot_trembl.dat

This will produce a file of about 5 GB,

	data/db/SQLite/uniprot

containing some of the information from UniProt (Swiss-Prot + TrEMBL).
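Since makeUniprot writes a file of roughly 5 GB, it may be worth checking free space before starting. A sketch using POSIX `df -Pk`, whose fourth column is the available space in 1K blocks; have_free_gb is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding <dir> has at least
# <need_gb> GB available; otherwise print a message and fail.
have_free_gb() {
    local dir=$1 need_gb=$2 avail_kb
    # df -P prints one line per filesystem; field 4 is available 1K blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]; then
        return 0
    fi
    echo "only $((avail_kb / 1024 / 1024)) GB free in $dir; need $need_gb GB" >&2
    return 1
}
```

Usage, from the src directory:

	have_free_gb ../data 5 && perl makeUniprot -tr -spf ../data/uniprot/uniprot_sprot.dat -trf ../data/uniprot/uniprot_trembl.dat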


6. To run your own genomes, you'll probably want to figure out
some way to run hmmer2 and BLAST on a large compute grid.  If you
run autoAnnotate without generating the hmmer and BLAST output files
(and set -makedata), it will call blast and hmmpfam itself
(which you will need to install first).  This will take weeks or
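months on a single machine.  NOTE: autoAnnotate.dbi calls wu-blastp,
which is no longer maintained; only obsolete versions remain available
(see the note at the top of this file), and they have not been tested
with autoAnnotate.  Neither has NCBI blastp.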
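One common way to spread the BLAST and hmmpfam searches over a grid is to split the query proteins into chunks and submit one job per chunk. A minimal sketch in awk; split_fasta, the chunk file names, and round-robin assignment are illustrative choices, not something autoAnnotate requires, and the output file names and formats autoAnnotate expects are not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a multi-FASTA file into <n> chunks,
# <prefix>.0.fa ... <prefix>.(n-1).fa, assigning records round-robin.
split_fasta() {
    local fasta=$1 n=$2 prefix=${3:-chunk}
    # Each '>' header starts a new record; its sequence lines follow it
    # into the same chunk. Lines before the first header are ignored.
    awk -v n="$n" -v prefix="$prefix" '
        /^>/ { out = prefix "." (rec++ % n) ".fa" }
        out != "" { print > out }
    ' "$fasta"
}
```

awk keeps all n chunk files open at once, so very large values of n may hit your system's open-file limit.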


7. SourceForge has a January 2010 release of PANDA.
If we release a new version, you can download panda from JCVI
at ftp://ftp.jcvi.org/pub/data/panda (login as user 'anonymous'),
then gunzip and untar it into data/panda.

	cd data
	mkdir panda
	cd panda
	(get allgroupWithTables090810.tar.gz and indices-allgroup090810.tar.gz)
	tar xvf all*.tar.gz
	tar xvf ind*.tar.gz
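A pattern like all*.tar.gz works with `tar xvf` only if it matches a single archive; with several matches, tar treats the extra names as members to extract from the first one. A loop that extracts each matching archive separately avoids this. A sketch, assuming the archive name prefixes shown above; extract_panda is a hypothetical name, and -z is given explicitly because not every tar detects gzip compression on its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract each PANDA archive in <dir> into <dir>.
extract_panda() {
    local dir=${1:-.} f
    for f in "$dir"/allgroupWithTables*.tar.gz "$dir"/indices-allgroup*.tar.gz; do
        # An unmatched glob stays literal; skip it instead of failing in tar.
        [ -e "$f" ] || continue
        echo "extracting $f"
        tar -xzf "$f" -C "$dir" || return 1
    done
}
```

Usage, inside data/panda:

	extract_panda .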


8. Download bioname, if you would like to use it (via applyBioName.pl),
from https://sourceforge.net/projects/microbiomeutil/files/
into the src directory, then

	tar xvf bioname-06072009.tar.gz

bioname attempts to clean up protein names.


9. Download and install coils (usable by specifying -coils on the
autoAnnotate command line) from
ftp://ftp.ebi.ac.uk/pub/software/unix/coils-2.2/.
Documentation is at http://www.ch.embnet.org/software/coils/COILS_doc.html.

Once downloaded, compile coils with these commands:

	tar xvf ncoils.tar.gz
	cd coils
	cc -O2 -I. -o ncoils ncoils.c read_matrix.c -lm
	mv ncoils <somewhere on your executable path>
	cd ..

Note that you are moving not the top-level coils directory, but only the
directory that contains the source code and (what matters here) the data files:

	mv coils <path>/anno/data/coils

Run coils in AutoAnnotate by adding the command-line parameter -coils
when calling autoAnnotate.dbi.
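Before adding -coils, a quick check can confirm that ncoils is on your PATH and that the data directory is in place. A sketch: check_coils is a hypothetical name, and its default path ../data/coils assumes you run it from the src directory of the AutoAnnotate tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm ncoils is on PATH and the coils data
# directory exists. ASSUMPTION: default path is relative to src/.
check_coils() {
    local datadir=${1:-../data/coils}
    if ! command -v ncoils >/dev/null 2>&1; then
        echo "ncoils is not on PATH" >&2
        return 1
    fi
    if [ ! -d "$datadir" ]; then
        echo "coils data directory $datadir not found" >&2
        return 1
    fi
    echo "coils looks ready; add -coils to the autoAnnotate.dbi command line"
}
```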


10. The file makeDBs.pl can be used to update the data files.
Read that file for more information.  Make a backup of the data/db/SQLite
directory before trying to use it.
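The backup can be a single timestamped tarball of data/db/SQLite. A minimal sketch; backup_sqlite is a hypothetical name, and because the directory may hold the roughly 5 GB uniprot database, make sure there is room for the archive.

```bash
#!/usr/bin/env bash
# Hypothetical helper: archive <root>/data/db/SQLite into
# <root>/sqlite-backup-<timestamp>.tar.gz and print the archive path.
backup_sqlite() {
    local root=${1:-.} stamp archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$root/sqlite-backup-$stamp.tar.gz"
    tar -czf "$archive" -C "$root/data/db" SQLite || return 1
    echo "$archive"
}
```

Usage, from the AutoAnnotate root directory:

	backup_sqlite .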


Phil Goetz
pgoetz @ jcvi.org
Source: README-FIRST-sf, updated 2010-05-20