TaxOnTree - Including taxonomic information on your phylogenetic tree
Introduction:
TaxOnTree is a bioinformatics tool that generates phylogenetic trees and
adds taxonomic information to them. With TaxOnTree, you can easily access
the taxonomic information of each sequence in the tree, as well as
the taxonomic distance between the species in your tree.
Prerequisites:
Unix Platform;
PERL;
Internet connection;
FigTree (tree.bio.ed.ac.uk/software/figtree/);
Third-party software for the phylogenetic pipeline.
For the impatient:
> tar -zxvf TaxOnTree_vXXX_XXX.tgz
> cd taxontree
> ./taxontree
For detailed description of TaxOnTree parameters:
> ./taxontree -man
Basic usage:
> ./taxontree -singleID <sequence_ID>
> ./taxontree -seqFile <FASTA_file>
> ./taxontree -listFile <list_file>
> ./taxontree -treeFile <tree_file> -queryID <sequence_ID>
Other parameters:
[-db database_name] [-maxTarget int_value] [-evalue] [-threshold]
[-out file_name] [-queryID query_id] [-queryTax tax_id] [-txidMap
tax_id] [-aligner] [-showIsoform] [-noTrimAl] [-position]
[-delimiter] [-treeTable table_file] [-printLeaves] [-noMidPoint]
Installation:
TaxOnTree is ready to use on most Unix platforms, but it only
works if the lib/ and bin/ folders that accompany the script are in the
same location as the script. If you want to run TaxOnTree from any
location, add the TaxOnTree folder to the PATH environment variable
using, for example, the following commands:
> echo 'export PATH=$PATH:/path/to/program/taxontree/' >> ~/.bash_profile
> source ~/.bash_profile
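Running the echo command more than once appends the same line to
~/.bash_profile each time. As a minimal sketch, assuming a POSIX shell,
the function below adds a folder to PATH only when it is not already
listed. The name add_to_path is hypothetical and is not part of TaxOnTree.

```shell
# Sketch: append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper, not part of TaxOnTree.
add_to_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                          # already present, do nothing
        *) PATH="$PATH:$dir"; export PATH ;;    # otherwise append it
    esac
}

# Example (the path is the same placeholder used above):
#   add_to_path /path/to/program/taxontree
```

Placing the function and its call in ~/.bash_profile keeps PATH free of
duplicate entries even if the file is sourced repeatedly.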
Dependencies:
(a) PERL:
Most Unix platforms have PERL installed. TaxOnTree was developed and
tested with PERL 5.10.1 and 5.16.3. See the PERL website if you need to
install or update PERL on your machine.
(b) Internet connection:
TaxOnTree uses the NCBI and UniProt APIs to request BLAST searches and to
retrieve the sequences that will comprise the phylogenetic tree, along
with their taxonomic information, so make sure you are connected when
you run it.
############################# Important! #############################
TaxOnTree was developed following the NCBI guidelines on minimizing the
number of HTTP requests so as not to overload its servers. Still, we ask
users to use TaxOnTree prudently when running it in batch, especially
when the jobs request BLAST searches on the web server.
For more information about NCBI server usage guidelines and policies see:
www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen
See also the NCBI copyright and disclaimers at:
www.ncbi.nlm.nih.gov/About/disclaimer.html
########################################################################
For a large number of jobs, consider running BLAST locally by installing
StandAlone BLAST and a BLAST-formatted database on your machine (see
below). Also consider running TaxOnTree offline to speed up your
analysis (see the section "Running TaxOnTree Offline").
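One way to run a batch prudently is to submit jobs one at a time with a
pause between them. The sketch below assumes a text file with one
sequence ID per line. The names run_batch, TAXONTREE and PAUSE are
hypothetical, and the 10-second default is an arbitrary value, not an
NCBI recommendation.

```shell
# Sketch: run TaxOnTree once per ID, sequentially, pausing between jobs.
# run_batch, TAXONTREE and PAUSE are hypothetical names for this example.
TAXONTREE=${TAXONTREE:-./taxontree}   # command used to launch TaxOnTree
PAUSE=${PAUSE:-10}                    # seconds between jobs (arbitrary)

run_batch() {
    id_file=$1
    while IFS= read -r id || [ -n "$id" ]; do
        if [ -z "$id" ]; then
            continue                  # skip blank lines
        fi
        # </dev/null keeps the job from consuming the rest of the ID list
        "$TAXONTREE" -singleID "$id" </dev/null ||
            echo "job failed for $id" >&2
        sleep "$PAUSE"
    done < "$id_file"
}

# Example:
#   run_batch ids.txt
```

A failed job is reported on standard error and the loop moves on to the
next ID.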
(c) FigTree
FigTree is a tree visualization program developed in Java by Andrew
Rambaut's research group. TaxOnTree uses FigTree's resources to display
the taxonomic information embedded in the phylogenetic tree. You can
download FigTree from the following link:
http://tree.bio.ed.ac.uk/software/figtree/
(d) Third-party software for the phylogenetic pipeline:
TaxOnTree can work as a phylogenetic pipeline, but for this you
have to install some third-party software on your machine for:
- Putative ortholog search:
    - StandAlone BLAST;
- Sequence alignment:
    - MUSCLE;
    - PRANK;
    - ClustalO;
    - Kalign;
- Alignment refinement:
    - Trimal;
- Tree reconstruction:
    - FastTree;
If you want to use the pipeline, please install at least one program
for each of the steps listed above.
The command lines that TaxOnTree uses to run these programs are defined
in the CONFIG.xml file. You can modify or add parameters for each
program, or even add a new program to the pipeline (see CONFIG.xml).
If your user account does not have permission to install software,
you can compile the programs yourself and set the path to each one in
CONFIG.xml (see CONFIG.xml).
Except for BLAST+, binaries of the third-party programs are provided in
the bin/ folder that accompanies this package. In some cases these
binaries do not run because of compiler incompatibilities. If that
happens, we recommend replacing the binaries in the bin/ folder with
binaries built by the compiler installed on your machine.
See below for more details on installing the third-party software.
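For programs installed system-wide, a quick check can show which ones
the shell can find. This is only a sketch: check_tools is a hypothetical
helper, and the executable names in the example are assumptions that may
differ on your system. It does not inspect the bin/ folder or the paths
set in CONFIG.xml.

```shell
# Sketch: report which of the given executables can be found on PATH.
# check_tools is a hypothetical helper, not part of TaxOnTree.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"     # 0 if everything was found, 1 otherwise
}

# Example (executable names are assumptions):
#   check_tools blastp muscle prank clustalo kalign trimal FastTree
```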
(d1) StandAlone BLAST+
StandAlone BLAST+ programs are hosted at:
ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
To install it, follow the manual at:
http://www.ncbi.nlm.nih.gov/books/NBK52640/
After installing StandAlone BLAST on your machine, build a
BLAST-formatted database: download the protein sequences that will
comprise your database in FASTA format, then format them with
"makeblastdb".
Before formatting your BLAST database, make sure to:
1) use only protein sequences from the GenBank or UniProt databases,
preserving their FASTA header pattern;
2) add the options -parse_seqids and -hash_index to the command line;
3) avoid "refseq" and "nr" as the name of the generated database.
Your command line should look like this:
> makeblastdb -in <fasta_file> -dbtype prot -out <database_name> -parse_seqids -hash_index
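The restriction on the database name can be checked before makeblastdb
is run. The sketch below assumes that only the exact names "refseq" and
"nr" are disallowed. makeblastdb_cmd is a hypothetical helper that
prints the command instead of executing it.

```shell
# Sketch: reject the disallowed names, then print the makeblastdb command.
# makeblastdb_cmd is a hypothetical helper; it does not run makeblastdb.
makeblastdb_cmd() {
    fasta=$1
    db=$2
    name=${db##*/}                    # database name without its folder
    case "$name" in
        refseq|nr)
            echo "database name '$name' is not allowed" >&2
            return 1 ;;
    esac
    echo "makeblastdb -in $fasta -dbtype prot -out $db -parse_seqids -hash_index"
}

# Example (file and database names are placeholders):
#   makeblastdb_cmd proteins.fasta mydb
```

The printed command can be reviewed and then pasted into the shell.
Paths containing spaces would need quoting before use.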
To run TaxOnTree with a local BLAST-formatted database, set the
parameter -db to the name of your database. If the database is not
in the current folder, give its path together with its name.
Example:
> ./taxontree -singleID 123456789 -db <database_name>
or
> ./taxontree -singleID 123456789 -db /path/to/your/database/database_name
Pre-formatted BLAST databases (the same ones used in our web tool) are
available on our SourceForge page
(http://sourceforge.net/projects/taxontree/). You can download them,
extract them and use them in the TaxOnTree command.
(d3) Other third-party software:
Here we list the links where you can get the source code of the
third-party software used by TaxOnTree. If a binary in the bin/ folder
does not work, please download the source and follow the installation
instructions of that program:
- Software for sequence alignment:
1. MUSCLE (Edgar, 2004) http://www.drive5.com/muscle/downloads.htm
2. PRANK (Loytynoja and Goldman, 2005) http://wasabiapp.org/download/prank/
3. Clustal Omega (Sievers et al., 2011) http://www.clustal.org/omega/
4. Kalign (Lassmann et al., 2009) http://msa.sbc.su.se/cgi-bin/msa.cgi
- Software for alignment trimming:
1. trimAl (Capella-Gutierrez et al., 2009) http://trimal.cgenomics.org/downloads
- Software for phylogenetic tree inference:
1. FastTree (Price et al., 2010) http://meta.microbesonline.org/fasttree/
Running TaxOnTree Offline:
Running TaxOnTree without an internet connection requires:
- StandAlone BLAST+ (See item d1 from section Dependencies);
- BLAST-formatted sequence database (See item d1 from section Dependencies);
- MySQL database containing the taxonomy and gene data of each protein accession (See below).
A dump file of the MySQL database used by TaxOnTree is available on our SourceForge
page. Download it and load it into your MySQL server using the following command line:
> mysql -u <username> -p < taxontree.sql
or in the MySQL environment:
mysql> source taxontree.sql
To configure TaxOnTree to use the local MySQL database, set the MySQL user name
and password in the CONFIG.xml file (see CONFIG.xml). Then add the parameter
-mysql to the TaxOnTree command line. Example:
> ./taxontree -singleID 123456789 -db <database_name> -mysql
Viewing the tree:
After running, TaxOnTree generates a file with the ".nex" extension.
Open this file with FigTree. For more instructions on tree
visualization, see the PDF file "TaxOnTree_figtreeInstruction.pdf".
Other output files:
*_blast.txt - blast result;
*_all_seq.fasta - all sequences for analysis in Fasta format;
*_seq_aligned.fasta - all sequences in Fasta format after alignment;
*_seq_aligned_TrimAl.fasta - aligned sequences in Fasta format after running TrimAl;
*_seq_FastTree.tree - tree in newick format generated by FastTree;
*_seq_tree.nex - tree in Nexus format;
*_taxRankTable.txt - taxonomy rank report of your tree;
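After a run, it can help to see which of these files were written. The
sketch below assumes that all output files of one job share a common
prefix, the part matched by "*" in the list above. list_outputs is a
hypothetical helper. A file reported as absent is not necessarily an
error, since some steps may have been skipped.

```shell
# Sketch: report which of the documented output files exist for a prefix.
# list_outputs is a hypothetical helper, not part of TaxOnTree.
list_outputs() {
    prefix=$1
    for suffix in _blast.txt _all_seq.fasta _seq_aligned.fasta \
                  _seq_aligned_TrimAl.fasta _seq_FastTree.tree \
                  _seq_tree.nex _taxRankTable.txt; do
        if [ -f "${prefix}${suffix}" ]; then
            echo "present: ${prefix}${suffix}"
        else
            echo "absent:  ${prefix}${suffix}"
        fi
    done
}

# Example (the prefix is a placeholder):
#   list_outputs myjob
```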
Other scripts/docs included in this package:
nexus2SVG.pl: a script that takes a tree generated by TaxOnTree (Nexus file)
and generates a graphical tree in SVG format. It uses FigTree's
graphical resources. Run the following command for details on
its usage:
> perl nexus2SVG.pl -man
TaxOnTree_Manual.pdf: contains a brief explanation of some TaxOnTree features
and instructions on how to visualize and manipulate a tree generated
by TaxOnTree in FigTree.
Contact:
If you have suggestions or questions, feel free to contact us at these
email addresses:
tetsufmbio@gmail.com (Tetsu Sakamoto)
miguel@icb.ufmg.br (J. Miguel Ortega)