README for the scripts and data associated with McComish et al.
"Multiple local maxima for likelihoods of phylogenetic trees
constructed from biological data."
1. OVERVIEW
There are four scripts, three data files, and a custom version of the
R package phangorn associated with the paper. These are described in
more detail below.
2. REQUIREMENTS
To use the scripts, a working installation of R is required, together
with the provided custom version of phangorn and all its dependencies.
The exception is nni1optima_fa.rscript, which should be run with a
standard release of phangorn instead (version 1.7-1 was used with this
script in the paper). Standard releases can be obtained from
cran.r-project.org.
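
As a rough sketch of one way to set this up from within R, the bundled
package can be installed from its source tarball and the installed
version checked. This assumes the tarball is in the current working
directory and that its dependencies (such as ape) are already
installed. The helper name check_phangorn is hypothetical and is not
part of the provided scripts.

```r
# Install the custom phangorn from the bundled source tarball.
# Assumption: the tarball is in the working directory; skipped otherwise.
tarball <- "phangorn_1.6-1.tar.gz"
if (file.exists(tarball)) {
  install.packages(tarball, repos = NULL, type = "source")
}

# Hypothetical helper: TRUE if the installed phangorn matches `expected`,
# FALSE if another version is installed, NA if phangorn is not installed.
check_phangorn <- function(expected) {
  if (!requireNamespace("phangorn", quietly = TRUE)) return(NA)
  utils::packageVersion("phangorn") == package_version(expected)
}

check_phangorn("1.6-1")  # custom version, for the optima_* scripts
check_phangorn("1.7-1")  # standard release, for nni1optima_fa.rscript
```

Only one version of a package can be loaded from a given library at a
time, so the check is worth repeating before switching scripts.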
3. FILE DESCRIPTIONS
3.1 Scripts
===========
3.1.1 optima_nex.rscript
------------------------
Given a nucleotide alignment in NEXUS format: generates n random tree
topologies; selects m sets of random edge lengths as starting values
for each topology; optimises edge lengths (but not topology); and
compares the resulting sets of edge lengths to detect distinct optima
on each topology.
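
The comparison step could be sketched as follows. This is an
illustration only, not the code in optima_nex.rscript: the function
name count_distinct_optima, the tolerance, and the rule used (two
optimised sets of edge lengths count as the same optimum if every edge
length agrees to within the tolerance) are all assumptions.

```r
# Hypothetical helper: count distinct optima among optimised edge-length sets.
# `edge_lengths` is a matrix with one row per starting point and one column
# per edge (edges in the same order in every row).
# Assumption: two rows are the same optimum if all entries differ by < tol.
count_distinct_optima <- function(edge_lengths, tol = 1e-4) {
  representatives <- list()
  for (i in seq_len(nrow(edge_lengths))) {
    current <- edge_lengths[i, ]
    is_known <- FALSE
    for (rep_lengths in representatives) {
      if (max(abs(current - rep_lengths)) < tol) {
        is_known <- TRUE
        break
      }
    }
    if (!is_known) {
      representatives[[length(representatives) + 1]] <- current
    }
  }
  length(representatives)
}

# Toy input: three starts converge to one optimum, one start to another.
optimised <- rbind(
  c(0.10, 0.20, 0.05),
  c(0.10, 0.20, 0.05),
  c(0.30, 0.01, 0.05),
  c(0.10, 0.20, 0.05)
)
count_distinct_optima(optimised)  # returns 2
```

A fixed absolute tolerance is the simplest choice; it would need to be
set relative to the convergence threshold of the optimiser.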
Also calculates a neighbour-joining tree, optimises its topology to
find a maximum likelihood tree, and carries out the above procedure
to detect distinct optima on this ML tree.
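
A minimal sketch of these steps with ape and phangorn follows. It is
not taken from the scripts. It uses phangorn's bundled Laurasiatherian
example alignment as stand-in data, the default Jukes-Cantor model of
pml, exponentially distributed random starting edge lengths, and only
five starting points; all of these are assumptions made for
illustration.

```r
library(ape)
library(phangorn)

# Stand-in data (assumption): first ten taxa of phangorn's example alignment.
data(Laurasiatherian)
aln <- subset(Laurasiatherian, subset = 1:10)

# Neighbour-joining tree from maximum likelihood distances.
nj_tree <- NJ(dist.ml(aln))

# Optimise topology (NNI) and edge lengths to obtain an ML tree.
quiet <- pml.control(trace = 0)
ml_fit <- optim.pml(pml(nj_tree, aln), optNni = TRUE, control = quiet)
ml_tree <- ml_fit$tree

# Re-optimise edge lengths only, from m random starting points.
m <- 5
set.seed(1)
log_liks <- numeric(m)
for (i in seq_len(m)) {
  start <- ml_tree
  start$edge.length <- rexp(nrow(start$edge), rate = 10)  # assumed distribution
  fit <- optim.pml(pml(start, aln), optNni = FALSE, optEdge = TRUE,
                   control = quiet)
  log_liks[i] <- fit$logLik
}

# Starts that reach the same optimum should give (nearly) equal values.
round(log_liks, 3)
```

Comparing log-likelihoods is only a quick first check; the scripts
compare the optimised edge lengths themselves.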
Takes six command-line arguments:
  - The name of the input NEXUS alignment file.
  - The number of random topologies to check (n).
  - The number of random starting points for each topology (m).
  - The substitution model to be used. See the phangorn
    documentation for a list of available models.
  - A filename for summary output.
  - A filename for detailed output.
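
To make the shape of these arguments concrete, here is a hypothetical
parser for them. It is not the scripts' own code: the function name and
field names are invented, "GTR" and the file names are placeholder
values, and it assumes the arguments are given positionally in the
order listed above.

```r
# Hypothetical parser for the six arguments listed above.
# Assumption: arguments are positional, in the order listed.
parse_optima_args <- function(args) {
  if (length(args) != 6) {
    stop("expected 6 arguments, got ", length(args))
  }
  list(
    alignment_file = args[1],
    n_topologies   = as.integer(args[2]),
    n_starts       = as.integer(args[3]),
    model          = args[4],
    summary_file   = args[5],
    detail_file    = args[6]
  )
}

# Example with placeholder values. In a script run via Rscript, the vector
# would come from commandArgs(trailingOnly = TRUE).
opts <- parse_optima_args(
  c("alignment.nex", "100", "50", "GTR", "summary.txt", "detail.txt")
)
str(opts)
```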
3.1.2 optima_fa.rscript
-----------------------
As for optima_nex.rscript above, but takes a nucleotide alignment in
FASTA format as input.
3.1.3 optima_aa.rscript
-----------------------
As for optima_nex.rscript above, but takes an amino acid alignment in
FASTA format as input.
3.1.4 nni1optima_fa.rscript
---------------------------
Takes a nucleotide alignment in FASTA format as input. Calculates a
neighbour-joining tree, optimises its topology to find a maximum
likelihood tree, and calculates all trees one nearest-neighbour
interchange step from the ML tree. Then selects m sets of random edge
lengths as starting values for each topology; optimises edge lengths
(but not topology); and compares the resulting sets of edge lengths
to detect distinct optima on each topology.
Takes five command-line arguments: the same as for optima_nex.rscript
above, except that the input alignment is in FASTA format and the
number of topologies to check is omitted.
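
The set of topologies one nearest-neighbour interchange step away can
be generated with phangorn's nni function, as in the sketch below. A
random ten-taxon tree stands in for the ML tree here; that choice and
the variable names are assumptions for illustration only.

```r
library(ape)
library(phangorn)

# Random unrooted binary tree on ten taxa, standing in for the ML tree.
set.seed(1)
tree <- rtree(10, rooted = FALSE)

# All topologies one NNI step away from `tree` (a multiPhylo object).
neighbours <- nni(tree)

# An unrooted binary tree with n tips has n - 3 internal edges, each
# allowing two interchanges, so there are 2 * (n - 3) neighbours.
length(neighbours)  # 14 for ten taxa
```

For the ten-taxon subsets this gives 14 neighbouring topologies, each
of which would then be optimised from m random starting points.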
3.2 Data files
==============
3.2.1 mammal_mt.tar.gz
----------------------
The 100 randomly chosen subsets of ten taxa from the alignment of
Lin et al. (2002) used in the paper, in gzipped tar archive format.
The original alignment can be found at
http://www.allanwilsoncentre.ac.nz/massey/fms/AWC/download/4_Laurasiatherian12.txt
3.2.2 hepB.tar.gz
-----------------
The random subsets of eight, nine, ten, twelve, and fifteen strains
from the data of Harrison et al. (2011) used in the paper, in gzipped
tar archive format.
3.2.3 prokaryote.tar.gz
-----------------------
The 100 random subsets of nine taxa taken from the data of
Puigbo et al. (2009) used in the paper, in gzipped tar archive
format.
The chloroplast alignments used in the paper are not provided here,
but can be found at
http://www.allanwilsoncentre.ac.nz/massey/fms/AWC/download/treeness_triangle_real_data.zip
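
The three archives described above can be unpacked from within R, for
example as sketched here. The sketch assumes the archives are in the
current working directory and extracts each into a directory named
after the archive; the directory names are a choice made for this
example.

```r
# Unpack each archive into a directory named after it, if present.
# Assumption: the archives are in the current working directory.
archives <- c("mammal_mt.tar.gz", "hepB.tar.gz", "prokaryote.tar.gz")
for (archive in archives) {
  if (!file.exists(archive)) {
    message("not found, skipping: ", archive)
    next
  }
  out_dir <- sub("\\.tar\\.gz$", "", archive)
  untar(archive, exdir = out_dir)  # exdir is created if necessary
  print(list.files(out_dir, recursive = TRUE))
}
```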
3.3 Custom phangorn package
===========================
phangorn_1.6-1.tar.gz is the custom version of the R package phangorn
used in the paper with all scripts except nni1optima_fa.rscript.