File Date Author Commit
 lib 2013-03-05 Florent Angly Florent Angly [43f967] Version 0.17
 script 2013-03-05 Florent Angly Florent Angly [e0fcae] Renamed script from 'GAAS' to 'gaas'
 t 2011-07-19 Florent Angly Florent Angly [e24a7c] Initial commit (GAAS version 0.16)
 utils 2011-07-19 Florent Angly Florent Angly [e24a7c] Initial commit (GAAS version 0.16)
 .gitignore 2013-03-05 Florent Angly Florent Angly [9b12a1] Added a .gitignore file
 Changes 2013-03-05 Florent Angly Florent Angly [43f967] Version 0.17
 License 2011-07-19 Florent Angly Florent Angly [e24a7c] Initial commit (GAAS version 0.16)
 Makefile.PL 2013-03-05 Florent Angly Florent Angly [43f967] Version 0.17
 README 2013-03-05 Florent Angly Florent Angly [144fca] README update
 Tutorial.txt 2011-07-19 Florent Angly Florent Angly [e24a7c] Initial commit (GAAS version 0.16)
 gaas-flowchart-small.gif 2011-07-19 Florent Angly Florent Angly [e24a7c] Initial commit (GAAS version 0.16)

Read Me

GAAS

GAAS (Genome Abundance and Average Size) performs BLAST similarity searches of
metagenomic sequences against a database of complete genomes to estimate their
relative abundance and average size. It can be used with any sort of complete
sequences: genomes (viral, microbial, eukaryal), plasmids, genes, ... Results
can be visually represented as phylogenetic trees, size spectra and abundance
pie charts.

This program provides a command-line interface only.


CITATION

If you use GAAS in your research, please cite:
Angly FE et al., The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes, PLoS Computational Biology 5, no. 12 (12, 2009): e1000593


INSTALLATION

1/ External dependencies:
  You need to install these dependencies first:
    * Perl (http://www.perl.com/download.csp)
    * NCBI BLAST v2 (http://www.ncbi.nlm.nih.gov/BLAST/download.shtml)

  At this point, you should be able to run BLAST by typing blastall in a terminal
  or command prompt. If this doesn't display the BLAST usage message, add the
  directory where you installed BLAST to your PATH environment variable. In
  Windows:
    Start menu > Control Panel > Performance and Maintenance > System > Advanced >
    Environment variables: Edit the PATH variable in the bottom window by adding
    the BLAST directory path, e.g. PATH = C:\Program Files\BLAST\bin;%PATH%
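
  On Linux, Unix or MacOS, the same check can be sketched in the shell. This is
  only an illustration: the BLAST_DIR location and the check_prog helper are
  assumptions, not part of GAAS or BLAST, and the PATH change lasts for the
  current shell session only.

```shell
# Hypothetical install location; replace with the directory that holds blastall.
BLAST_DIR="${BLAST_DIR:-$HOME/blast/bin}"

# Print "found" if the named program is on PATH, "missing" otherwise.
check_prog() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

if [ "$(check_prog blastall)" = "missing" ]; then
    # Prepend the BLAST directory for the current shell session only.
    PATH="$BLAST_DIR:$PATH"
    export PATH
fi
echo "blastall: $(check_prog blastall)"
```

  To make the change permanent, the PATH line would typically go in a shell
  startup file such as ~/.bashrc.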

2/ Install GAAS:
  Case A/ If you downloaded the "standalone" GAAS version
    Skip to next step.

  Case B/ If you downloaded the regular GAAS version (CPAN module style)
    The following Perl modules are dependencies that are either provided in this 
    package or will be installed automatically for you:
      * Getopt::Declare >= 1.13
      * Math::Round
      * Math::Random::MT >= 1.16
      * SVG::Parser
      * SVG::Graph
      * SVG::TT::Graph >= 0.16
      * CSS::Tiny
      * Bio::Phylo >= 0.18
      * Statistics::Descriptive::Weighted >= 0.5
      * MLDBM
      * Win32::Symlink (if you use Windows)

    Note that installation of some modules will likely require a C compiler,
    which may not be installed on your system if you use Windows. It should be
    installed automatically for you, but if you encounter installation problems
    on Windows, try to get a compiler from here: http://www.bloodshed.net/dev/devcpp.html

    To install GAAS, run the following commands in a terminal or command prompt:
      On Linux, Unix, MacOS:
        perl Makefile.PL && make install
      On Windows:
        perl Makefile.PL && nmake install
 
    On Unix/Linux, if you do not have administrator rights and want to install the
    module locally into, say, ~/my/dir, try something along these lines:
      i/ Make sure that your CPAN configuration file ~/.cpan/CPAN/MyConfig.pm contains
         these entries:
          'makepl_arg' => q[INSTALL_BASE=~/my/dir],
          'mbuildpl_arg' => q[--install_base ~/my/dir],
         This will ensure that programs that you install through CPAN are installed
         locally. 
      ii/ Make sure that your PERL5LIB and PATH environment variables are up-to-date:
          echo 'export PERL5LIB=${PERL5LIB}:~/my/dir/lib/perl5' >> ~/.bashrc
          echo 'export PATH=${PATH}:~/my/dir/bin' >> ~/.bashrc
          source ~/.bashrc
      iii/ Install GAAS:
          perl Makefile.PL INSTALL_BASE=~/my/dir
          make install
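
    After installation, one way to confirm that Perl can see the modules listed
    under Case B is to try loading each of them. The sketch below assumes a
    POSIX shell with perl on the PATH; the check_module helper is hypothetical
    and it does not verify the minimum versions given in the list.

```shell
# Module names copied from the Case B dependency list
# (Win32::Symlink is left out because it only applies to Windows).
MODULES="Getopt::Declare Math::Round Math::Random::MT SVG::Parser SVG::Graph SVG::TT::Graph CSS::Tiny Bio::Phylo Statistics::Descriptive::Weighted MLDBM"

# Print "ok" if Perl can load the module, "missing" otherwise.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok"
    else
        echo "missing"
    fi
}

for m in $MODULES; do
    printf '%-40s %s\n' "$m" "$(check_module "$m")"
done
```

    A module reported as missing after a local install often means that
    PERL5LIB does not yet point at the directory used for INSTALL_BASE.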

3/ Download data files:
  Data files for the analysis of viral, bacterial, archaeal and eukaryal communities
  and tree files for the Viral Proteomic Tree and Tree of Life can be downloaded
  from http://biome.sdsu.edu/gaas/data/


DOCUMENTATION

After installing GAAS, you can find out the program syntax by running the following
command in a terminal or a command prompt:

  Case A/ If you downloaded the "standalone" GAAS version:
    Navigate to where the GAAS file is located and type:
      perl GAAS --help

  Case B/ If you downloaded the regular GAAS version:
    Simply type:
      GAAS --help

The 'utils' folder included in the GAAS package contains a utility:

* merge_tabular_BLAST_results:
This tool takes tabular BLAST files generated by BLAST searches against
several databases and merges the results to produce a file that looks as if the
results had been generated by comparison against a single database.


MEMORY USAGE

For large datasets, the amount of memory used by GAAS can grow quite large
because GAAS needs to keep in memory information like the name of the sequences,
their length, etc. If you are running out of memory when running GAAS, there
are some solutions:
1/ Use the save_mem (-sm) option:
  This will save some of the information that would otherwise reside in memory
  on your hard drive. Thus, the amount of memory used by GAAS should be very low,
  but the GAAS computation will be somewhat slower.
2/ Use less memory intensive options:
  Using a taxonomy file, normalizing by genome length or filtering similarities
  by relative sequence length all increase memory consumption. Try skipping them
  or using alternative options that don't use sequence length.
3/ Use smaller database files:
  You might save memory by using database files that contain a smaller number of
  sequences. For example, if you were using a large database such as NCBI nt,
  you may try a database that contains fewer but better-curated sequences, such
  as NCBI RefSeq.
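
  To compare candidate databases by number of sequences, a quick approach is to
  count the header lines of each FASTA file. This is a minimal sketch: the
  count_seqs helper and the tiny example file are made up for illustration, and
  it assumes the databases are available as plain FASTA files.

```shell
# Count sequences in a FASTA file by counting header lines (those starting with '>').
# '|| true' keeps the exit status at zero when the file has no header lines.
count_seqs() {
    grep -c '^>' "$1" || true
}

# Example with a tiny made-up FASTA file holding two sequences.
tmp=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$tmp"
count_seqs "$tmp"    # prints 2
rm -f "$tmp"
```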


USING ALTERNATIVE OR PARALLEL BLAST PROGRAMS

To use alternative BLAST programs that are not called 'blastall' (and 'formatdb')
or that need to be passed extra arguments (for example, the number of cluster
nodes to use), modify the file GAAS.pm. In this file, locate the following lines
and change them according to your needs, e.g.:

    'formatdb_prog'  => 'formatdb',            # path to formatdb program
    'formatdb_extra' => undef,                 # extra arguments for formatdb
    'blastall_prog'  => 'btbatchblast',        # path to blastall program
    'blastall_extra' => '--chunk 100',         # extra arguments for blastall
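
To find where these four settings live before editing them, a recursive grep
can help. The sketch below is an assumption-laden helper, not part of GAAS:
find_blast_settings is a hypothetical name, and the directory to search (for
instance the 'lib' folder of the source tree) depends on where GAAS.pm sits on
your system.

```shell
# Print the file name and line number of each BLAST program setting found
# under the given directory (searched recursively).
find_blast_settings() {
    grep -rnE "'(formatdb|blastall)_(prog|extra)'" "$1"
}

# Usage, from the top of the GAAS source tree (directory name is an assumption):
#   find_blast_settings lib
```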


COPYRIGHT AND LICENCE

Copyright 2009,2010,2011 Florent ANGLY <florent.angly@gmail.com>

GAAS is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
GAAS is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GAAS.  If not, see <http://www.gnu.org/licenses/>.


BUGS

All complex software has bugs lurking in it, and this program is no exception.
If you find a bug please email me at <florent.angly@gmail.com> so that I can
make GAAS better.
The GAAS source code is under Git revision control. Feel free to hack the code.
To get started, run:
    git clone git://gaas.git.sourceforge.net/gitroot/gaas/gaas

