LMAT Wiki

Efficient taxonomic labeling of very large metagenomic datasets.

Brought to you by: bticktock, dhysom42, lloyd23, shea0, ska777

Home

UPDATE: Please visit https://github.com/LivGen/LMAT for updated code. The code on the SourceForge site may not compile on newer Linux systems.

Welcome to the Livermore Metagenomics Analysis Toolkit (LMAT) wiki on SourceForge. This wiki was set up to share ongoing development of the software and to support community interest in developing it as an open source resource (GNU GPL), freely available to anyone working on problems in metagenomics.

LMAT is a collection of software tools, written primarily in C++, designed to efficiently analyze very large shotgun metagenomic datasets for taxonomic and gene function content. The primary initial innovation is the use of pre-computed genome index files tagged with taxonomy data, which are stored in a memory-mapped file for fast read-only lookups. The result is the ability to search a large genome database of viral, bacterial, archaeal, protozoan, fungal and human sequences (and potentially others) and rapidly determine the contents of very large datasets (e.g. many tens to hundreds of gigabases or more in size).

The distinguishing feature of the approach is that it relies on commodity hardware that supports a fast interconnect between the computer's CPU, DRAM and local storage, and makes extensive use of multi-core processing. This contrasts with traditional clusters, where analysis is distributed to multiple nodes across a network. The model presented here is therefore an analysis capability that can be co-located with the sequencer.

LMAT offers the most complete microbial database publicly available (to our knowledge) for metagenomic analysis. The database includes the complete and assembled draft genomes for viruses, bacteria, archaea, fungi and protozoa, human reference assemblies, and an extensive collection of genetic data from the 1000 Genomes Project.
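The memory-mapped, taxonomy-tagged index lookup described above can be illustrated with a small sketch. This is not LMAT's actual index format or code, which this page does not describe. The record layout (a 2-bit encoded k-mer plus a taxonomy ID), the toy k-mer length `K = 4`, the binary search over sorted records, and the helper names `build_index`, `lookup`, `label_read` and `demo` are all assumptions chosen for illustration, and the k-mer-to-taxon assignments are made up.

```python
import mmap
import os
import struct
import tempfile

K = 4  # toy k-mer length (assumption; real indexes use longer k-mers)
RECORD = struct.Struct("<QI")  # hypothetical layout: (encoded k-mer, taxonomy ID)
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def build_index(path, kmer_to_taxid):
    """Pre-compute the index: fixed-size records sorted by encoded k-mer."""
    records = sorted((encode(k), t) for k, t in kmer_to_taxid.items())
    with open(path, "wb") as f:
        for key, taxid in records:
            f.write(RECORD.pack(key, taxid))


def lookup(mm, kmer):
    """Binary-search the memory-mapped records; return a taxonomy ID or None."""
    target = encode(kmer)
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, taxid = RECORD.unpack_from(mm, mid * RECORD.size)
        if key == target:
            return taxid
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None


def label_read(mm, read):
    """Count taxonomy hits over every k-mer in a read."""
    counts = {}
    for i in range(len(read) - K + 1):
        taxid = lookup(mm, read[i:i + K])
        if taxid is not None:
            counts[taxid] = counts.get(taxid, 0) + 1
    return counts


def demo():
    # Made-up assignments for illustration only, not real genomic data.
    toy = {"ACGT": 562, "CGTA": 562, "TTTT": 9606}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        build_index(path, toy)
        # Map the file read-only; the OS pages records in on demand.
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return label_read(mm, "ACGTA")
    finally:
        os.remove(path)
```

Calling `demo()` labels the read `ACGTA` against the toy index and returns per-taxon hit counts. Because the mapping is read-only, several worker processes on one multi-core node could in principle share the same mapped pages, which is the property the single-node design above relies on.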

Work applying three primary configurations is under way:

Use of a single large-memory (e.g. 512 GB to 1 TB DRAM) multi-core node, which can process large amounts of data extremely quickly.
Reduced-size databases, which support search tools that give a quick summary of sample contents and have smaller memory requirements (< 64 GB).
Use of flash drives (NVRAM) as a proxy for DRAM, to further support low-cost commodity hardware alternatives (this could also support alternative high-memory, low-cost cluster-based tools).

[What's New]
[Example LMAT Run]
