The metagenomic paradigm offers the opportunity to study the protein families, and therefore the metabolic and functional potential, of the constituent microbes in a community. A nucleotide assembly-based strategy is of limited help in recovering these proteins, since metagenomic assemblies are typically very fragmented and leave a large fraction of reads unassembled. We present a method for reconstructing complete protein sequences directly from NGS metagenomic data. Our framework is based on a novel Short Peptide Assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. We also present a new implementation of SPA based on a suffix array (SFA-SPA), which runs significantly faster than SPA.
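
To make the idea of assembling proteins from overlapping peptide fragments with a suffix array more concrete, here is a minimal toy sketch. It is not the SPA or SFA-SPA algorithm, whose details are not given on this page. The function names, the seed length `k`, the `$` separator, and the example peptides are all hypothetical, and the suffix array is built naively, which would not scale to real metagenomic data.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only; real tools use faster builders.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return every position in text where pattern occurs, via binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All matching suffixes are contiguous in the suffix array.
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern:
        hits.append(sa[lo])
        lo += 1
    return hits


def extend_right(seed, peptides, k=5, max_steps=100):
    """Greedily extend seed to the right using peptides that share its last k residues.

    Assumption: peptides never contain '$', which is used as a separator.
    max_steps bounds the loop so repeats cannot cause endless extension.
    """
    text = "".join(p + "$" for p in peptides)
    sa = build_suffix_array(text)
    contig = seed
    for _ in range(max_steps):
        tail = contig[-k:]
        if len(tail) < k:
            break
        best = ""
        for pos in find_occurrences(text, sa, tail):
            end = text.index("$", pos)      # end of the peptide containing this hit
            extension = text[pos + k:end]   # residues following the shared k-mer
            if len(extension) > len(best):
                best = extension
        if not best:
            break                           # no peptide extends the contig further
        contig += best
    return contig


# Hypothetical peptide fragments, as might be predicted on three short reads.
fragments = ["MKTAYIAK", "AYIAKQRQ", "AKQRQISF"]
print(extend_right("MKTAYIAK", fragments))
```

The sketch always takes the longest available extension, which is a deliberate simplification. A practical assembler would also have to handle sequencing errors, branching paths between related proteins, and extension to the left.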

Youngik Yang, Cuncong Zhong, and Shibu Yooseph*
J. Craig Venter Institute, San Diego, CA
{yyang,czhong,syooseph}@jcvi.org
* Corresponding author

SPA is available as a binary for 64-bit Linux.
SFA-SPA is available as both binary and source for 64-bit Linux.

Categories

Bio-Informatics

License

GNU General Public License version 3.0 (GPLv3)

Additional Project Details

Operating Systems

BSD, Linux

Intended Audience

Science/Research

Programming Language

C++, Perl

Related Categories

Perl Bio-Informatics Software, C++ Bio-Informatics Software

Registered

2012-10-01