The metagenomic paradigm offers the opportunity to study the protein families, and therefore the metabolic and functional potential, of the constituent microbes in a community. A nucleotide assembly-based strategy for identifying these proteins has limited success, because metagenomic assemblies are typically very fragmented and leave a large fraction of reads unassembled. We present a method for reconstructing complete protein sequences directly from next-generation sequencing (NGS) metagenomic data. Our framework is based on a novel Short Peptide Assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. We also present a new implementation of SPA based on a suffix array (SFA-SPA), which runs significantly faster than SPA.
Youngik Yang, Cuncong Zhong, and Shibu Yooseph*
J. Craig Venter Institute, San Diego, CA
* Corresponding author
SPA is available as a binary for 64-bit Linux.
SFA-SPA is available as a binary and as source code for 64-bit Linux.