Yao Haobin

Taxonomic annotation is a critical first step in the analysis of metagenomic data. Although many tools have been developed, their performance (in both running time and accuracy) is still unsatisfactory, particularly when no close species-level reference exists in the database. In this paper, we propose a novel annotation tool, MetaAnnotator, for annotating metagenomic reads; it significantly outperforms all existing tools when only genus-level references exist in the database. In our experiments, MetaAnnotator assigns 87.5% of reads correctly (67.5% of reads are assigned to the exact genus), with only 8.5% of reads wrongly assigned. The best existing tool (MetaCluster-TA) achieves only 73.4% correct read assignment (with only 50.9% of reads assigned to the exact genus and 22.6% of reads wrongly assigned). MetaAnnotator is also the second-fastest tool (1 hour for 20 million reads). The core concepts behind MetaAnnotator are: (i) we consider only exact k-mer matches in the coding regions of the references, as these should be more significant and accurate; (ii) to assign reads to taxonomy nodes, we construct genome-specific and taxonomy-specific probabilistic models from the reference database; and (iii) we use a BWT (Burrows-Wheeler transform) data structure to speed up k-mer matching.
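To make concept (i) concrete, the following is a minimal sketch of exact k-mer matching against the coding regions of reference genomes. It is not the MetaAnnotator implementation. As stated assumptions, it replaces the BWT index with a plain hash map, and it replaces the probabilistic models with a simple majority vote over k-mer hits. The names `KmerIndex`, `build_index`, and `assign_read` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical index: k-mer -> set of genus labels whose coding regions contain it.
// (Assumption: a hash map stands in for the BWT index described in the text.)
using KmerIndex =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

// Build the index from (genus label, coding-region sequence) pairs.
KmerIndex build_index(
    const std::vector<std::pair<std::string, std::string>>& coding_regions,
    std::size_t k) {
    assert(k > 0);
    KmerIndex index;
    for (const auto& region : coding_regions) {
        const std::string& genus = region.first;
        const std::string& seq = region.second;
        // Slide a window of length k over the sequence.
        for (std::size_t i = 0; i + k <= seq.size(); ++i) {
            index[seq.substr(i, k)].insert(genus);
        }
    }
    return index;
}

// Assign a read to the genus with the most exact k-mer hits.
// Returns an empty string when there is no hit or the top count is tied.
// (Assumption: a majority vote stands in for the probabilistic models.)
std::string assign_read(const KmerIndex& index, const std::string& read,
                        std::size_t k) {
    assert(k > 0);
    std::unordered_map<std::string, std::size_t> hits;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        auto it = index.find(read.substr(i, k));
        if (it == index.end()) continue;  // k-mer absent from all references
        for (const auto& genus : it->second) ++hits[genus];
    }
    std::string best;
    std::size_t best_count = 0;
    bool tie = false;
    for (const auto& entry : hits) {
        if (entry.second > best_count) {
            best = entry.first;
            best_count = entry.second;
            tie = false;
        } else if (entry.second == best_count) {
            tie = true;
        }
    }
    return tie ? std::string() : best;
}
```

Under these assumptions, a caller would build the index once from the reference coding regions and then call `assign_read` for each read. Leaving a read unassigned on a tie is one simple way to avoid a wrong assignment when the evidence is ambiguous; a probabilistic model, as described above, would weigh such evidence rather than discard it.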

The software is implemented in C++. It currently supports annotation of bacterial genomes. For usage, please refer to the "readme" file in the released version 1.0.

Project Members:

