MetaPA 3.2.4 manual

Description: 
MetaPA is a de Bruijn graph based algorithm that obtains complete protein-coding genes by assembling
metagenomic and metatranscriptomic short reads. It processes an assembly job in multiple steps:
	(1) predict ORFs from short nucleotide reads and translate them into protein segments;
	(2) construct de Bruijn graphs in the space of oligopeptides, where a node denotes a k-mer and an
	    edge represents a (k+1)-mer that connects two overlapping k-mers;
	(3) simplify the graph and decompose it into sub-graphs (connected components), from which the
	    longest/shortest paths are called to evaluate the confidence of the associated k-mers;
	(4) for each short read, evaluate ORF candidates according to the summarized confidence score
	    of their k-mers;
	(5) repeat steps (1)-(4) using longer k-mers;
	(6) while calling paths from sub-graphs, use read sequences and paired-end information to make
	    decisions at forks or crosses. If a decision cannot be made, reference protein sequences
	    are used as templates to guide the path search;
	(7) remove redundantly assembled proteins and false sequences predicted from intergenic regions.
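
To make the graph definition in step (2) and the decomposition in step (3) concrete, here is a minimal
Python sketch. It is an illustration only, not MetaPA's implementation: the function names
build_peptide_graph and connected_components are hypothetical, and graph simplification, path calling
and confidence scoring are left out.

```python
from collections import defaultdict


def build_peptide_graph(proteins, k):
    """Build a de Bruijn graph over amino acid k-mers.

    Returns (edges, coverage): edges maps each k-mer to the set of k-mers
    that follow it, coverage counts how often each (k+1)-mer was seen.
    """
    edges = defaultdict(set)
    coverage = defaultdict(int)
    for seq in proteins:
        for i in range(len(seq) - k):
            kp1 = seq[i:i + k + 1]            # one (k+1)-mer is one edge
            left, right = kp1[:-1], kp1[1:]   # the two overlapping k-mers
            edges[left].add(right)
            edges.setdefault(right, set())    # keep nodes without outgoing edges
            coverage[kp1] += 1
    return dict(edges), dict(coverage)


def connected_components(edges):
    """Split the graph into weakly connected components (sets of k-mers)."""
    neighbours = defaultdict(set)
    for node, successors in edges.items():
        neighbours[node]                      # register isolated nodes too
        for nxt in successors:
            neighbours[node].add(nxt)
            neighbours[nxt].add(node)
    neighbours = dict(neighbours)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                          # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components
```

For example, with k = 3 the segments "MKTAYIAK" and "TAYIAKQR" share k-mers and end up in the same
component, while an unrelated segment forms a component of its own.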

For more details see the MetaPA paper below.
Jiemeng Liu*, Qichao Lian*, Yamao Chen, Ji Qi. (2019) Amino acid based de Bruijn graph algorithm for 
identifying complete coding genes from metagenomic and metatranscriptomic short reads. Nucleic Acids Research.

MetaPA is written in C++ and Java and supports parallel computation with multiple threads.

Table of Contents
1. Obtaining MetaPA
2. System requirements
3. Installation
4. Running MetaPA

1. Obtaining MetaPA
The newest version of MetaPA is available at
https://sourceforge.net/projects/metapa/files/

The compressed tar file includes the source code, an executable file (64-bit Linux) and a sample dataset.

A reference database of 5936 prokaryotic proteomes from NCBI is available at
https://sourceforge.net/projects/metapa/files/database/

You can also use your own reference proteins in FASTA format to guide assembly.
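
Before passing your own reference file to MetaPA, it can help to confirm that it parses as protein
FASTA. The Python sketch below is one way to do that. The helper names read_fasta and check_reference
and the accepted letter set are assumptions made for illustration; this manual does not state which
characters MetaPA itself accepts.

```python
# Amino acid letters accepted by this check (an assumption, not MetaPA's rule):
# the 20 standard residues plus B, Z, X, U, O and the stop symbol "*".
PROTEIN_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBZXUO*")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_reference(lines):
    """Return (record count, headers of empty or non-protein records)."""
    count, bad = 0, []
    for header, seq in read_fasta(lines):
        count += 1
        if not seq or not set(seq.upper()) <= PROTEIN_LETTERS:
            bad.append(header)
    return count, bad
```

To check a file, pass an open text handle, e.g. check_reference(open("ref_guide.fa")), or use
gzip.open(path, "rt") for a gzip-compressed file. Note that a nucleotide sequence made only of
A, C, G and T passes this check, because those are also valid amino acid letters.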


2. System requirements
	a) 64-bit operating system
	b) 50 GB of memory
	c) C++ compiler
	d) OpenMP module
	e) Java 1.8 or later
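
The requirements above can be checked roughly with a short Python script. This is a sketch under
several assumptions: the function names are hypothetical, 50 GB is read as 50 * 1024^3 bytes, the
memory query relies on os.sysconf and so only works on Unix-like systems, and the compiler is looked
up under the common names g++, c++ and clang++. OpenMP support is not tested here.

```python
import os
import platform
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`.

    Legacy releases are reported as 1.x, so "1.8.0_281" means Java 8.
    Returns None when no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def total_memory_bytes():
    """Physical memory in bytes, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def requirement_report():
    """Return a dict that maps each checked requirement to True or False."""
    memory = total_memory_bytes()
    java_path = shutil.which("java")
    java_major = None
    if java_path:
        # `java -version` normally writes to stderr; read both streams to be safe.
        proc = subprocess.run([java_path, "-version"], capture_output=True, text=True)
        java_major = java_major_version(proc.stderr + proc.stdout)
    return {
        "64-bit operating system": platform.machine().endswith("64"),
        "50 GB of memory": memory is not None and memory >= 50 * 1024 ** 3,
        "C++ compiler": any(shutil.which(name) for name in ("g++", "c++", "clang++")),
        "Java 1.8 or later": java_major is not None and java_major >= 8,
    }
```

Calling requirement_report() on the target machine returns one True/False entry per requirement.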


3. Installation
To install MetaPA, please go to the metapa-code directory:
	$ cd src/metapa-code

Build and install:
	$ make


4. Running MetaPA
MetaPA takes FASTQ files as input and outputs assembled protein and CDS consensus sequences in
FASTA format. Statistics on raw reads, called ORFs and assembled contigs are also provided.

For assembling metagenomic data, the input/output options are listed below. For most cases, a maximum
k-mer length of 18 a.a. is recommended.

	[-q1]: high-throughput sequencing reads from DNA libraries of environmental samples,
		   in FASTQ format
	[-q2]: second-end reads of paired-end data, where the file given by "-q1" contains the
		   first-end reads
	[-g]:  reference protein sequences in FASTA format
	[-o]:  output directory

Example:
	$ java -mx50g -jar ./MetaFinder.jar [options] -q1 reads_1.fq -q2 reads_2.fq -g ref_guide.fa -o outdir

A sample dataset is released with the MetaPA source code. To run MetaPA on this dataset, just type:
	$ ./run_example.sh
	
There are other useful options:
	[-k]: maximum k-mer length used for sequence assembly (12-24, default: 18)
	[-n]: number of iterations, i.e. steps from k-min to k-max (default: 3)
	[-c]: contig coverage threshold for removing tips and bubbles (default: 4.0)
	[-f]: minimum coverage ratio for dividing branches of cross contigs (default: 2.5)
	[-m]: maximum number of mutations in each read (default: 3)
	[-L]: minimum length of output sequences (amino acids, default: 70)
	[-p]: number of threads for calculation (default: 1)
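
When MetaPA is launched from a pipeline, the command line can be assembled programmatically. The
Python sketch below uses the flag names and defaults from the option lists in this manual and the
jar name from the example command. The function build_metapa_command and its parameter names are
hypothetical, and only the documented range of -k is validated.

```python
def build_metapa_command(q1, out_dir, q2=None, guide=None, k=18, n=3,
                         coverage=4.0, ratio=2.5, mutations=3, min_len=70,
                         threads=1, jar="./MetaFinder.jar", heap="50g"):
    """Return the MetaPA command line as a list of arguments.

    Flag names and default values are copied from the option lists in this
    manual. The list can be passed directly to subprocess.run().
    """
    if not 12 <= k <= 24:
        raise ValueError("-k must be between 12 and 24")

    cmd = ["java", "-mx" + heap, "-jar", jar, "-q1", q1]
    if q2 is not None:            # second end of paired-end reads
        cmd += ["-q2", q2]
    if guide is not None:         # reference proteins in FASTA format
        cmd += ["-g", guide]
    cmd += [
        "-o", out_dir,
        "-k", str(k),
        "-n", str(n),
        "-c", str(coverage),
        "-f", str(ratio),
        "-m", str(mutations),
        "-L", str(min_len),
        "-p", str(threads),
    ]
    return cmd
```

For example, build_metapa_command("reads_1.fq", "outdir", q2="reads_2.fq", guide="ref_guide.fa",
threads=8) returns an argument list with every option spelled out, which makes the settings of a run
easy to log and reproduce.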
	

For assembling metatranscriptomic data, MetaPA provides an option [-RNA] that sets the maximum k-mer
length to 14 a.a. and the coverage threshold to 2.
	[-RNA]: metatranscriptomic mode (true/false, default: false)
	
Example:
	$ java -mx50g -jar ./MetaFinder.jar [options] -q1 reads_1.fq -q2 reads_2.fq -g ref_guide.fa -o outdir -RNA true
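
The difference between the two modes can be summarized as a pair of presets. The Python sketch below
is only a reading of this manual, not MetaPA code: it assumes that the coverage threshold changed by
[-RNA] is the one controlled by [-c], and the names DNA_PRESET, RNA_PRESET and preset_for are
hypothetical. How explicit -k or -c values interact with -RNA is not stated here, so it is not
modelled.

```python
# Defaults taken from the option lists in this manual.
DNA_PRESET = {"-k": 18, "-c": 4.0}   # metagenomic mode
RNA_PRESET = {"-k": 14, "-c": 2.0}   # metatranscriptomic mode (-RNA true)


def preset_for(rna):
    """Return a copy of the k-mer length and coverage preset for the chosen mode."""
    return dict(RNA_PRESET if rna else DNA_PRESET)
```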







Source: README.txt, updated 2020-01-10