Name | Modified | Size | Downloads / Week |
---|---|---|---|
NCBI_5936.c90.faa.gz | 2018-11-27 | 1.0 GB | |
NCBI_5872.c90.faa.gz | 2018-11-27 | 1.0 GB | |
MetaPA 3.2.4 manual

Description:

MetaPA is a de Bruijn graph algorithm that obtains complete protein-coding genes by assembling metagenomic and metatranscriptomic short reads. It processes an assembly job in multiple steps:

(1) predict ORFs from short nucleotide reads and translate them into protein segments;
(2) construct de Bruijn graphs in the space of oligopeptides, where a node denotes a k-mer and an edge represents a (k+1)-mer connecting two overlapping k-mers;
(3) simplify the graph and decompose it into sub-graphs (connected components), from which the longest/shortest paths are called to evaluate the confidence of the associated k-mers;
(4) for each short read, evaluate ORF candidates according to the summarized confidence score of their k-mers;
(5) repeat steps (1)-(4) using longer k-mers;
(6) when calling paths from sub-graphs, use read sequences and paired-end information to make decisions at forks or crosses; if these do not resolve the choice, reference protein sequences are used as templates to guide the path search;
(7) remove redundantly assembled proteins and false sequences predicted from intergenic regions.

For more details, see the MetaPA paper:

Jiemeng Liu*, Qichao Lian*, Yamao Chen, Ji Qi. (2019) Amino acid based de Bruijn graph algorithm for identifying complete coding genes from metagenomic and metatranscriptomic short reads. Nucleic Acids Research.

MetaPA is written in C++ and Java and supports parallel computation with multiple threads.

Table of Contents

1. Obtaining MetaPA
2. System requirement
3. Installation
4. Running MetaPA

1.
Obtaining MetaPA

The newest version of MetaPA is available at https://sourceforge.net/projects/metapa/files/

The compressed tar file includes the source code, an executable file (Linux 64-bit), and a sample dataset.

A reference database of 5936 prokaryote proteomes from NCBI is available at https://sourceforge.net/projects/metapa/files/database/

You can also use your own reference proteins in FASTA format to guide assembly.

2. System requirement

a) 64-bit operating system
b) 50 GB of memory
c) C++ compiler
d) OpenMP module
e) Java 1.8 or later

3. Installation

To install MetaPA, go to the metapa-code directory:

$ cd src/metapa-code

Build and install:

$ make

4. Running MetaPA

MetaPA accepts FASTQ files as input and outputs assembled protein and CDS consensus sequences in FASTA format. Statistics on raw reads, called ORFs, and assembled contigs are also provided.

For assembling metagenomic data, the input/output options are listed below. In most cases, a maximum k-mer length of 18 a.a. is recommended.

[-q1]: high-throughput NGS data sequenced from a DNA library of environmental samples, in FASTQ format
[-q2]: the second-end sequences of paired-end data, where the file given to “-q1” contains the first end of the reads
[-g]: reference protein sequences in FASTA format
[-o]: output directory

Example:

$ java -mx50g -jar ./MetaFinder.jar [options] -i reads_1.fq -p reads_2.fq -g ref_guide.fa -o outdir

A sample dataset is released with the source code of MetaPA.
To run MetaPA on the sample dataset, type:

$ ./run_example.sh

Other useful options:

[-k]: maximum k-mer length used for sequence assembly (12-24, default: 18)
[-n]: number of iterations, i.e. steps from k-min to k-max (default: 3)
[-c]: contig coverage threshold for removing tips and bubbles (default: 4.0)
[-f]: minimum coverage ratio for dividing the branches of cross contigs (default: 2.5)
[-m]: maximum number of mutations in each read (default: 3)
[-L]: minimum length of output sequences (amino acids, default: 70)
[-p]: number of threads for calculation (default: 1)

For assembling metatranscriptomic data, MetaPA provides the option [-RNA], which sets the maximum k-mer length to 14 a.a. and the coverage threshold to 2.

[-RNA]: metatranscriptomic mode (true/false, default: false)

Example:

$ java -mx50g -jar ./MetaFinder.jar [options] -i reads_1.fq -p reads_2.fq -g ref_guide.fa -o outdir -RNA true
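
Appendix: illustration of the graph construction

Steps (2) and (3) of the Description can be illustrated with a minimal Python sketch. It is not MetaPA's implementation (MetaPA is written in C++ and Java); the function names, the toy protein segments, and the use of edge counts as a simple coverage proxy are all assumptions made for illustration. It builds a de Bruijn graph whose nodes are amino-acid k-mers and whose edges are (k+1)-mers, then splits the graph into connected components.

```python
from collections import defaultdict


def build_peptide_debruijn(segments, k):
    """Build a de Bruijn graph over amino-acid k-mers (illustrative only).

    Nodes are k-mers. Each (k+1)-mer seen in a segment adds an edge from
    its k-length prefix to its k-length suffix; the edge value counts how
    often that (k+1)-mer was observed.
    """
    nodes = set()
    edges = defaultdict(lambda: defaultdict(int))
    for seg in segments:
        for i in range(len(seg) - k + 1):
            nodes.add(seg[i:i + k])
        for i in range(len(seg) - k):
            kp1 = seg[i:i + k + 1]
            edges[kp1[:-1]][kp1[1:]] += 1
    return nodes, edges


def connected_components(nodes, edges):
    """Split the graph into weakly connected components (sub-graphs)."""
    undirected = defaultdict(set)
    for src, targets in edges.items():
        for dst in targets:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(comp)
    return components


# Hypothetical toy protein segments, not real data.
segments = ["MKVLAAG", "VLAAGIT", "WWPQRS"]
nodes, edges = build_peptide_debruijn(segments, k=3)
components = connected_components(nodes, edges)
```

In this toy input the first two segments overlap and therefore fall into one component, while the third segment forms its own. A real assembler would additionally simplify the graph (tips, bubbles) and call paths from each component, which this sketch does not attempt.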