This module performs six small tasks on the fasta file of assembled contigs: (1) It parses the sequencing coverage calculated by the assembler from the fasta header lines; (2) It computes the percent GC value for each contig from the contig sequence; (3) It sorts the contigs by length; (4) It converts all sequence data to uppercase; (5) It converts all ambiguous bases to "N"; (6) It predicts open reading frames and computes the coding density for each contig. Uppercase sequences and N's (tasks 4 and 5) are required by some external programs used by downstream modules.
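The sequence clean-up, percent GC and sorting steps can be illustrated with a short Python sketch. This is not the module's own code: the function names are hypothetical, and two details are assumptions, namely that N's are left out of the percent GC denominator and that contigs are sorted longest first.

```python
import re

def normalize(seq):
    """Uppercase a sequence and replace every character other than A, C, G, T with N."""
    return re.sub(r"[^ACGT]", "N", seq.upper())

def percent_gc(seq):
    """Percent GC over unambiguous bases (assumption: N is excluded from the denominator)."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0

def prepare_contigs(contigs):
    """contigs: dict of header -> sequence.
    Returns (header, cleaned sequence, percent GC) tuples, longest contig first (assumed order)."""
    cleaned = [(header, normalize(seq)) for header, seq in contigs.items()]
    cleaned.sort(key=lambda item: len(item[1]), reverse=True)
    return [(header, seq, percent_gc(seq)) for header, seq in cleaned]
```

For example, `normalize("acgtRYn")` yields `"ACGTNNN"`: the IUPAC ambiguity codes R and Y, as well as the lowercase n, all become N.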
Minutes.
This module requires prodigal for predicting open reading frames to compute coding density.
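As an illustration of what coding density means, the sketch below computes the fraction of a contig covered by predicted open reading frames, given their coordinates. It does not call prodigal and is not the module's own code; the function name, the 1-based inclusive coordinates, and the choice to count overlapping ORFs only once are assumptions.

```python
def coding_density(orfs, contig_length):
    """orfs: list of (start, end) pairs, 1-based and inclusive.
    Returns the fraction of contig positions covered by at least one ORF."""
    covered = 0
    last_end = 0  # last position already counted
    for start, end in sorted(orfs):
        start = max(start, last_end + 1)  # skip positions counted for an earlier ORF
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / contig_length if contig_length else 0.0
```

With ORFs at 1-300, 250-600 and 901-1000 on a 1000 bp contig, 700 positions are covered, giving a coding density of 0.7.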
Average read length (setAverageReadLength, int, 100): Some assemblers only provide the number of reads for each contig, rather than the coverage. In that case, the average read length is needed to estimate the coverage from the number of reads.
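When only a read count is available, a common way to approximate coverage is to multiply the number of reads by the average read length and divide by the contig length. The sketch below shows that arithmetic; that the module uses exactly this formula is an assumption, and the function name is hypothetical.

```python
def estimate_coverage(number_of_reads, average_read_length, contig_length):
    """Approximate coverage as total sequenced bases divided by contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return number_of_reads * average_read_length / contig_length
```

For example, 500 reads of average length 100 on a 10000 bp contig give an estimated coverage of 5.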
Number of reads regexp (setRegexpNumberOfReads, String, "(?:(?:numreads)|(?:read_count))[=_](\d+)"): The regular expression used to parse the number of reads for a contig from the fasta header line. The default usually works.
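To check whether the default expression fits a given assembler's output, it can be tried on a few header lines. The sketch below does this in Python; the header lines are made-up examples, the function name is hypothetical, and case-sensitive matching is an assumption about how the module applies the pattern.

```python
import re

# Default pattern from the parameter description above.
NUMBER_OF_READS = re.compile(r"(?:(?:numreads)|(?:read_count))[=_](\d+)")

def parse_number_of_reads(header):
    """Return the read count found in a fasta header line, or None if there is none."""
    match = NUMBER_OF_READS.search(header)
    return int(match.group(1)) if match else None
```

A header such as `>contig00001 length=5231 numreads=412` yields 412, and `>scaffold_7_read_count_88` yields 88. A header without either key yields `None`, in which case a custom expression would be needed.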
Coverage regexp (setRegexpCoverage, String, "co*vg*[a-z]*?[=_]([\d.]+)"): The regular expression used to parse the sequencing coverage for a contig from the fasta header line. The default usually works.
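The default coverage expression accepts several spellings of the key ("cov", "cvg", "coverage") followed by "=" or "_" and a number. The Python sketch below tries it on made-up header lines; the function name is hypothetical, and case-sensitive matching is an assumption about how the module applies the pattern.

```python
import re

# Default pattern from the parameter description above.
COVERAGE = re.compile(r"co*vg*[a-z]*?[=_]([\d.]+)")

def parse_coverage(header):
    """Return the coverage found in a fasta header line, or None if there is none.
    A malformed number such as "1.2.3" would raise ValueError in float()."""
    match = COVERAGE.search(header)
    return float(match.group(1)) if match else None
```

Headers such as `>NODE_1_length_5000_cov_12.5`, `>contig1 coverage=30.2` and `>c2 cvg=8` are all matched. A header with no coverage key yields `None`.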
Set processors used (setProcessorsUsed, int, 4): The number of processors/cores/threads used for computations.
Set temp folder (setTempFolder, String, "/temp/metawatt"): Temp folder used for intermediate files.