This module creates a BLAST database for blastp searches used in the taxonomic classification of contigs. Before running it, prepare a taxonomy file with the module [Make taxonomy file].
For certain taxa, such as E. coli, many reference genomes are available. However, in most cases a single one is enough for classification. Therefore, the module filters the reference genome sequences at a specified taxonomic rank. For example, if you specify "Genus" as the unique rank, only a single reference genome is added to the database for each unique genus detected in the taxonomy file.
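The rank filter described above can be sketched in Python as follows. This is only an illustration, not the module's actual code: the record layout (dictionaries with "genome", "Genus" and "Species" keys), the function name `filter_by_rank`, and the choice to keep the first genome encountered for each taxon are all assumptions.

```python
from typing import Dict, Iterable, List


def filter_by_rank(records: Iterable[Dict[str, str]], rank: str) -> List[Dict[str, str]]:
    """Keep one record per distinct value found at the given rank.

    Assumption: the first record seen for a taxon is the one that is kept.
    Records that lack the rank are skipped (also an assumption).
    """
    seen = set()
    kept = []
    for record in records:
        value = record.get(rank)
        if value is None or value in seen:
            continue
        seen.add(value)
        kept.append(record)
    return kept


# Hypothetical taxonomy records, for illustration only.
records = [
    {"genome": "ecoli_k12", "Genus": "Escherichia", "Species": "Escherichia coli"},
    {"genome": "ecoli_o157", "Genus": "Escherichia", "Species": "Escherichia coli"},
    {"genome": "ealbertii", "Genus": "Escherichia", "Species": "Escherichia albertii"},
    {"genome": "salmonella_lt2", "Genus": "Salmonella", "Species": "Salmonella enterica"},
    {"genome": "bsub_168", "Genus": "Bacillus", "Species": "Bacillus subtilis"},
]

genus_level = [r["genome"] for r in filter_by_rank(records, "Genus")]
species_level = [r["genome"] for r in filter_by_rank(records, "Species")]
print(genus_level)
print(species_level)
```

With "Genus" as the unique rank, the three Escherichia genomes collapse to one; with "Species", the two E. coli genomes collapse to one while Escherichia albertii is kept separately.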
The module creates an amino acid fasta file with all the open reading frames (ORFs) of the filtered reference whole genome sequences. The ORFs are parsed from the GenBank files. If only a nucleotide fasta file is provided, the module uses Prodigal in metagenome mode to predict the ORFs from that file. The ORFs are saved as an amino acid fasta file (.faa) in the reference organism folder and added to the database. You can also provide ORF sequences manually in a fasta file, which must have the extension ".faa".
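For orientation, the two external steps described above could be reproduced by hand roughly as follows. This is a sketch under assumptions, not the module's implementation: the helper names and file paths are hypothetical, and the module may pass different options. The flags themselves are standard ones: Prodigal's `-i` (input), `-a` (protein output) and `-p meta` (metagenome mode), and makeblastdb's `-in`, `-dbtype prot` and `-out`.

```python
import subprocess
from pathlib import Path
from typing import List


def prodigal_command(nucleotide_fasta: Path) -> List[str]:
    """Build a Prodigal call that writes predicted ORFs as a .faa file
    next to the nucleotide fasta file (hypothetical naming scheme)."""
    protein_fasta = nucleotide_fasta.with_suffix(".faa")
    return [
        "prodigal",
        "-i", str(nucleotide_fasta),  # nucleotide input
        "-a", str(protein_fasta),     # amino acid translations of the ORFs
        "-p", "meta",                 # metagenome mode
    ]


def makeblastdb_command(combined_faa: Path, database: Path) -> List[str]:
    """Build a makeblastdb call that creates a protein database."""
    return [
        "makeblastdb",
        "-in", str(combined_faa),
        "-dbtype", "prot",
        "-out", str(database),
    ]


def run(command: List[str]) -> None:
    """Run a command and raise an error if the tool exits with a failure."""
    subprocess.run(command, check=True)


# Hypothetical paths, for illustration only.
orf_cmd = prodigal_command(Path("references/genome1/genome1.fna"))
db_cmd = makeblastdb_command(Path("references/all_orfs.faa"), Path("blastdb/reference_orfs"))
print(" ".join(orf_cmd))
print(" ".join(db_cmd))
```

Calling `run(orf_cmd)` and then `run(db_cmd)` would execute the tools, provided Prodigal and NCBI BLAST are installed and on the PATH.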
This module is not part of a project and can be run via the "Database" menu.
Running time: minutes.
NCBI BLAST and Prodigal must be installed to run this module.
Reference genome folder (File, none or previous selection): The folder that contains the reference genomes.
Taxonomy file (File, none or previous selection): The taxonomy file used.
Taxonomy unique rank (Enumeration, Genus): Only a single reference genome is added to the database for each unique taxon at the specified rank.
Blastp database (File, none or previous selection): The blastp database file to be created.
Output: none other than the specified blastp database file.
Wiki: Classify by blastp
Wiki: Make taxonomy file
Wiki: Pipeline modules