Prokka is a command-line tool for rapid annotation of prokaryotic genomes (bacteria and archaea). Given a FASTA file of contigs, it predicts protein-coding genes, rRNAs, tRNAs, and other features, then assigns putative functions by searching the predicted proteins against reference protein databases and HMM profiles. It writes GenBank, GFF3, and other files compatible with downstream tools and genome browsers. Gene calling is delegated to Prodigal, which recognizes alternative start codons (GTG, TTG) and tolerates small gene overlaps. Frameshifts caused by sequencing or assembly errors are not corrected and typically show up as truncated or split genes. Users can supply a trusted protein set or enable genus-specific databases so that annotations favor their organism of interest.

The pipeline runs its database searches in parallel across CPU cores, so a typical bacterial genome is annotated in minutes. Because it standardizes gene names, locus tags, and qualifiers, Prokka is often used as a baseline for comparative microbial genomics, pangenome studies, and functional profiling. Custom protein databases, extra HMM libraries, and Prodigal training files can all be swapped in, which makes it adaptable to diverse research contexts.
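As a minimal sketch of how a run might be scripted, the Python below assembles a Prokka argument list. It assumes the options --outdir, --prefix, --locustag, --kingdom, --genus, --usegenus, --proteins, and --cpus as listed by prokka --help; the helper names build_prokka_cmd and run_prokka and all file names are hypothetical. Nothing is executed unless run_prokka is called explicitly.

```python
import shutil
import subprocess
from typing import List, Optional

# Annotation modes accepted by Prokka's --kingdom option.
KINGDOMS = {"Archaea", "Bacteria", "Mitochondria", "Viruses"}


def build_prokka_cmd(
    contigs: str,
    outdir: str,
    prefix: str,
    locustag: str,
    kingdom: str = "Bacteria",
    genus: Optional[str] = None,
    trusted_proteins: Optional[str] = None,
    cpus: int = 0,
) -> List[str]:
    """Assemble a Prokka command line (hypothetical helper)."""
    if kingdom not in KINGDOMS:
        raise ValueError(f"unsupported kingdom: {kingdom}")
    cmd = [
        "prokka",
        "--outdir", outdir,
        "--prefix", prefix,
        "--locustag", locustag,  # prefix used for every locus_tag
        "--kingdom", kingdom,
        "--cpus", str(cpus),  # 0 asks Prokka to use all available cores
    ]
    if genus:
        # Per prokka --help, --usegenus only takes effect together with --genus.
        cmd += ["--genus", genus, "--usegenus"]
    if trusted_proteins:
        # User-supplied proteins are searched before the bundled databases.
        cmd += ["--proteins", trusted_proteins]
    cmd.append(contigs)  # the contigs FASTA is the final positional argument
    return cmd


def run_prokka(cmd: List[str]) -> None:
    """Run an assembled command; raises if prokka is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command; call run_prokka(cmd) to actually annotate.
    cmd = build_prokka_cmd("contigs.fasta", "annot", "sample1", "SMP1", genus="Escherichia")
    print(" ".join(cmd))
```

Keeping the command as a list rather than a single shell string avoids quoting problems with paths that contain spaces.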
Features
- Rapid whole-genome annotation using multiple annotation tools
- Produces standards-compliant output files (GenBank, GFF3, FASTA, and Sequin .sqn for NCBI submission)
- Annotates a draft bacterial genome quickly (about 10 minutes on a typical desktop computer)
- Coordinates existing bioinformatics tools (e.g., Prodigal for ORF detection)
- Installable via Bioconda for easy setup in bioinformatics environments
- Outputs ready for further downstream analysis or genome browser display
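Prokka's GFF3 file lists the annotations first and appends the contig sequences after a ##FASTA line, so a downstream script has to stop reading at that marker. The sketch below tallies feature types (column 3) and collects locus_tag values from the attributes column (column 9). The function name summarize_gff and the sample records are illustrative only; real Prokka output carries more attributes and different source strings than shown here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple
from urllib.parse import unquote


def summarize_gff(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count feature types and collect locus tags from GFF3 lines (hypothetical helper)."""
    counts: Counter = Counter()
    locus_tags: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version 3
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[2]] += 1  # column 3 holds the feature type (CDS, tRNA, rRNA, ...)
        attrs: Dict[str, str] = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
        if "locus_tag" in attrs:
            locus_tags.append(attrs["locus_tag"])
    return counts, locus_tags


# Simplified, made-up records in the shape of a Prokka GFF3 file.
SAMPLE = """##gff-version 3
##sequence-region contig_1 1 5000
contig_1\tProdigal\tCDS\t337\t2799\t.\t+\t0\tID=SMP1_00001;locus_tag=SMP1_00001;product=hypothetical protein
contig_1\tAragorn\ttRNA\t3000\t3075\t.\t+\t.\tID=SMP1_00002;locus_tag=SMP1_00002;product=tRNA-Leu(taa)
contig_1\tbarrnap\trRNA\t3200\t4700\t.\t-\t.\tID=SMP1_00003;locus_tag=SMP1_00003;product=16S ribosomal RNA
##FASTA
>contig_1
ACGTACGT
""".splitlines()

if __name__ == "__main__":
    counts, tags = summarize_gff(SAMPLE)
    print(dict(counts))
    print(tags)
```

For a real run, pass an open file handle (for example the .gff file in the output directory) instead of SAMPLE.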