The Bacteria and Archaea hierarchy model used by RDP Classifier has been updated to training set No. 18. The new version adds over 800 new genera and 4,000 new species. Major rearrangements in Classifier training set No. 18 include the following:
The class Alphaproteobacteria has been rearranged based on genome data (Hördt A, et al. Analysis of 1,000+ Type-Strain Genomes Substantially Improves Taxonomic Classification of Alphaproteobacteria. Front Microbiol. 2020;11:468. doi:10.3389/fmicb.2020.00468).
Part of the class Gammaproteobacteria has been rearranged based on genome data (Spring S, et al. A taxonomic framework for emerging groups of ecologically important marine gammaproteobacteria based on the reconstruction of evolutionary relationships using genome-scale data. Front Microbiol. 2015;6:281. doi:10.3389/fmicb.2015.00281).
The former class Epsilonproteobacteria has been moved to a new phylum and renamed Campylobacterota (Waite DW, et al. Comparative Genomic Analysis of the Class Epsilonproteobacteria and Proposed Reclassification to Epsilonbacteraeota (phyl. nov.) [published correction appears in Front Microbiol. 2018 Apr 18;9:772]. Front Microbiol. 2017;8:682. doi:10.3389/fmicb.2017.00682).
The phylum Bacteroidetes has been rearranged based on genome data (García-López M, et al. Analysis of 1,000 Type-Strain Genomes Improves Taxonomic Classification of Bacteroidetes. Front Microbiol. 2019;10:2083. doi:10.3389/fmicb.2019.02083).
The phylum Actinobacteria has been rearranged based on genome data (Nouioui I, et al. Genome-Based Taxonomic Classification of the Phylum Actinobacteria. Front Microbiol. 2018;9:2007. doi:10.3389/fmicb.2018.02007).
The phylum Tenericutes has been rearranged based on genome data (Gupta RS, et al. Phylogenetic framework for the phylum Tenericutes based on genome sequence data: proposal for the creation of a new order Mycoplasmoidales ord. nov., containing two new families Mycoplasmoidaceae fam. nov. and Metamycoplasmataceae fam. nov. harbouring Eperythrozoon, Ureaplasma and five novel genera [published correction appears in Antonie Van Leeuwenhoek. 2018 Dec;111(12):2485-2486]. Antonie Van Leeuwenhoek. 2018;111(9):1583-1630. doi:10.1007/s10482-018-1047-3).
Several species from the genus Bacillus have been transferred into 6 novel genera based on comparative genomic analyses of Bacillus species (Patel S, Gupta RS. A phylogenomic and comparative genomic framework for resolving the polyphyly of the genus Bacillus: Proposal for six new genera of Bacillus species, Peribacillus gen. nov., Cytobacillus gen. nov., Mesobacillus gen. nov., Neobacillus gen. nov., Metabacillus gen. nov. and Alkalihalobacillus gen. nov. Int J Syst Evol Microbiol. 2020;70(1):406-438. doi:10.1099/ijsem.0.003775).
The genus Lactobacillus has been reclassified into 25 genera based on genome data (Zheng J, et al. A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int J Syst Evol Microbiol. 2020;70(4):2782-2858. doi:10.1099/ijsem.0.004107).
In addition to the two files used to train the RDP Classifier, files in new formats are now available to accommodate the needs of users:
A new file trainset18_062020_speciesrank.fa has been added to the release in RDPClassifier_16S_trainsetNo18_rawtrainingdata. This file is NOT needed to train the classifier. In addition to sequences, it contains genus, species, strain, type status and taxonomy rank, which are useful for closest-species identification using third-party tools (e.g. BLAST).
Two new files have been added in RDPClassifier_16S_trainsetNo18_QiimeFormat for retraining the RDP Classifier included in Qiime2.
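For users who want to inspect trainset18_062020_speciesrank.fa before passing it to a third-party tool, the following is a minimal Python sketch of a generic FASTA reader. It assumes only that the file is standard FASTA (a ">" header line followed by sequence lines). The function name read_fasta is hypothetical and not part of the RDP release, and the layout of the annotation fields inside each header is deliberately not assumed here, so each header is returned as a single unparsed string.

```python
from typing import Iterator, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file handle.

    The header is returned without the leading '>' and is not split into
    fields, because the field layout of the release file is not assumed.
    """
    header: Optional[str] = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None and line:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

A typical use would be to open the file in text mode, iterate over read_fasta(handle), and print the first few headers to see how the genus, species, strain, type status and rank annotations are laid out before writing any field-specific parsing.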