"Taxamatch" is an algorithm for fuzzy matching of scientific names of taxa, either genera alone or binomials (genus + species), in taxonomic databases. It combines two techniques: character substitution (similar to Soundex) to catch phonetic errors, and a customised edit distance (ED) approach to catch non-phonetic errors, which can account for up to 50% of all errors in real-world queries. Because ED-based queries are typically slow against large data sets, Taxamatch includes a range of optimisations that greatly reduce the number of names tested at query time without reducing recall of the correctly spelled names the user most likely intended. These optimisations speed up overall query time by a factor of 100 to 1000.
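As a rough illustration of the two-stage idea described above, the following Python sketch pairs a toy phonetic key with an edit distance check. It is not the Taxamatch implementation: the substitution rules in `phonetic_key`, the length-difference pre-filter, and the `max_ed` threshold are all assumptions made for this example, and plain Levenshtein distance stands in for the customised edit distance.

```python
def phonetic_key(name: str) -> str:
    """Toy phonetic normalisation (hypothetical rules, not Taxamatch's own)."""
    s = name.lower()
    # Assumed substitutions for a few common sound-alike spellings.
    for old, new in (("ae", "e"), ("oe", "e"), ("ph", "f"), ("y", "i"), ("k", "c")):
        s = s.replace(old, new)
    # Collapse runs of the same letter ("ll" -> "l").
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, standing in for the customised ED."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query: str, names: list[str], max_ed: int = 2) -> list[tuple[str, str]]:
    """Return (name, match_type) pairs for candidates close to the query."""
    qkey = phonetic_key(query)
    hits = []
    for name in names:
        # Stage 1: phonetic match via the normalised key.
        if phonetic_key(name) == qkey:
            hits.append((name, "phonetic"))
            continue
        # Assumed pre-filter: skip names whose length differs too much,
        # since their edit distance must exceed max_ed anyway.
        if abs(len(name) - len(query)) > max_ed:
            continue
        # Stage 2: edit distance for non-phonetic errors.
        if edit_distance(query.lower(), name.lower()) <= max_ed:
            hits.append((name, "edit-distance"))
    return hits


genera = ["Phyllodoce", "Quercus", "Pinus", "Picea"]
print(fuzzy_match("Fillodoce", genera))  # phonetic misspelling
print(fuzzy_match("Qurecus", genera))    # transposed letters
```

The length pre-filter is safe because the edit distance between two strings is never smaller than the difference in their lengths, so discarding those candidates cannot lose a match within `max_ed`. It is only one simple example of cutting down the candidate set before the expensive comparison.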
For a more complete discussion of the algorithm, refer to this journal article published in 2014: "Taxamatch, an Algorithm for Near (‘Fuzzy’) Matching of Scientific Names in Taxonomic Databases". https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107510
Taxamatch
Fuzzy name matching algorithm for scientific names of taxa (biology)
Brought to you by:
tonyrees