"Taxamatch" is an algorithm for fuzzy matching of scientific names of taxa (genera alone, or genus+species binomials) in taxonomic databases. It combines character substitution (similar to Soundex) to catch phonetic errors with a customised edit distance (ED) approach to catch non-phonetic ones, which can account for up to 50% of all errors in real-world queries. Because ED-based queries are typically slow against large data sets, Taxamatch includes a range of optimisations that heavily reduce the number of names tested at query time without reducing recall of the likely intended, correctly spelled target names, speeding up overall query time by a factor of 100 to 1000.
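
To make the two-stage idea concrete, here is a minimal Python sketch of the general approach described above: a phonetic key comparison first, then an edit distance test on the names that survive a cheap pre-filter. It is not the Taxamatch implementation. The substitution rules in `phonetic_key`, the plain Levenshtein distance, the length-based pre-filter, the `max_ed` threshold, and all function names are assumptions chosen for illustration; the published algorithm uses its own rule set, a customised edit distance, and more elaborate optimisations.

```python
def phonetic_key(name: str) -> str:
    """Reduce a name to a crude phonetic key.

    The substitution rules here are illustrative assumptions,
    not the rule set used by Taxamatch.
    """
    s = name.lower()
    for old, new in (("ae", "e"), ("oe", "e"), ("ph", "f"), ("y", "i"), ("k", "c")):
        s = s.replace(old, new)
    # Collapse runs of the same letter ("ll" -> "l").
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (Taxamatch uses a customised variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_match(query: str, names: list[str], max_ed: int = 2) -> list[tuple[str, str]]:
    """Return (name, match_type) pairs for names that resemble the query."""
    q = query.lower()
    q_key = phonetic_key(query)
    hits = []
    for name in names:
        n = name.lower()
        # Stage 1: phonetic match via the normalised key.
        if phonetic_key(name) == q_key:
            hits.append((name, "phonetic"))
            continue
        # Cheap pre-filter: a length gap larger than max_ed
        # cannot yield an edit distance within max_ed.
        if abs(len(n) - len(q)) > max_ed:
            continue
        # Stage 2: edit distance for non-phonetic errors.
        if edit_distance(q, n) <= max_ed:
            hits.append((name, "edit-distance"))
    return hits
```

With a hypothetical reference list such as `["Homo sapiens", "Hymenoptera", "Physalia", "Pinus"]`, the query "Fisalia" reaches "Physalia" through the phonetic key, while the transposed "Pinsu" reaches "Pinus" only through the edit distance stage. The length pre-filter is the simplest example of skipping names before the expensive comparison runs.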

For a more complete discussion of the algorithm, refer to this journal article published in 2014: "Taxamatch, an Algorithm for Near (‘Fuzzy’) Matching of Scientific Names in Taxonomic Databases". https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107510

Additional Project Details

Registered

2020-12-21