
Validation

Anonymous Igor Dmitry Katsubo
  • A large validation set of 5719 chemical structure images with associated MOL files is available for download. This set was produced from the US Patent Office Complex Work Units and contains one structure per image, ground truth MOL files, and a simple Perl script to benchmark the results of your chemical structure recognition software. The benchmark script takes two arguments: first the folder with the ground truth files ("molfiles"), then the folder with your generated files. The filenames of the individual structures must be identical in both folders. The script compares the structures based on standard InChI. This validation set was made possible by a collaboration with Dr. Steve Boyer and Dr. John Kinney.
    This file has been updated courtesy of Aniko Valko and Keymodule Ltd., UK. The ground truth molfiles have been corrected and invalid images have been removed.
    Download zip archive here.

  • A subset of 450 images from the Japanese Patent Office Chem-Infty dataset containing only organic molecules can be downloaded here: images and ground truth.

    This subset is distributed by permission from the original Chem-Infty dataset authors Koji Nakagawa, Akio Fujiyoshi, and Masakazu Suzuki. This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.1 Japan License.
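The comparison performed by the benchmark script for the first set can be sketched in Python as follows. This is not the distributed Perl script, only a minimal illustration of the same idea under stated assumptions: the names `benchmark` and `text_key` are hypothetical, a missing or unconvertible output file is assumed to count as a miss, and the conversion from a MOL file to standard InChI is passed in as a function rather than implemented here.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def benchmark(truth_dir: str, pred_dir: str,
              to_inchi: Callable[[Path], Optional[str]]) -> Tuple[int, int]:
    """Return (matched, total) over the ground truth files in truth_dir.

    Files are paired by identical filename. `to_inchi` is a caller-supplied
    converter that returns a standard InChI string, or None on failure.
    """
    matched = 0
    total = 0
    for truth_file in sorted(Path(truth_dir).iterdir()):
        if not truth_file.is_file():
            continue
        total += 1
        pred_file = Path(pred_dir) / truth_file.name
        if not pred_file.is_file():
            continue  # assumption: no generated file counts as a miss
        truth = to_inchi(truth_file)
        pred = to_inchi(pred_file)
        # Assumption: a failed conversion on either side never matches.
        if truth is not None and truth == pred:
            matched += 1
    return matched, total


def text_key(path: Path) -> Optional[str]:
    """Placeholder converter for the demo only: uses the stripped file text.

    It does NOT compute an InChI; swap in a real converter for actual use.
    """
    text = path.read_text().strip()
    return text or None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as truth, \
            tempfile.TemporaryDirectory() as pred:
        (Path(truth) / "a.mol").write_text("structure-1\n")
        (Path(truth) / "b.mol").write_text("structure-2\n")
        (Path(pred) / "a.mol").write_text("structure-1\n")
        (Path(pred) / "b.mol").write_text("something-else\n")
        matched, total = benchmark(truth, pred, text_key)
        print(f"{matched} of {total} structures matched")
```

For real data, `to_inchi` would need a cheminformatics toolkit. One option, assuming RDKit is installed, is to read each file with `Chem.MolFromMolFile` and convert it with `Chem.MolToInchi`; whether that reproduces the Perl script's results exactly is not something this sketch guarantees.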

Recognition Results

Set               Size   OSRA 1.4.0   Imago 2.0   OSRA 2.0.0
Image2Structure   1000   84.7%        90.2%       91.9%
CLEF-2012          865   89.5%        67.0%       96.5%
JPO                450   56.2%        40.4%       62.6%
USPTO             5719   81.5%        86.9%       88.0%
Maybridge UoB     5740   74.0%        63.5%       86.4%

The results shown are recall values: the fraction of the original structure set that the software returned correctly. A recognized structure counts as correct when its standard InChI is identical to that of the original.
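Because recall is the number of correctly returned structures divided by the set size, a reported percentage can be turned back into an approximate count. The short sketch below shows that arithmetic; the function names are hypothetical, and the counts are only estimates because the published percentages are rounded to one decimal place.

```python
def recall_percent(correct: int, set_size: int) -> float:
    """Recall as a percentage: correct structures over the full set."""
    return 100.0 * correct / set_size


def approx_correct(set_size: int, recall_pct: float) -> int:
    """Estimate the number of correct structures from a rounded recall."""
    return round(set_size * recall_pct / 100.0)


if __name__ == "__main__":
    # Set sizes and OSRA 2.0.0 recall values as reported for two of the sets.
    for name, size, pct in [("USPTO", 5719, 88.0), ("JPO", 450, 62.6)]:
        print(name, approx_correct(size, pct), "of", size)
```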

