News

Anonymous Igor
  • OSRA 2.1.3
    Fixed coordinate box calculation and aromatic bond detection.
  • OSRA 2.1.1
    Updated Poppler to 0.7.3 and OpenBabel to 3.0.0. Improved PDF processing and reaction recognition.
  • OSRA 2.1.0 is out.
    Significant improvements in PDF document recognition.
    No more dependency on Ghostscript at runtime.
    Recognition of simple polymer structures (requires MDL MOL or SD format output and a patched version of OpenBabel).
  • OSRA 2.0.1 is released.
    Improved precision. False positives have been reduced by a factor of 5 on the CLEF2012 segmentation set.
  • OSRA 2.0.0
    Significantly improved recognition rates. [Validation]
    Added recognition of Iodine, wavy bonds, etc.
    Completely modified confidence function (values not compatible with the earlier versions).
    Updated table detection and removal routines.
    Created binary package for Linux (statically linked, should work on all modern Linux systems).
    Windows and OSX versions now support multi-threaded processing of PDF files.
  • An updated USPTO validation test set is available courtesy of Aniko Valko and Keymodule Ltd., UK. The ground truth molfiles have been corrected and invalid images have been removed.
  • A subset of 450 images from the Japanese Patent Office Chem-Infty dataset (http://www.iapr-tc11.org/mediawiki/index.php/Chem-Infty_Dataset:_A_ground-truthed_dataset_of_Chemical_Structure_Images), containing only organic molecules, can be downloaded from the [Validation] page.
  • You can find my short paper (not peer-reviewed), written for the Document Analysis Systems workshop (June 9th-11th, Boston, MA), here.
  • A large validation set consisting of 5735 chemical structure images and associated MOL files is now available for download. This set was produced from the US Patent Office Complex Work Units and contains one structure per image, ground truth MOL files, and a simple CACTVS script to benchmark the results of your chemical structure recognition software. The benchmark script takes two arguments: first the folder with the ground truth files ("molfiles"), and second the folder with your generated files. The filenames of individual structures should be identical in both folders. The script compares the structures based on standard InChI. Download zip archive here. This validation set was made possible by a collaboration with Dr. Steve Boyer and Dr. John Kinney.
  • Igor Filippov presented the new algorithm used by OSRA for text/graphics separation at GREC 2009. It is the first paper in session 4; you can find it here in "Proceedings".
  • The OSRA manuscript has been published: "Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution", J. Chem. Inf. Model., 2009, 49 (3), pp 740-743.
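
The benchmark procedure described above for the USPTO validation set (two folders with identically named files, compared by standard InChI) can be sketched roughly as follows. This is not the CACTVS script shipped with the archive, only a minimal Python illustration of the same idea. The names `benchmark`, `recognition_rate` and `to_inchi` are hypothetical, and the function that turns a structure file into a standard InChI string is left to the caller, since it depends on whichever cheminformatics toolkit is available.

```python
from pathlib import Path
from typing import Callable, Dict


def benchmark(truth_dir: str, generated_dir: str,
              to_inchi: Callable[[Path], str]) -> Dict[str, bool]:
    """Compare identically named files in two folders by their InChI.

    to_inchi is a caller-supplied (hypothetical) function that reads one
    structure file and returns its standard InChI string.
    """
    results: Dict[str, bool] = {}
    for truth_file in sorted(Path(truth_dir).iterdir()):
        if not truth_file.is_file():
            continue
        candidate = Path(generated_dir) / truth_file.name
        if not candidate.is_file():
            # No generated file with the same name: count it as a miss.
            results[truth_file.name] = False
            continue
        try:
            results[truth_file.name] = to_inchi(truth_file) == to_inchi(candidate)
        except Exception:
            # A file the converter cannot parse also counts as a miss.
            results[truth_file.name] = False
    return results


def recognition_rate(results: Dict[str, bool]) -> float:
    """Fraction of ground truth structures that were matched."""
    return sum(results.values()) / len(results) if results else 0.0
```

A call would look like `benchmark("molfiles", "my_output", to_inchi)`, where the two folder names mirror the two arguments of the original script and `to_inchi` wraps your toolkit's MOL-to-InChI conversion.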
