OSRA (Optical Structure Recognition Application) is a utility that converts graphical representations of chemical structures and reactions, as they appear in journal articles, patent documents, textbooks, trade magazines, etc., into SMILES or MOL files, which are computer-readable molecular structure formats. OSRA can read a document in any of the more than 90 graphical formats parseable by GraphicsMagick (including GIF, JPEG, PNG, TIFF, PDF, and PS) and generate a SMILES or MOL representation of each molecular structure image found in that document, or RSMI/RXN output for reactions.
Note that any software designed for optical recognition is unlikely to be perfect, and the output produced might, and probably will, contain errors, so curation by a human knowledgeable in chemical structures is highly recommended.
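As a rough illustration of the curation step, the following Python sketch runs OSRA on a file and flags output lines that fail a basic syntax check. It rests on assumptions not stated on this page: that the executable is named `osra` and is on the PATH, that it accepts the input file as a positional argument, and that it prints one SMILES string per line to standard output. The helper names (`run_osra`, `parse_structures`, `looks_well_formed`) are hypothetical. The check only catches unbalanced parentheses, brackets, and ring-closure labels; a string that passes may still be chemically wrong and still needs human review.

```python
import subprocess

DIGITS = "0123456789"


def run_osra(image_path, executable="osra"):
    """Run OSRA on one file and return its raw standard output.

    Assumption: `osra <file>` prints one structure per line.
    """
    result = subprocess.run(
        [executable, image_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_structures(stdout_text):
    """Split raw output into non-empty, stripped lines."""
    return [line.strip() for line in stdout_text.splitlines() if line.strip()]


def looks_well_formed(smiles):
    """Cheap syntactic check; it does not validate the chemistry."""
    depth = 0            # current parenthesis nesting depth
    in_bracket = False   # inside a [...] atom, where digits are not ring labels
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif not in_bracket:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
            elif ch == "%":
                # Two-digit ring-closure label such as %12
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or any(c not in DIGITS for c in label):
                    return False
                open_rings ^= {int(label)}
                i += 2
            elif ch in DIGITS:
                open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not in_bracket and not open_rings


def needs_review(stdout_text):
    """Return the output lines that fail the syntactic check."""
    return [s for s in parse_structures(stdout_text) if not looks_well_formed(s)]
```

For example, `needs_review(run_osra("page.png"))` would list the lines to look at first, where `page.png` is a placeholder file name; the remaining lines should still be compared against the source image by someone who knows the chemistry.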
OSRA can process the following types of images:
Download (source & binary)
OSRA is Free and Open Source Software. You are welcome to download and use it, provided that you understand the terms described above. Participation in its development is highly encouraged! We also welcome your feedback: send your comments, suggestions, criticism, or praise to the contact emails listed here.