OSRA (Optical Structure Recognition Application) is a utility that converts graphical representations of chemical structures and reactions, as they appear in journal articles, patent documents, textbooks, trade magazines, and similar sources, into SMILES strings or MOL files, both computer-readable molecular structure formats. OSRA can read a document in any of the more than 90 graphics formats parseable by GraphicsMagick, including GIF, JPEG, PNG, TIFF, PDF, and PS, and generate a SMILES or MOL representation of each molecular structure image found in that document, or RSMI/RXN output for reactions.
Note that no optical recognition software is perfect: the output might, and probably will, contain errors. Curation by a human knowledgeable in chemical structures is highly recommended.
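As an illustration of how recognized output could be pre-screened before a human looks at it, here is a minimal Python sketch. It is not part of OSRA; the function name smiles_warnings and the specific checks are assumptions made for this example. It only catches obvious syntactic damage in a SMILES string (unbalanced parentheses or brackets, unpaired ring-closure labels) and says nothing about whether the chemistry matches the source image, so it supplements human curation rather than replacing it.

```python
from collections import Counter
from typing import List


def smiles_warnings(smiles: str) -> List[str]:
    """Return warnings for obviously malformed SMILES (syntax-level checks only)."""
    if not smiles.strip():
        return ["empty string"]

    warnings = []
    depth = 0             # current nesting depth of branch parentheses
    in_bracket = False    # True while inside a [...] atom specification
    ring_labels = Counter()

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                warnings.append("nested '['")
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            warnings.append("unmatched ']'")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                warnings.append("unmatched ')'")
                depth = 0
        elif ch == "%":
            # Two-digit ring-closure label such as %12.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and label.isdigit():
                ring_labels[label] += 1
                i += 2
            else:
                warnings.append("malformed '%' ring label")
        elif ch.isdigit():
            ring_labels[ch] += 1
        i += 1

    if in_bracket:
        warnings.append("unclosed '['")
    if depth > 0:
        warnings.append("unclosed '('")
    # Every ring-closure label must open and close, so it appears an even number of times.
    odd = sorted(label for label, n in ring_labels.items() if n % 2)
    if odd:
        warnings.append("unpaired ring closure label(s): " + ", ".join(odd))
    return warnings
```

A string that returns an empty list has only passed these syntax checks; it still needs to be compared against the original drawing by someone who knows the chemistry.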
OSRA can process the following types of images:
You can download a free version of the source code or support OSRA development by purchasing binary installation executables for Windows, Linux, and OSX.
OSRA is Free and Open Source Software. You are welcome to download and use it, provided that you understand the terms described above. Participation in the development is highly encouraged! We also welcome your feedback – send us your comments, suggestions, criticism, or praise to the contact emails listed here.
To demonstrate the capabilities (and limitations) of OSRA we have created an OSRA Web Interface. Try this sample image from the US Patent Office website first.