Name                                    Modified     Size
opsin-0.6.0-jar-with-dependencies.jar   2010-02-18   4.6 MB
ReleaseNotes.txt                        2010-02-18   3.8 kB
README.txt                              2010-02-18   4.6 kB
documentation-0.6.0.zip                 2010-02-18   28.2 kB
opsin-0.6.0-src.zip                     2010-02-18   480.4 kB
core-0.6.0.jar                          2010-02-18   508.5 kB
inchi-0.6.0.jar                         2010-02-18   5.8 kB
Totals: 7 items, 5.6 MB

OPSIN - Open Parser for Structural IUPAC Nomenclature
version 0.6.0 (see ReleaseNotes.txt for what's new in this version)

Daniel Lowe (current maintainer), Dr. Peter Corbett and Prof. Peter Murray-Rust

Contact address: dl387@cam.ac.uk

This is a library for IUPAC name-to-structure conversion.
It should currently be considered to be under development, although the interface for using it will remain constant.
OPSIN was formerly a component of OSCAR3 but is now a wholly standalone library.

##################################################

The easiest way to use OPSIN is to use the standalone jar available from SourceForge.
java -jar opsin-0.6.0.jar will give you a command line interface to convert names to CML (Chemical Markup Language).
opsin-0.6.0.jar includes InChI and CML output and all dependencies.
core-0.6.0.jar does not include dependencies or InChI output support.
inchi-0.6.0.jar does not include dependencies, depends on JNI-InChI and core, and adds InChI support.
The main classes are uk.ac.cam.ch.wwmm.opsin.NameToStructure for CML
and uk.ac.cam.ch.wwmm.opsin.NameToInchi for InChI.

To use OPSIN as a library within Java, add opsin-0.6.0.jar to your classpath, then:

1) Learn about XOM (http://xom.nu), the XML processing framework used
   by OPSIN
2) Create an OPSIN instance by calling the following static method:

NameToStructure nameToStructure = NameToStructure.getInstance();

3) Get CML (as XOM Elements):

Element cmlElement = nameToStructure.parseToCML("acetonitrile");

4) Whatever you like. Maybe print it out, thus:

System.out.println(cmlElement.toXML());

parseToCML will typically take 5-10 ms to convert a name to CML, making OPSIN suitable for use on a large number of names.
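
As a rough illustration of what the 5-10 ms figure means for batch work, the sketch below estimates the total conversion time for a large list of names and shows a simple conversion loop. It does not call OPSIN itself: BatchConvert, estimateSeconds and convertAll are hypothetical names made up for this example, and the converter argument stands in for a call such as name -> nameToStructure.parseToCML(name).toXML(). It assumes single-threaded conversion and ignores start-up cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of OPSIN.
class BatchConvert {

    // Estimated seconds to convert nameCount names at msPerName milliseconds each.
    static double estimateSeconds(long nameCount, double msPerName) {
        return nameCount * msPerName / 1000.0;
    }

    // Applies the converter to every name in order and collects the results.
    // In real use the converter would wrap the OPSIN call (assumption:
    // something like name -> nameToStructure.parseToCML(name).toXML()).
    static List<String> convertAll(List<String> names, Function<String, String> converter) {
        List<String> results = new ArrayList<>();
        for (String name : names) {
            results.add(converter.apply(name));
        }
        return results;
    }

    public static void main(String[] args) {
        long nameCount = 1_000_000L;
        // Bounds taken from the 5-10 ms per name figure quoted above.
        System.out.println("At 5 ms per name:  " + estimateSeconds(nameCount, 5.0) + " s");
        System.out.println("At 10 ms per name: " + estimateSeconds(nameCount, 10.0) + " s");
    }
}
```

On these assumptions, one million names would take roughly 5000 to 10000 seconds, i.e. somewhere between about 1.4 and 2.8 hours on a single thread.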

CML can, if desired, be converted to other formats such as SD, SMILES, InChI etc. by toolkits such as CDK, OpenBabel and JUMBO.
(NOTE: if you want InChI, the most efficient way to generate it is to use the InChI module and the corresponding parseToInChI method.)

##################################################

The workings of OPSIN are more fully described in:

Peter Corbett, Peter Murray-Rust. High-throughput identification of
chemistry in life science texts. Proceedings of Computational Life
Sciences (CompLife) 2006, Cambridge, UK, pp. 107-118.

The following lists broadly summarise what OPSIN can currently do and what will be worked on in the future.

Supported nomenclature includes:
alkanes/alkenes/alkynes/heteroatom chains e.g. hexane, hex-1-ene, tetrasiloxane and their cyclic analogues e.g. cyclopropane
All IUPAC 1993 recommended rings
Trivial acids
Hantzsch-Widman e.g. 1,3-oxazole
Spiro systems (using von Baeyer brackets)
All von Baeyer rings e.g. bicyclo[2.2.2]octane
Hydro e.g. 2,3-dihydropyridine
Indicated hydrogen e.g. 1H-benzoimidazole
Heteroatom replacement
Specification of charge e.g. ium/ide
Multiplicative nomenclature e.g. ethylenediaminetetraacetic acid
Fused ring systems e.g. imidazo[4,5-d]pyridine. For a small number of fused ring systems numbering cannot be determined
Ring assemblies e.g. biphenyl
Most prefix and infix functional replacement nomenclature
The following functional classes: amides, anhydrides, esters, diesters, glycols, acids, azides, bromides, chlorides, cyanates, cyanides, fluorides, fulminates, hydrazones, hydroperoxides, imides,
iodides, isocyanates, isocyanides, isoselenocyanates, isothiocyanates, selenocyanates, thiocyanates, alcohols, selenols, thiols, ethers, ketones, oxides, oximes, peroxides, selenides,
selenones, selenoxides, selones, selenoketones, selenosemicarbazones, semicarbazones, sulfides, sulfones, sulfoxides, tellurides, telluroketones, tellurosemicarbazones, tellurones, telluroxides, thioketones and thiosemicarbazones
Greek letters
Lambda convention
E/Z/R/S stereochemistry
Amino Acids and derivatives

Currently UNsupported nomenclature includes:
Other less common stereochemical terms
Carbohydrates
Steroids
Nucleic acids
Bridged rings 
Fused ring systems involving non-6-membered rings that are not in a "chain" cannot be numbered, e.g. indeno[2,1-c]pyridine can be numbered but benzo[cd]indole cannot
Some conjunctive operations e.g. cyclohexaneethanol
The following functional classes: hydrazides, lactones, sultams, lactams, sultims, lactims, acetals, hemiacetals and ketals

##################################################

Good luck, and let us know if you have problems, comments or suggestions!
You can contact us by posting a message on SourceForge, or you can email me directly (dl387@cam.ac.uk).
Source: README.txt, updated 2010-02-18