OPSIN - Open Parser for Structural IUPAC Nomenclature
version 0.5.2 (see ReleaseNotes.txt for what's new in this version)
Daniel Lowe (current maintainer), Dr. Peter Corbett and Prof. Peter Murray-Rust
Contact address: dl387@cam.ac.uk
Factored out of OSCAR3 (again) by Daniel Lowe. Thanks to Richard Apodaca for doing this previously.
***NB*** OPSIN is not OSCAR3. OPSIN was developed as an OSCAR3
component but can also be used standalone, hence this package.
This is a library for IUPAC name-to-structure conversion. It
should currently be considered under development, although the interface for using it will remain constant.
The workings of OPSIN are more fully described in:
Peter Corbett, Peter Murray-Rust. High-throughput identification of
chemistry in life science texts. Proceedings of Computational Life
Sciences (CompLife) 2006, Cambridge, UK, pp. 107-118.
The following lists broadly summarise what OPSIN can currently do and what will be worked on in the future.
Supported nomenclature includes:
alkanes/alkenes/alkynes/heteroatom chains e.g. hexane, hex-1-ene, tetrasiloxane and their cyclic analogues e.g. cyclopropane
All IUPAC 1993 recommended rings
Trivial acids
Hantzsch-Widman e.g. 1,3-oxazole
Spiro systems (using von Baeyer brackets)
All von Baeyer rings e.g. bicyclo[2.2.2]octane
Hydro e.g. 2,3-dihydropyridine
Indicated hydrogen e.g. 1H-benzoimidazole
Heteroatom replacement
Specification of charge e.g. ium/ide
Multiplicative nomenclature e.g. ethylenediaminetetraacetic acid
Fused ring systems with some exceptions e.g. imidazo[4,5-d]pyridine
Ring assemblies e.g. biphenyl
Most prefix and infix functional replacement nomenclature
The following functional classes: esters, diesters, glycols, acids, azides, bromides, chlorides, cyanates, cyanides, fluorides, fulminates, hydroperoxides, iodides, isocyanates, isocyanides, isoselenocyanates, isothiocyanates, selenocyanates, thiocyanates, alcohols, selenols, thiols, ethers, ketones, peroxides, selenides, selenones, selenoxides, selones, selenoketones, sulfides, sulfones, sulfoxides, tellurides, telluroketones, tellurones, telluroxides and thioketones
Locanted E/Z/R/S stereochemistry
Currently UNsupported nomenclature includes:
Any stereochemistry other than locanted E/Z/R/S stereochemistry
Greek letters
Lambda convention
Amino Acids (simple substitutive operations are allowed)
Carbohydrates
Steroids
Nucleic acids
Bridged rings
Fused ring systems built from more than one fusion or that involve non 6-membered rings AND are not in a chain
Some conjunctive operations e.g. cyclohexaneethanol
Some functional replacement nomenclature
The following functional classes: hydrazides, lactones, lactams, acetals, hemiacetals, oximes, oxides, ketals, hydrazones, anhydrides and semicarbazones
To use OPSIN, you'll first need to build it, using the accompanying
ant build file. (A standalone jar file is also available from SourceForge if you are not familiar with
ant and do not wish to alter the source code.)
The command:
ant dist
will make a combined .jar file which also includes OPSIN's
dependencies (included).
To run, the class you want is uk.ac.cam.ch.wwmm.NameToStructure. This class should be chosen automatically even if not specified.
This has a main method, so that you can run:
java -jar opsin-0.5.2.jar
then type names in and get CML (chemical markup language) back.
To use within Java
1) Learn about XOM (http://xom.nu), the XML processing framework used
by OPSIN
2) Create an OPSIN instance, by calling the following static method
NameToStructure nameToStructure = NameToStructure.getInstance();
3) Get CML (as XOM Elements), thus:
Element cmlElement = nameToStructure.parseToCML("acetonitrile");
4) Whatever you like. Maybe print it out, thus:
System.out.println(cmlElement.toXML());
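Beyond printing, the CML string can be inspected with any XML parser. The sketch below is a minimal illustration using only the JDK's built-in DOM parser: it counts atoms per element symbol in a CML string. The class name CmlAtomCounter and the sample CML are hypothetical and hand-written for this example (they are not OPSIN output), and the sketch assumes atoms appear as "atom" elements carrying an "elementType" attribute, as is conventional in CML.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of OPSIN.
class CmlAtomCounter {

    // Returns element symbol -> number of atoms, sorted by symbol.
    // Assumption: atoms are <atom elementType="..."/> elements in any namespace.
    static Map<String, Integer> countElements(String cml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cml)));
            // "*" matches elements in any namespace (or in none).
            NodeList atoms = doc.getElementsByTagNameNS("*", "atom");
            Map<String, Integer> counts = new TreeMap<>();
            for (int i = 0; i < atoms.getLength(); i++) {
                String symbol = ((Element) atoms.item(i)).getAttribute("elementType");
                counts.merge(symbol, 1, Integer::sum);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample (heavy atoms of acetonitrile only), not real OPSIN output.
        String sample =
                "<cml xmlns=\"http://www.xml-cml.org/schema\"><molecule id=\"m1\"><atomArray>"
              + "<atom id=\"a1\" elementType=\"C\"/>"
              + "<atom id=\"a2\" elementType=\"C\"/>"
              + "<atom id=\"a3\" elementType=\"N\"/>"
              + "</atomArray></molecule></cml>";
        System.out.println(countElements(sample)); // prints {C=2, N=1}
    }
}
```

With OPSIN on the classpath, the string passed in would be cmlElement.toXML() from step 4. If you already work with XOM, walking the Element directly avoids the second parse.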
NOTE: For efficiency, reuse the same instance of NameToStructure. parseToCML typically takes 5-10 ms to convert a name to CML, making OPSIN suitable for use on large numbers of names.
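The reuse advice can be sketched as a small batch helper that is handed one converter and applies it to every name. BatchNameConverter is a hypothetical class written for this illustration, not part of OPSIN. To keep the sketch runnable without the OPSIN jar, the converter is a plain java.util.function.Function from name to CML string. It also assumes a failed conversion surfaces as a runtime exception (for example from calling toXML() on a null result); check the OPSIN source for the actual failure behaviour.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of OPSIN: converts many names with ONE shared converter.
class BatchNameConverter {
    private final Function<String, String> converter;

    BatchNameConverter(Function<String, String> converter) {
        this.converter = converter;
    }

    // Returns name -> CML string in input order; names that fail map to null.
    Map<String, String> convertAll(List<String> names) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : names) {
            String cml;
            try {
                cml = converter.apply(name);
            } catch (RuntimeException e) {
                cml = null; // assumption: failures show up as runtime exceptions
            }
            results.put(name, cml);
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in converter so the sketch runs without OPSIN. With OPSIN available,
        // create the instance once and wrap it instead, e.g.:
        //   NameToStructure nameToStructure = NameToStructure.getInstance();
        //   new BatchNameConverter(name -> nameToStructure.parseToCML(name).toXML());
        // (getInstance() may declare a checked exception; handle it as required.)
        BatchNameConverter batch = new BatchNameConverter(name -> {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty name");
            }
            return "<cml><!-- placeholder for " + name + " --></cml>";
        });
        Map<String, String> results = batch.convertAll(Arrays.asList("hexane", "", "acetonitrile"));
        results.forEach((name, cml) ->
                System.out.println("'" + name + "' -> " + (cml == null ? "FAILED" : cml)));
    }
}
```

Passing the converter in through the constructor makes it hard to create a fresh NameToStructure per name by accident, which is the cost the note warns about.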
CML can, if desired, be converted to other formats such as SD, SMILES, InChI etc. by toolkits such as CDK, OpenBabel and JUMBO.
(NOTE: if you want InChI, the simplest and fastest way is to use the separately available NameToInchi jar in conjunction with an OPSIN jar.)
Good luck, and let us know if you have problems, comments or suggestions!
You can contact us by posting a message on SourceForge, or you can email me directly (dl387@cam.ac.uk).