
Tools and data formats for chemical data handling

Anonymous Noel O'Boyle

Enabling chemists to submit experimental or theoretical data together with a publication requires software (commercial and open access) that can create, handle or transform chemistry-related data. This includes chemical drawings, reactions, spectral data and chemical property data.

Data for publication supplements should be submitted in open data formats (XML, CML, ThermoML, JCAMP) or at least in well-defined data formats (such as SD format V3000). Chemical data supplements should not be submitted in PDF format, which destroys chemical information and hinders automated machine reading. Publishing molecule and reaction drawings as picture data (TIFF, BMP, PNG) is needed for the print process, but it prevents any simple capture of the data by computer. Such bitmap data has to be run through an optical character recognition (OCR) process to recover the chemical formulas. This process is error-prone and would not be needed if the chemical metadata were submitted as CML.

Modern chemistry software should be able to export CML for molecule and reaction drawings, and any software that captures experimental thermodynamic or spectroscopic data must support open data exchange formats (JCAMP, netCDF, ThermoML and others).

Tools

Molecule drawings

This section covers tools and data formats for molecules (mol, sdf, cml, SMILES) and reaction data (rxn, rdf, cml, SMARTS, SMIRKS). Chemical drawings should be exported as CML (Chemical Markup Language) or in mol format. Software-specific or vendor-specific formats should not be used, and picture formats (BMP, JPEG, TIFF) even less so. If possible, a list of InChI codes (or InChIKeys) should be created for all molecules. See the examples below.

| Name | Vendor | Open/Closed Source | Operating System | Note |
|---|---|---|---|---|
| ISISDraw | MDL | Closed | Windows | deprecated; no CML import/export; copy/paste into other programs is possible |
| ChemDraw | CambridgeSoft | Closed | Windows | |
| ChemSketch | ACDLabs | Closed | Windows | |
| MarvinSketch | ChemAxon | Closed | Windows | |
| KnowItAll | BioRad | Closed | Windows | |
| XDrawChem | <http://xdrawchem.sourceforge.net/> | Open | Windows/Linux/OS X | |
| JChemPaint | <http://cdk.sourceforge.net/> | Open | Platform independent | |
| Bioclipse | <http://www.bioclipse.net/> | Open | Windows/Linux/OS X | |

Chemical reaction drawings

Chemical reaction drawings should be exported to CML or RXN format. See the examples below.

| Name | Vendor | Open/Closed Source | Operating System | Note |
|---|---|---|---|---|
| ISISDraw | MDL | Closed | Windows | deprecated; no CML import/export; copy/paste into other programs is possible |
| ChemDraw | CambridgeSoft | Closed | Windows | |
| ChemSketch | ACDLabs | Closed | Windows | |
| MarvinSketch | ChemAxon | Closed | Windows | |
| KnowItAll | BioRad | Closed | Windows | |
| JChemPaint | <http://cdk.sourceforge.net/> | Open | Platform independent | |
| Bioclipse | <http://www.bioclipse.net/> | Open | Windows/Linux/OS X | |

Building and visualising molecules

The table below is only intended to give an overview of the functionality of a limited number of codes. The pages linked in the "Special Features" column are places for users and developers to highlight particular strengths or unique features of a code.

For a more comprehensive list of the various builders and visualisers that are available, please see the Linux4Chemistry list or Mario Valle's list of Free Chemistry Visualisation Tools.

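| Program | Building: Small Mol. | Building: Large Struct. | Building: Periodic Struct. | Building: Internal Minimiser | Visualising: Molecules | Visualising: Isosurfaces | Visualising: Vector Fields | Platform: Windows | Platform: Mac OSX | Platform: Linux | Open | Special Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aten | y | y | y | y | y | y | - | - | y | y | ? | [AtenFeatures] |
| Avogadro | y | y | y | y | y | y | - | y | y | y | y | [AvogadroFeatures] |
| CCP1GUI | y | - | - | y | y | y | y | y | y | y | ? | [CCP1GUIFeatures] |
| Jmol | y | y | y | y | y | y | - | y | y | y | y | [JmolFeatures] |
| Molden | y | - | - | y | y | y | - | y | y | y | ? | [MoldenFeatures] |
| Molekel | | - | | - | y | y | - | y | y | y | ? | [MolekelFeatures] |
| Zeobuilder | y | y | y | - | y | - | - | - | - | y | ? | [ZeobuilderFeatures] |
| Jamberoo | y | y | y | y | y | y | | | | | | [ZeobuilderFeatures] |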

Chemical file format converters

Such converter tools can be used to convert chemical data into accepted data formats (CML, MOL, SDF, PDB).
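As a rough illustration of what such a converter does internally, the following Python sketch reads a V2000 MOL block and writes a small CML document using only the standard library. It is not a replacement for a real converter: the function name `molblock_to_cml`, the sample molecule and the choice of CML attributes (`x3`, `y3`, `z3`, `atomRefs2`, `order`) are assumptions made for this example, and the output is not validated against the CML schema.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written V2000 MOL block (heavy atoms of methanol), for illustration only.
METHANOL_MOL = "\n".join([
    "methanol",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

# MOL bond type -> CML bond order (type 4 means "aromatic" in MOL files).
BOND_ORDER = {1: "1", 2: "2", 3: "3", 4: "A"}


def molblock_to_cml(molblock: str) -> str:
    """Convert one V2000 MOL block to a CML string (hypothetical helper)."""
    lines = molblock.splitlines()
    counts = lines[3]  # three header lines come before the counts line
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 MOL blocks")
    n_atoms = int(counts[0:3])  # fixed-width fields, three characters each
    n_bonds = int(counts[3:6])

    molecule = ET.Element(
        "molecule", {"xmlns": CML_NS, "id": "m1", "title": lines[0].strip()}
    )
    atom_array = ET.SubElement(molecule, "atomArray")
    for index, line in enumerate(lines[4:4 + n_atoms], start=1):
        ET.SubElement(atom_array, "atom", {
            "id": f"a{index}",
            "elementType": line[31:34].strip(),
            "x3": line[0:10].strip(),
            "y3": line[10:20].strip(),
            "z3": line[20:30].strip(),
        })

    bond_array = ET.SubElement(molecule, "bondArray")
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first, second, kind = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if kind not in BOND_ORDER:
            raise ValueError(f"unsupported MOL bond type {kind}")
        ET.SubElement(bond_array, "bond", {
            "atomRefs2": f"a{first} a{second}",
            "order": BOND_ORDER[kind],
        })
    return ET.tostring(molecule, encoding="unicode")


if __name__ == "__main__":
    print(molblock_to_cml(METHANOL_MOL))
```

Charges, isotopes, stereochemistry and the V3000 format are ignored here. A dedicated converter tool should be preferred for real supplements.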

Chemical property data handling and storage

Pure experimental and calculated molecular property data (mp, bp, logP, pKa, solubility, toxicity, molecular descriptors) should be supplied in open data formats such as XML. TXT (tab-separated) or XLS (BIFF4 or later) files are allowed but discouraged. If molecular structures are available, the data should be exported in SDF format together with the molecular information. Supplements in PDF format are forbidden. Large files should be compressed in .gz or .zip format.
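A minimal Python sketch of the SDF route, assuming the structures are already available as MOL blocks: each record is the MOL block followed by named data items and the `$$$$` delimiter, and the resulting file is gzip-compressed. The helper names (`sd_record`, `write_sd_gz`), the field names and the file name are hypothetical and chosen only for this example.

```python
import gzip
import os
import tempfile

# Hand-written V2000 MOL block for water (oxygen atom only), illustration only.
WATER_MOL = "\n".join([
    "water",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])


def sd_record(molblock: str, properties: dict) -> str:
    """Build one SD file record: MOL block, data items, '$$$$' delimiter."""
    parts = [molblock.rstrip("\n")]
    for name, value in properties.items():
        parts.append(f"> <{name}>")  # data header with the field name
        parts.append(str(value))     # data value
        parts.append("")             # a blank line ends each data item
    parts.append("$$$$")             # record delimiter
    return "\n".join(parts) + "\n"


def write_sd_gz(path: str, records: list) -> None:
    """Write the records to a gzip-compressed SD file."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as handle:
        handle.writelines(records)


if __name__ == "__main__":
    # Field names are assumptions; units are encoded in the name (degrees C).
    record = sd_record(WATER_MOL, {"MELTING_POINT_C": 0.0, "BOILING_POINT_C": 100.0})
    target = os.path.join(tempfile.mkdtemp(), "supplement.sdf.gz")
    write_sd_gz(target, [record])
    print(target)
```

Putting the unit into the field name is only one possible convention. Whatever convention is used should be documented alongside the supplement so that the values stay machine-readable.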

| Name | Vendor | Open/Closed Source | Note |
|---|---|---|---|
| Bioclipse | Bioclipse team | Open | |
| EXCEL | Microsoft | Closed | |
| Calc Spreadsheet | OpenOffice | Open | |
| Instant-JChem | ChemAxon | Closed | |
| ACDLabs | | Closed | several spectral data packages |
| 7ZIP | ? | | free compression and decompression tool for WIN/LINUX/OSX |
| TRC | ? | | tools for ThermoML conversion, capturing of experimental data and data format conversions |

Spectral data and hyphenated techniques data

This section covers NMR, MS, UV, IR, GC-MS, LC-MS and LC-UV data.

  • Vendor-specific software (hardware dependent)
  • Bioclipse - Bioclipse team
  • ACDLabs - several spectral data packages
  • GRAMS - Thermo Grams/AI

Data Formats

Molecular data

Vendor-specific formats (such as .skc for ISIS Draw, or SMILES) are allowed but discouraged. Large files should be compressed in .gz format (GNU ZIP) or .zip format.

  • CML (Chemical Markup Language)
  • SD file format (V2000, V3000 from MDL)
  • MOL format (MDL)
  • PDB format
  • SMARTS (Daylight)
  • InChI (IUPAC and NIST)
  • InChIKey (IUPAC and NIST): short hash code derived from the InChI

Quantum Chemistry

Nuclear Magnetic Resonance (NMR)

  • JCAMP

Mass Spectrometry (MS)

  • JCAMP

Infrared Data (IR)

  • JCAMP

Optical Spectroscopy in general

  • JCAMP

Crystal Structures

  • CIF
  • PDB

GC-MS data

LC-MS data

Thermodynamic Property Data

  • ThermoML - IUPAC and NIST standard format for thermodynamic property data (bp, entropy, solubility and 120 other properties)
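
The JCAMP format listed above for spectral data is plain text built from labelled data records of the form `##LABEL=value`, which is what makes it easy to read by machine. The Python sketch below collects those records into a dictionary. The sample spectrum is invented, the function names are hypothetical, and the reader assumes a simple peak table with one comma-separated X, Y pair per line. Compressed `XYDATA` encodings are not handled.

```python
import re

# Invented example file; the values do not come from a real measurement.
SAMPLE_JCAMP = "\n".join([
    "##TITLE=hypothetical IR spectrum",
    "##JCAMP-DX=4.24",
    "##DATA TYPE=INFRARED SPECTRUM",
    "##XUNITS=1/CM",
    "##YUNITS=TRANSMITTANCE",
    "##NPOINTS=3",
    "##PEAK TABLE=(XY..XY)",
    "1000.0, 0.91",
    "1700.0, 0.35",
    "2900.0, 0.62",
    "##END=",
])


def normalise_label(label: str) -> str:
    """Upper-case a label and drop spaces, dashes, slashes and underscores."""
    return re.sub(r"[ \-/_]", "", label).upper()


def read_jcamp_records(text: str) -> dict:
    """Collect labelled data records; continuation lines join the last label."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # "$$" starts a comment
        if line.startswith("##"):
            name, _, value = line[2:].partition("=")
            current = normalise_label(name)
            records[current] = value.strip()
        elif current is not None and line:
            records[current] += "\n" + line.strip()
    return records


def peak_table(records: dict) -> list:
    """Return (x, y) pairs, assuming one 'x, y' pair per line."""
    rows = records["PEAKTABLE"].splitlines()[1:]  # skip the "(XY..XY)" form
    return [tuple(float(v) for v in row.split(",")) for row in rows]


if __name__ == "__main__":
    parsed = read_jcamp_records(SAMPLE_JCAMP)
    print(parsed["TITLE"], parsed["XUNITS"], peak_table(parsed))
```

Because every value is tied to a named label, the units and data type travel with the numbers. A PDF or bitmap of the same spectrum loses exactly that information.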


