#30 SMILESGenerator Canonicity should be switchable

John May
cdk.smiles (5)

The first thing the SMILES generator does is label the atoms in the molecule
using the CanonicalLabeler. If you extract an interface and have two
implementations, one that does canonical labeling and the other that randomly

You can then have the labeler as a bean property in the SmilesGenerator and set
whichever is required.
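
A minimal sketch of that arrangement, using stand-in types rather than the real CDK classes. Every name here (`Atom`, `Labeler`, `SymbolOrderLabeler`, `ShuffleLabeler`, `Generator`) is hypothetical, and the "canonical" placeholder only sorts by element symbol, so it shows the wiring and not a real canonicalization:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class SwitchableLabelerSketch {

    /** Hypothetical stand-in for a CDK atom: an element symbol plus a label slot. */
    static final class Atom {
        final String symbol;
        long label = -1; // -1 means "not labeled yet"
        Atom(String symbol) { this.symbol = symbol; }
    }

    /** The extracted interface: assigns each atom a distinct label from 1..n. */
    interface Labeler {
        void label(List<Atom> atoms);
    }

    /** Placeholder for real canonical labeling: ranks atoms by symbol only. */
    static final class SymbolOrderLabeler implements Labeler {
        @Override
        public void label(List<Atom> atoms) {
            List<Atom> sorted = new ArrayList<>(atoms);
            sorted.sort(Comparator.comparing((Atom a) -> a.symbol));
            for (int i = 0; i < sorted.size(); i++) {
                sorted.get(i).label = i + 1;
            }
        }
    }

    /** Cheap alternative: hands out the labels 1..n in shuffled order. */
    static final class ShuffleLabeler implements Labeler {
        private final Random random;
        ShuffleLabeler(Random random) { this.random = random; }

        @Override
        public void label(List<Atom> atoms) {
            List<Long> labels = new ArrayList<>();
            for (long i = 1; i <= atoms.size(); i++) {
                labels.add(i);
            }
            Collections.shuffle(labels, random);
            for (int i = 0; i < atoms.size(); i++) {
                atoms.get(i).label = labels.get(i);
            }
        }
    }

    /** Stand-in for the generator, with the labeler exposed as a bean property. */
    static final class Generator {
        private Labeler labeler = new SymbolOrderLabeler();

        public Labeler getLabeler() { return labeler; }
        public void setLabeler(Labeler labeler) { this.labeler = labeler; }

        /** Labels the atoms, then returns their symbols in label order (not SMILES). */
        public String symbolsInLabelOrder(List<Atom> atoms) {
            labeler.label(atoms);
            List<Atom> ordered = new ArrayList<>(atoms);
            ordered.sort(Comparator.comparingLong((Atom a) -> a.label));
            StringBuilder out = new StringBuilder();
            for (Atom atom : ordered) {
                out.append(atom.symbol);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        List<Atom> atoms = new ArrayList<>();
        for (String symbol : new String[] {"O", "C", "N"}) {
            atoms.add(new Atom(symbol));
        }
        Generator generator = new Generator();
        System.out.println(generator.symbolsInLabelOrder(atoms)); // default labeler
        generator.setLabeler(new ShuffleLabeler(new Random()));   // switch at runtime
        System.out.println(generator.symbolsInLabelOrder(atoms)); // order now arbitrary
    }
}
```

Keeping the labeler behind a setter means callers who do not need canonical output can skip that cost without the generator itself changing.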

From the couple of tests that I just ran, ring perception took almost twice as
long as the labeling, although I did use molecules with large ring systems.


package org.openscience.cdk.smiles;

import org.openscience.cdk.AtomContainer;

/**
 * Created by IntelliJ IDEA.
 * User: oliver
 * Date: 9/02/2004
 * Time: 22:13:29
 *
 * @author Oliver Horlacher
 */
public interface Labler {
    void lable(AtomContainer atomContainer) throws

package org.openscience.cdk.smiles;

import org.openscience.cdk.AtomContainer;
import org.openscience.cdk.Atom;

/**
 * Created by IntelliJ IDEA.
 * User: oliver
 * Date: 9/02/2004
 * Time: 22:14:45
 *
 * @author Oliver Horlacher
 */
public class RandomLabler implements Labler {

    public void lable(AtomContainer atomContainer) throws NumberFormatException {
        Atom[] atoms = atomContainer.getAtoms();
        for (int i = 0; i < atoms.length; i++) {
            Atom atom = atoms[i];
            atom.setProperty("CanonicalLable", new

On Mon, 09 Feb 2004 9:50 pm, Christoph Steinbeck wrote:

Hi everybody,

Nicolas Job recently sent me the following
evaluation of SMILES
generation speed in CDK, noting that there is a
significant lack of
I was wondering if other list members could comment
on this, since the
generation of canonical SMILES certainly is one of
the strongest virtues
of CDK (Thanks again to Oliver Horlacher).

Here are my few cents:

  1. Canonicalization scales unfavorably with growing molecule size. As has
    been shown, it can at best be done in polynomial time [1], but this paper
    is of more academic value.

  2. There are various issues that make creation of SMILES more
    demanding than parsing. Besides canonical labeling, there is a need for
    ring perception in order to detect aromaticity, and unfortunately, this
    demands the Set of All Rings (SAR), which is notoriously time-consuming
    to find.

  3. Does somebody have any other kit available that
    generates canonical
    SMILES? I would love to see how CDK compares to
    other implementations.

  4. Certainly, one could go for a non-canonical implementation of a
    SMILESGenerator and see if this helps. In many cases, canonicity is not
    important. I have not yet checked Oliver's code, but maybe one can
    simply switch off canonicalization.
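
To give a feel for why labeling gets more expensive as molecules grow (point 1), here is a rough sketch of Morgan-style extended connectivity refinement. This is a textbook technique swapped in for illustration, not the code used by CDK's labeler; the class and method names are made up. Each round touches every bond once, and the number of rounds can grow with the size of the molecule. A complete canonical labeling also has to break the ties that remain afterwards, which this sketch does not attempt.

```java
import java.util.Arrays;

final class ExtendedConnectivitySketch {

    /**
     * neighbors[i] lists the atoms bonded to atom i.
     * Starts from the atom degrees and repeatedly replaces each value with the
     * sum of the neighbor values, stopping as soon as a round no longer
     * increases the number of distinct values.
     */
    static long[] extendedConnectivity(int[][] neighbors) {
        int n = neighbors.length;
        long[] current = new long[n];
        for (int i = 0; i < n; i++) {
            current[i] = neighbors[i].length;
        }
        int classes = countDistinct(current);
        while (true) {
            long[] next = new long[n];
            for (int i = 0; i < n; i++) {
                for (int j : neighbors[i]) {
                    next[i] += current[j];
                }
            }
            int nextClasses = countDistinct(next);
            if (nextClasses <= classes) {
                return current; // no further discrimination gained
            }
            current = next;
            classes = nextClasses;
        }
    }

    private static int countDistinct(long[] values) {
        return (int) Arrays.stream(values).distinct().count();
    }

    public static void main(String[] args) {
        // n-pentane as a chain of five carbons: 0-1-2-3-4
        int[][] pentane = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
        System.out.println(Arrays.toString(extendedConnectivity(pentane)));
        // prints [2, 3, 4, 3, 2]
    }
}
```

In the pentane example the two end carbons and the two inner carbons still share values, which is correct here because they are symmetry-equivalent.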

Any other comments welcome.



[1] Faulon, J. L. Isomorphism, automorphism
partitioning, and canonical
labeling can be solved in polynomial-time for
molecular graphs. Journal
of Chemical Information and Computer Sciences 1998,
38, 432-444.


  • Egon Willighagen

    • assigned_to: Christoph Steinbeck --> John May
    • Group: -->
    • John May

      John May - 2016-08-10

      Sure, you just set the order of the input atoms and generate a 'Generic' or
      'Isomeric' SMILES. These outputs do not reorder the input molecule.
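
The idea that a non-canonical writer simply follows the stored atom order can be illustrated without CDK. The toy writer below is an assumption-laden sketch, not CDK's SmilesGenerator: it handles only acyclic molecules with single bonds and at least one atom, always starts at atom 0, and visits neighbors in the order they are listed.

```java
import java.util.ArrayList;
import java.util.List;

final class InputOrderSmilesSketch {

    /** Writes an acyclic, single-bonded molecule, starting from atom 0. */
    static String write(String[] symbols, int[][] neighbors) {
        StringBuilder out = new StringBuilder();
        visit(0, -1, symbols, neighbors, out);
        return out.toString();
    }

    /** Depth-first walk; every child except the last is written as a branch. */
    private static void visit(int atom, int parent, String[] symbols,
                              int[][] neighbors, StringBuilder out) {
        out.append(symbols[atom]);
        List<Integer> children = new ArrayList<>();
        for (int neighbor : neighbors[atom]) {
            if (neighbor != parent) {
                children.add(neighbor);
            }
        }
        for (int k = 0; k < children.size(); k++) {
            boolean isLast = (k == children.size() - 1);
            if (!isLast) out.append('(');
            visit(children.get(k), atom, symbols, neighbors, out);
            if (!isLast) out.append(')');
        }
    }

    public static void main(String[] args) {
        int[][] chain = {{1}, {0, 2}, {1}}; // bonds 0-1 and 1-2
        // Ethanol stored as C, C, O
        System.out.println(write(new String[] {"C", "C", "O"}, chain)); // prints CCO
        // The same molecule stored as O, C, C
        System.out.println(write(new String[] {"O", "C", "C"}, chain)); // prints OCC
    }
}
```

Both strings describe ethanol; only the input order differs, so whoever controls the atom order controls the output.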

  • Egon Willighagen

    John, is it possible to canonicalize a SMILES using other atom labeling/prioritization schemes, like the InChI atom numbering?

  • John May

    John May - 2016-08-10
    • status: open --> closed
