#30 SMILESGenerator Canonicity should be switchable

open
cdk.smiles (4)
5
2012-10-08
2004-02-18
No

The first thing the SMILES generator does is label the atoms in the molecule using the CanonicalLabeler. You could extract an interface from it and provide two implementations: one that does canonical labeling and one that labels the atoms arbitrarily.

The labeler could then be a bean property of the SmilesGenerator, set to whichever implementation is required.
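A minimal, self-contained sketch of the proposed design, assuming nothing about CDK's real classes: the hypothetical AtomLabeler interface stands in for Labler, atoms are plain indices with bonds given as index pairs, and SketchGenerator exposes the labeler as a bean property (getLabeler/setLabeler). InputOrderLabeler mirrors what RandomLabler does (labels follow input order); DegreeLabeler is a made-up second strategy that exists only to show the swap and is not canonical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not CDK code: atoms are indices 0..n-1 and bonds are index pairs.
interface AtomLabeler {
    // Returns labels[i] = rank of atom i.
    int[] label(int atomCount, int[][] bonds);
}

// Non-canonical: atom i simply gets label i, as RandomLabler does.
class InputOrderLabeler implements AtomLabeler {
    public int[] label(int atomCount, int[][] bonds) {
        int[] labels = new int[atomCount];
        for (int i = 0; i < atomCount; i++) labels[i] = i;
        return labels;
    }
}

// Made-up alternative strategy (NOT canonical): higher degree first, ties broken by index.
class DegreeLabeler implements AtomLabeler {
    public int[] label(int atomCount, int[][] bonds) {
        int[] degree = new int[atomCount];
        for (int[] b : bonds) { degree[b[0]]++; degree[b[1]]++; }
        Integer[] order = new Integer[atomCount];
        for (int i = 0; i < atomCount; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> degree[x] != degree[y]
                ? Integer.compare(degree[y], degree[x])
                : Integer.compare(x, y));
        int[] labels = new int[atomCount];
        for (int rank = 0; rank < atomCount; rank++) labels[order[rank]] = rank;
        return labels;
    }
}

// The labeler is a bean property, so callers choose the labeling strategy at run time.
class SketchGenerator {
    // Defaults to the non-canonical labeler only because this sketch has no canonical one.
    private AtomLabeler labeler = new InputOrderLabeler();

    public AtomLabeler getLabeler() { return labeler; }
    public void setLabeler(AtomLabeler labeler) { this.labeler = labeler; }

    // Atom indices sorted by ascending label (the ranking a generator would consume).
    public List<Integer> atomOrder(int atomCount, int[][] bonds) {
        int[] labels = labeler.label(atomCount, bonds);
        Integer[] order = new Integer[atomCount];
        for (int i = 0; i < atomCount; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Integer.compare(labels[x], labels[y]));
        return Arrays.asList(order);
    }
}
```

Because the generator depends only on the interface, switching canonicalization on or off becomes a single setter call rather than a second generator class.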

In the couple of tests I just ran, ring perception took almost twice as long as the labeling, although I did use molecules with large ring systems.

Olli

package org.openscience.cdk.smiles;

import org.openscience.cdk.AtomContainer;

/**
 * Created by IntelliJ IDEA.
 * User: oliver
 * Date: 9/02/2004
 * Time: 22:13:29
 *
 * @author Oliver Horlacher
 */
public interface Labler {
    // Assigns a label to every atom in the container.
    void lable(AtomContainer atomContainer) throws NumberFormatException;
}

package org.openscience.cdk.smiles;

import org.openscience.cdk.AtomContainer;
import org.openscience.cdk.Atom;

/**
 * Created by IntelliJ IDEA.
 * User: oliver
 * Date: 9/02/2004
 * Time: 22:14:45
 *
 * @author Oliver Horlacher
 */
public class RandomLabler implements Labler {

    // Non-canonical labeler: each atom's label is simply its position in the container.
    public void lable(AtomContainer atomContainer) throws NumberFormatException {
        Atom[] atoms = atomContainer.getAtoms();
        for (int i = 0; i < atoms.length; i++) {
            Atom atom = atoms[i];
            // Store the label under the "CanonicalLable" property key.
            atom.setProperty("CanonicalLable", Long.valueOf(i));
        }
    }
}

On Mon, 09 Feb 2004 9:50 pm, Christoph Steinbeck wrote:

Hi everybody,

Nicolas Job recently sent me the following evaluation of SMILES generation speed in CDK, noting a significant lack of performance. I was wondering if other list members could comment on this, since the generation of canonical SMILES is certainly one of the strongest virtues of CDK (thanks again to Oliver Horlacher).

Here are my two cents:

  1. Canonicalization scales unfavorably with growing molecule size. It has been shown that for molecular graphs it can at best be done in polynomial time [1], but that paper is mostly of academic value.

  2. Several issues make creating SMILES more demanding than parsing them. Besides canonical labeling, ring perception is needed to detect aromaticity, and unfortunately this requires the Set of All Rings (SAR), which is notoriously time-consuming to find.

  3. Does anybody have another toolkit available that generates canonical SMILES? I would love to see how CDK compares to other implementations.

  4. Certainly, one could write a non-canonical implementation of the SMILESGenerator and see if this helps. In many cases, canonicity is not important. I have not yet checked Oliver's code, but maybe one can simply switch off canonicalization.
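To illustrate point 1 in Chris's list, here is the most naive canonical labeling possible, a sketch that is not the algorithm CDK uses: try every permutation of the atoms and keep the lexicographically smallest adjacency-matrix string. Isomorphic inputs always yield the same string, which is what makes it canonical, but the search visits n! permutations, so it is only usable for a handful of atoms; practical labelers typically prune this search with atom invariants. BruteForceCanon and canonicalForm are hypothetical names.

```java
// Hypothetical sketch, not CDK code: brute-force canonical form of a small undirected graph.
// Every vertex permutation is tried and the smallest adjacency string is kept (n! work).
class BruteForceCanon {
    static String canonicalForm(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) { adj[e[0]][e[1]] = true; adj[e[1]][e[0]] = true; }
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        String[] best = {null};
        permute(perm, 0, adj, best);
        return best[0];
    }

    // Generates all permutations of perm[k..n-1] by swapping.
    private static void permute(int[] perm, int k, boolean[][] adj, String[] best) {
        int n = perm.length;
        if (k == n) {
            String s = encode(perm, adj);
            if (best[0] == null || s.compareTo(best[0]) < 0) best[0] = s;
            return;
        }
        for (int i = k; i < n; i++) {
            swap(perm, k, i);
            permute(perm, k + 1, adj, best);
            swap(perm, k, i);
        }
    }

    // Upper triangle of the relabeled adjacency matrix: new vertex p is old vertex perm[p].
    private static String encode(int[] perm, boolean[][] adj) {
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < perm.length; p++)
            for (int q = p + 1; q < perm.length; q++)
                sb.append(adj[perm[p]][perm[q]] ? '1' : '0');
        return sb.toString();
    }

    private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

Two differently numbered drawings of the same four-atom chain give identical strings, while a four-atom star gives a different one.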
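For point 2 in Chris's list, a naive depth-first enumeration of every simple cycle (the set of all rings) shows why this step is expensive: the number of rings grows quickly in fused systems. Naphthalene already has three (two six-membered rings plus the ten-membered perimeter), while its smallest set of smallest rings has only two. This is an illustrative sketch with hypothetical names (AllRingsSketch, countRings), not CDK's ring search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not CDK code: counts every simple cycle in an undirected graph.
class AllRingsSketch {
    static int countRings(int n, int[][] bonds) {
        List<List<Integer>> nbrs = new ArrayList<>();
        for (int i = 0; i < n; i++) nbrs.add(new ArrayList<>());
        for (int[] b : bonds) { nbrs.get(b[0]).add(b[1]); nbrs.get(b[1]).add(b[0]); }
        int closures = 0;
        for (int start = 0; start < n; start++) {
            boolean[] onPath = new boolean[n];
            onPath[start] = true;
            closures += dfs(start, start, 1, onPath, nbrs);
        }
        // Each ring is closed twice (once per direction) from its lowest-numbered atom.
        return closures / 2;
    }

    // Extends a path that begins at 'start' and only uses atoms numbered above 'start';
    // 'length' is the number of atoms currently on the path.
    private static int dfs(int start, int current, int length,
                           boolean[] onPath, List<List<Integer>> nbrs) {
        int found = 0;
        for (int next : nbrs.get(current)) {
            if (next == start && length >= 3) {
                found++; // closed a ring of 'length' atoms
            } else if (next > start && !onPath[next]) {
                onPath[next] = true;
                found += dfs(start, next, length + 1, onPath, nbrs);
                onPath[next] = false;
            }
        }
        return found;
    }
}
```

The depth-first search explores every simple path, so its running time can grow exponentially with the size of a fused ring system, which matches the observation above that ring perception, not labeling, dominated on molecules with large ring systems.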

Any other comments welcome.

Cheers,

Chris

[1] Faulon, J. L. Isomorphism, automorphism partitioning, and canonical labeling can be solved in polynomial-time for molecular graphs. Journal of Chemical Information and Computer Sciences 1998, 38, 432-444.

Discussion