We should integrate CDK 1.6 as soon as it is available.
There is a new SMILES generator, which is a lot faster than the current one (10-100x faster!). Furthermore, it resolves some issues we currently have with non-canonical SMILES. We should use absolute SMILES!
There is a good overview in the 1.5.4 release notes: https://github.com/cdk/cdk/wiki/1.5.4-Release-Notes
Hmm, whether we should use unique or absolute SMILES may need a bit of discussion.
Do we want to preserve stereochemistry information?
Does this mean that different 3D conformations can be imported? I think this would be a nice advantage over the current situation, where different conformations are merged into a single entity.
Maybe we should make this an option; the user might be interested in both.
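A minimal sketch of such an option, assuming the CDK 2.x API, where the output form is selected by passing SmiFlavor flags to the SmilesGenerator constructor. The enum SmilesMode and the class and method names are hypothetical. With SmiFlavor.Unique the two enantiomers of 1-aminoethanol collapse to the same canonical string, while SmiFlavor.Absolute keeps them apart.

```java
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmiFlavor;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;

public class SmilesModeExample {

    /** Hypothetical user-facing option: drop or keep stereochemistry. */
    enum SmilesMode {
        UNIQUE(SmiFlavor.Unique),     // canonical; stereo and isotopes are not written
        ABSOLUTE(SmiFlavor.Absolute); // canonical; stereo and isotopes are written

        final int flavor;

        SmilesMode(int flavor) {
            this.flavor = flavor;
        }
    }

    /** Parses a SMILES string and writes it back in the requested canonical form. */
    static String canonical(String smiles, SmilesMode mode) throws CDKException {
        SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer mol = parser.parseSmiles(smiles);
        return new SmilesGenerator(mode.flavor).create(mol);
    }

    public static void main(String[] args) throws CDKException {
        // The two enantiomers of 1-aminoethanol
        String r = "C[C@H](N)O";
        String s = "C[C@@H](N)O";
        for (SmilesMode mode : SmilesMode.values()) {
            System.out.println(mode + ": " + canonical(r, mode) + " / " + canonical(s, mode));
        }
    }
}
```

Either mode yields a canonical string, so both are usable as a lookup key; the choice only decides whether stereoisomers end up as one entity or as several.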
Last edit: Till Schäfer 2013-12-20
Currently we are using SmilesGenerator.createChiralSMILES(), which incorporates stereo information from 2D coordinates. I think we should generate equivalent (stereo-aware) SMILES when switching to a new CDK version. However, I do not think that SMILES can adequately distinguish different conformations: they encode configuration (e.g. R/S, E/Z), not 3D geometry.
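A possible replacement for createChiralSMILES(), sketched under the assumption of the CDK 2.x API: stereo elements are first perceived from the 2D coordinates and wedge bonds with StereoElementFactory.using2DCoordinates(), then written out with SmiFlavor.Absolute. The class, the helper methods and the hand-built test molecule (bromochlorofluoromethane with a single wedge) are hypothetical illustrations. This only captures configuration; it does not address the conformation question.

```java
import java.util.List;
import javax.vecmath.Point2d;
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IBond;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.interfaces.IStereoElement;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmiFlavor;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.stereo.StereoElementFactory;

public class ChiralSmilesFrom2D {

    /** Perceives stereo from 2D coordinates and wedges, then writes canonical stereo SMILES. */
    static String absoluteSmilesFrom2D(IAtomContainer mol) throws CDKException {
        List<IStereoElement> stereo = StereoElementFactory.using2DCoordinates(mol).createAll();
        mol.setStereoElements(stereo);
        return new SmilesGenerator(SmiFlavor.Absolute).create(mol);
    }

    /** Hypothetical input: CHFClBr as a 2D sketch would store it, with one wedge bond. */
    static IAtomContainer wedgedCHFClBr() {
        IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
        IAtomContainer mol = bldr.newInstance(IAtomContainer.class);
        mol.addAtom(atom(bldr, "C", 1, 0.0, 0.0));
        mol.addAtom(atom(bldr, "F", 0, 0.0, 1.0));
        mol.addAtom(atom(bldr, "Cl", 0, -0.87, -0.5));
        mol.addAtom(atom(bldr, "Br", 0, 0.87, -0.5));
        mol.addBond(0, 1, IBond.Order.SINGLE);
        mol.addBond(0, 2, IBond.Order.SINGLE);
        mol.addBond(0, 3, IBond.Order.SINGLE);
        // Wedge starts at the stereocentre (begin atom C) and points towards F
        mol.getBond(0).setStereo(IBond.Stereo.UP);
        return mol;
    }

    /** Creates an atom with an implicit hydrogen count and a 2D position. */
    static IAtom atom(IChemObjectBuilder bldr, String symbol, int hCount, double x, double y) {
        IAtom a = bldr.newInstance(IAtom.class, symbol);
        a.setImplicitHydrogenCount(hCount);
        a.setPoint2d(new Point2d(x, y));
        return a;
    }

    public static void main(String[] args) throws CDKException {
        System.out.println(absoluteSmilesFrom2D(wedgedCHFClBr()));
    }
}
```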
Here is a valuable blog post about the changes in the SMILES parser/generator (CDK version 1.5.4 / 1.6):
http://efficientbits.blogspot.nl/2013/12/new-smiles-behaviour-parsing-cdk-154.html
Last edit: Till Schäfer 2014-03-27
This feature request is related to bug 100.
Last edit: Till Schäfer 2014-03-27
Regarding atom typing, which is no longer done automatically after SMILES parsing in 1.5 / 1.6, this mailing list post might also be interesting:
AtomTyping isn’t as slow as it used to be, but for me the main reason I tended to use it was to add hydrogens. Now that these are present on the inputs I use, I only add atom typing for the following reasons:
1) check whether an atom type is known to the CDK; an unknown type could indicate a dodgy molecule
2) hybridisation is needed
3) a method/algorithm needs it
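Following reasons 1) and 2) of the quoted post, a minimal sketch of typing atoms explicitly after parsing, assuming the CDK 2.x API (CDKAtomTypeMatcher, AtomTypeManipulator). The class and method names are hypothetical, and treating either a null result or the "X" type as unknown is an assumption meant to cover differences between CDK versions.

```java
import java.util.ArrayList;
import java.util.List;
import org.openscience.cdk.atomtype.CDKAtomTypeMatcher;
import org.openscience.cdk.exception.CDKException;
import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.interfaces.IAtomType;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.tools.manipulator.AtomTypeManipulator;

public class AtomTypingAfterParsing {

    /**
     * Types every atom, copies the matched type's properties (e.g. hybridisation)
     * onto the atom, and returns the indices of atoms the CDK could not type.
     */
    static List<Integer> typeAtoms(IAtomContainer mol) throws CDKException {
        CDKAtomTypeMatcher matcher = CDKAtomTypeMatcher.getInstance(mol.getBuilder());
        List<Integer> unknown = new ArrayList<>();
        for (int i = 0; i < mol.getAtomCount(); i++) {
            IAtom atom = mol.getAtom(i);
            IAtomType type = matcher.findMatchingAtomType(mol, atom);
            // Assumption: untypeable atoms come back as null or as the "X" placeholder type
            if (type == null || "X".equals(type.getAtomTypeName())) {
                unknown.add(i); // possibly a dodgy molecule (reason 1)
            } else {
                AtomTypeManipulator.configure(atom, type); // sets hybridisation etc. (reason 2)
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws CDKException {
        SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
        IAtomContainer phenol = parser.parseSmiles("c1ccccc1O");
        System.out.println("unknown atom types at: " + typeAtoms(phenol));
        for (IAtom atom : phenol.atoms()) {
            System.out.println(atom.getSymbol() + " " + atom.getAtomTypeName() + " " + atom.getHybridization());
        }
    }
}
```

If the list of unknown atoms is not needed, AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms() (spelled that way in the CDK) types and configures all atoms in a single call.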
We will not wait for 1.6.
The major update has been done for release 2.6.0.