Here is the story.
1) Create (somehow) a structure as an IMolecule (e.g. benzene with
alternating single/double bonds)
2) Run HueckelAromaticityDetector to perceive aromaticity
3) Write the structure into CML
4) Read the structure back from CML into a new IMolecule
5) Now the new IMolecule has bond orders of 1.5, while the original one
from step 1) has bond orders of 1.0 and 2.0

Ah... that sounds like a bug. CDK in the early days used a bond order of
1.5 to mean an aromatic bond, but dropped that in favour of
IBond.setFlag(CDKConstants.ISAROMATIC). It seems that the CMLReader has
not been updated.
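
To make the difference between the two conventions concrete, here is a minimal, self-contained sketch. It does not use CDK: the `Bond` class, the `legacyEncode` method and the `order;aromatic` text format are all hypothetical, chosen only to show why an order of 1.5 loses the Kekulé orders while a separate flag keeps them.

```java
public class AromaticBondSketch {

    /** Hypothetical stand-in for a bond; not the CDK IBond interface. */
    static final class Bond {
        final double order;      // Kekulé order, e.g. 1.0 or 2.0
        final boolean aromatic;  // aromaticity stored as a separate flag

        Bond(double order, boolean aromatic) {
            this.order = order;
            this.aromatic = aromatic;
        }
    }

    /** Legacy convention: an aromatic bond is written as order 1.5, so 1.0 vs 2.0 is lost. */
    static double legacyEncode(Bond b) {
        return b.aromatic ? 1.5 : b.order;
    }

    /** Flag convention (hypothetical text format): order and aromaticity are written separately. */
    static String flagEncode(Bond b) {
        return b.order + ";" + b.aromatic;
    }

    /** Reads back the hypothetical "order;aromatic" format. */
    static Bond flagDecode(String s) {
        String[] parts = s.split(";");
        return new Bond(Double.parseDouble(parts[0]), Boolean.parseBoolean(parts[1]));
    }

    public static void main(String[] args) {
        Bond doubleBond = new Bond(2.0, true);
        // The legacy form cannot tell this bond apart from an aromatic single bond.
        System.out.println("legacy order: " + legacyEncode(doubleBond));
        Bond restored = flagDecode(flagEncode(doubleBond));
        System.out.println("restored order: " + restored.order + ", aromatic: " + restored.aromatic);
    }
}
```

With the flag convention, a writer and reader that agree on it can round-trip both pieces of information. With the legacy convention, the reader would have to re-derive the alternating orders.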

This doesn't break the isomorphism test or fingerprints, but it does break
atom type recognition as done by HybridizationStateATMatcher. In this
example, the atom types of the original molecule will be C.sp2, while in
the new molecule they will be Caromatic.sp2.

Atom type perception is a tricky thing indeed. What surprises me is that
the first type is not aromatic too, because you did do aromaticity
detection. Actually, since the atom type list contains the concept of
aromaticity, detection must be done prior to calling the perception tool.
Maybe a second bug?

I am not sure this breaks anything other than my atom environments code,
but I am wondering what the best way to handle the issue is.

Attached is a JUnit test.


    Nina, can this bug be reproduced in trunk/?

    An update (tested with the nightly build of 23.08.2008):
    + Aromatic bond order issue is fixed
    - Aromatic flags on atoms are lost during CML roundtrip

    The results of atom typing differ depending on whether hydrogens are assigned and, of course, on the atom type matcher used.

    1. With implicit hydrogens assigned
      1.1. The results with CDKAtomTypeMatcher
        - all atoms before and after the CML roundtrip are of type C.sp2

      1.2. The results with SybylAtomTypeMatcher
        - all atoms before and after the CML roundtrip are of type C.ar

    As I understand it, the aromatic atom type is intentionally not perceived by CDKAtomTypeMatcher.
    Is the same true for the aromatic atom flags?

    2. Hydrogens unset
      2.1. The results with CDKAtomTypeMatcher
        - atoms before the CML roundtrip are of type C.sp2
        - atoms after the CML roundtrip are of type NULL

      2.2. The results with SybylAtomTypeMatcher
        - atoms before the CML roundtrip are of type C.ar
        - atoms after the CML roundtrip are of type NULL

    Most probably the reason is that the hydrogen count is written to CML as zero even though it is UNSET, and is subsequently set to zero in the molecule that is read back.

    <atom id="a1" elementType="C" formalCharge="0" hydrogenCount="0" isotopeNumber="12"/>

    Is there an agreement on how to store unset properties in CML?
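
One possible convention, sketched below, is to omit the attribute entirely when the value is unset, so that a reader can tell "unset" apart from an explicit zero. This is only an illustration under stated assumptions: the class and method names are hypothetical, `null` stands in for UNSET, a regular expression stands in for a real XML parser, and nothing here reflects how CDK's CMLWriter or CMLReader are actually implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HydrogenCountCml {

    private static final Pattern H_COUNT = Pattern.compile("hydrogenCount=\"(\\d+)\"");

    /** Writes a CML-like atom element; hydrogenCount is omitted when the value is unset (null). */
    static String writeAtom(String id, String elementType, Integer hydrogenCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("<atom id=\"").append(id).append("\"");
        sb.append(" elementType=\"").append(elementType).append("\"");
        if (hydrogenCount != null) {
            // Only an explicitly set count, including an explicit 0, is serialized.
            sb.append(" hydrogenCount=\"").append(hydrogenCount).append("\"");
        }
        sb.append("/>");
        return sb.toString();
    }

    /** Reads hydrogenCount back; a missing attribute yields null (unset), not 0. */
    static Integer readHydrogenCount(String atomXml) {
        Matcher m = H_COUNT.matcher(atomXml);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        String unset = writeAtom("a1", "C", null);
        String zero = writeAtom("a2", "C", 0);
        System.out.println(unset + " -> " + readHydrogenCount(unset));
        System.out.println(zero + " -> " + readHydrogenCount(zero));
    }
}
```

Under this convention an atom whose hydrogen count was never assigned comes back unset after the roundtrip, so an atom type matcher sees the same input before and after.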

  • Nina, I added unit tests for the original problems and for the new hydrogen count == unset problems. All succeed, so I am closing this bug. Please file a new report if you find additional problems.