#231 SMILES are not unique

Milestone: cdk-1.2.x
Status: closed
Owner: nobody
Priority: 9
Updated: 2012-11-03
Created: 2004-08-23
Creator: Anonymous
Private: No

I converted the attached mol file into a CML file using
MDLReader.read and CMLWriter.write.

Then I generated the SMILES string for the mol file using
SmilesGenerator.createSMILES.

Afterwards, I generated the SMILES string from the CML
file using the same method. It should have been the
same string as the one generated above, since both
files describe the same molecule. But the two strings
differ, although both are valid SMILES for the molecule.

I got the following strings:
mol -> SMILES:
[H]C35(CCC1(O)(CC5(CC1(=C))(C(C(O)=O)C4([H])(C2(C)(CCCC34(COC2(=O)))))))

mol -> cml -> SMILES:
OC(=O)C3C15(CC(=C)C(O)(CCC1([H])C24(CCCC(C)(C(=O)OC2)C34([H])))C5)
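
As a quick sanity check that the two strings at least contain the same atoms, here is a minimal sketch in plain Java. It is not CDK code and not a real SMILES parser: the class name SmilesAtomCount and its methods are made up for illustration, and it only counts the atoms that are written out explicitly (implicit hydrogens are ignored). Matching counts are a necessary condition for the two strings to describe the same molecule, not a proof.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy element counter for SMILES strings (hypothetical helper, not part of CDK). */
class SmilesAtomCount {

    /** Counts the explicitly written atoms per element symbol. */
    static Map<String, Integer> countAtoms(String smiles) {
        Map<String, Integer> counts = new TreeMap<>();
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            if (c == '[') {
                // Bracket atom such as [H] or [13CH4]: skip isotope digits,
                // then read the element symbol that follows.
                int close = smiles.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed bracket at " + i);
                }
                int j = i + 1;
                while (j < close && Character.isDigit(smiles.charAt(j))) {
                    j++;
                }
                if (j == close) {
                    throw new IllegalArgumentException("empty bracket atom at " + i);
                }
                String symbol = String.valueOf(Character.toUpperCase(smiles.charAt(j)));
                boolean upper = Character.isUpperCase(smiles.charAt(j));
                if (upper && j + 1 < close && Character.isLowerCase(smiles.charAt(j + 1))) {
                    symbol += smiles.charAt(j + 1); // two-letter symbol, e.g. Cl
                }
                counts.merge(symbol, 1, Integer::sum);
                i = close + 1;
            } else if (smiles.startsWith("Cl", i) || smiles.startsWith("Br", i)) {
                counts.merge(smiles.substring(i, i + 2), 1, Integer::sum);
                i += 2;
            } else if (Character.isUpperCase(c)) {
                counts.merge(String.valueOf(c), 1, Integer::sum);
                i++;
            } else if ("bcnops".indexOf(c) >= 0) {
                // Aromatic atoms are written in lower case.
                counts.merge(String.valueOf(Character.toUpperCase(c)), 1, Integer::sum);
                i++;
            } else {
                i++; // ring-closure digits, branches and bond symbols
            }
        }
        return counts;
    }

    /** True if both strings list the same explicit atoms; says nothing about connectivity. */
    static boolean sameComposition(String a, String b) {
        return countAtoms(a).equals(countAtoms(b));
    }

    public static void main(String[] args) {
        String fromMol =
            "[H]C35(CCC1(O)(CC5(CC1(=C))(C(C(O)=O)C4([H])(C2(C)(CCCC34(COC2(=O)))))))";
        String fromCml =
            "OC(=O)C3C15(CC(=C)C(O)(CCC1([H])C24(CCCC(C)(C(=O)OC2)C34([H])))C5)";
        System.out.println(countAtoms(fromMol));
        System.out.println(countAtoms(fromCml));
        System.out.println(sameComposition(fromMol, fromCml));
    }
}
```

Both strings come out with 20 C, 5 O and 2 explicit H, so the difference is in the atom ordering and not in the atoms themselves.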

On https://sourceforge.net/mailarchive/forum.php?forum_id=2178&max_rows=25&style=flat&viewmonth=200204&viewday=6
I read the following:
"In order to use the canonical SMILES generator you need to properly
configure the atoms in the molecule for which you want to generate the
SMILES. The SmilesGenerator needs to know the correct atomic mass in
order to generate the correct SMILES notation and it also needs to know
the implicit hydrogen count in order to generate the canonical order of
atoms."

So maybe this is the problem?
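
To illustrate why unconfigured atoms could change the atom order, here is a toy sketch in plain Java. It is not the CDK CanonicalLabeler and does not use its invariants: the ToyAtom class, the sort key (mass number, then hydrogen count, then symbol) and the convention that 0 means "not configured" are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy illustration: an ordering built on atom properties changes when a property is unset. */
class InvariantOrderDemo {

    /** Hypothetical atom; massNumber 0 stands for "not configured". */
    static final class ToyAtom {
        final String symbol;
        final int massNumber;
        final int hydrogenCount;

        ToyAtom(String symbol, int massNumber, int hydrogenCount) {
            this.symbol = symbol;
            this.massNumber = massNumber;
            this.hydrogenCount = hydrogenCount;
        }
    }

    /** Sorts by mass number, then hydrogen count, then symbol, and returns the symbols. */
    static List<String> order(List<ToyAtom> atoms) {
        List<ToyAtom> sorted = new ArrayList<>(atoms);
        sorted.sort(Comparator.comparingInt((ToyAtom a) -> a.massNumber)
                .thenComparingInt(a -> a.hydrogenCount)
                .thenComparing(a -> a.symbol));
        List<String> symbols = new ArrayList<>();
        for (ToyAtom a : sorted) {
            symbols.add(a.symbol);
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Same two atoms (a CH2 carbon and a carbonyl oxygen), once with
        // mass numbers filled in and once with them left unset.
        List<ToyAtom> configured = List.of(new ToyAtom("C", 12, 2), new ToyAtom("O", 16, 0));
        List<ToyAtom> unconfigured = List.of(new ToyAtom("C", 0, 2), new ToyAtom("O", 0, 0));
        System.out.println(order(configured));
        System.out.println(order(unconfigured));
    }
}
```

With the masses set, the carbon sorts first; with the masses unset, the tie falls through to the hydrogen count and the oxygen sorts first. If something similar happens for atoms read from CML, the two SMILES would start from different atoms even though the molecule is the same.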
Suggestions/questions/comments to:
winnie.wefelmeyer@vtt.fi

Thanks for helping out!

Discussion

  • mol file

     
  • Stefan Kuhn
    2006-09-19

    I made a test, SmilesGeneratorTest.testSFBug1014344(), and it
    seems to work.

     
  • And it is open again... I've added a few related tests, and at least the CanonicalLabeler is working fine... I'd guess that it is the one that really decides which atom comes first... an [H] or an O... but I guess not :(

    The CML roundtripping is not implemented for exact mass and natural abundance (tests added), which might be why it is open again. Then again, that has never been implemented before...

    It seems to have been triggered by Element.atomicNumber defaulting to UNSET, but I have not been able to find the source...

     
  • Rajarshi Guha
    2008-12-16

    The problem appears to be in configuring isotope information on the atoms read in from the CML file. I have attached a patch to CMLCoreModule that fixes this bug: basically, it configures the atoms using IsotopeFactory. However, it does cause 2 tests to fail in CML2Test, but I'm not sure whether this is due to this patch or not.

     
  • Rajarshi Guha
    2008-12-16

    Patch to CMLCoreModule (in 1.2.x)

     
  • Rajarshi Guha
    2008-12-16

    Went ahead and committed this fix, since the failures in CML2Test still occur without it.

     
  • Reopened because the applied fix causes regressions. This needs a closer look.

     
  • Roundtripping of abundance has to look like:

    <cml>
    <isotopeList>
    <isotope id="iso1">
    <abundance units="units:percentage">1.0</abundance>
    </isotope>
    </isotopeList>
    <molecule>
    <atomArray>
    <atom id="a1" elementType="C" isotopeRef="iso1">
    </atom>
    </atomArray>
    </molecule>
    </cml>

    but this is currently outside the scope of the CML Convertor class...
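
For reference, the isotopeRef lookup in the snippet above can be sketched with the JDK's own DOM parser. This is not how CDK's CML reader works, only a minimal illustration of the indirection; the class IsotopeRefDemo and the method abundanceByAtom are hypothetical names, and the element and attribute names are taken from the snippet as written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Resolves atom/@isotopeRef against isotopeList/isotope/@id (illustrative sketch only). */
class IsotopeRefDemo {

    /** Returns atom id -> abundance for every atom whose isotopeRef can be resolved. */
    static Map<String, Double> abundanceByAtom(String cml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(cml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse CML", e);
        }

        // First pass: collect the abundance declared for each isotope id.
        Map<String, Double> byIsotope = new HashMap<>();
        NodeList isotopes = doc.getElementsByTagName("isotope");
        for (int i = 0; i < isotopes.getLength(); i++) {
            Element isotope = (Element) isotopes.item(i);
            NodeList abundance = isotope.getElementsByTagName("abundance");
            if (abundance.getLength() > 0) {
                byIsotope.put(isotope.getAttribute("id"),
                        Double.parseDouble(abundance.item(0).getTextContent().trim()));
            }
        }

        // Second pass: follow each atom's isotopeRef, skipping atoms without one.
        Map<String, Double> byAtom = new LinkedHashMap<>();
        NodeList atoms = doc.getElementsByTagName("atom");
        for (int i = 0; i < atoms.getLength(); i++) {
            Element atom = (Element) atoms.item(i);
            Double value = byIsotope.get(atom.getAttribute("isotopeRef"));
            if (value != null) {
                byAtom.put(atom.getAttribute("id"), value);
            }
        }
        return byAtom;
    }

    public static void main(String[] args) {
        String cml = "<cml><isotopeList><isotope id=\"iso1\">"
                + "<abundance units=\"units:percentage\">1.0</abundance>"
                + "</isotope></isotopeList><molecule><atomArray>"
                + "<atom id=\"a1\" elementType=\"C\" isotopeRef=\"iso1\"></atom>"
                + "</atomArray></molecule></cml>";
        System.out.println(abundanceByAtom(cml));
    }
}
```

The point of the two passes is that the abundance lives on the isotope, not on the atom, so a reader has to see the isotopeList before it can fill in the atoms.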

     
  • Added new unit test for just reading isotope abundance and exact mass, which needs a CML2.5 construct. See commit 13662.

     
  • But just for the record: we have never read this information, so I do not think it is really the cause of this bug showing up again. It only confounds the issue.

     