Importing an SDF that contains a chiral molecule, with chirality flagged in the counts
line, should lead to a chiral representation of the molecule in ScaffoldHunter
and to the creation of a chiral SMILES. With the enclosed SDF, however, one gets the likewise enclosed image, which lacks chirality information.
This might be related to this bug report:
https://sourceforge.net/p/cdk/bugs/1257/
In this case it will be fixed when updating the CDK version we use, see https://sourceforge.net/p/scaffoldhunter/feature-requests/31/
I tested the dataset with both the old CDK 1.4 and the new CDK 1.5. There seems to be no difference between the two visual representations. Both look the same as the picture in the first post. The SMILES, however, differ between the two versions:
Scaffold:
1.4: O=S(c1nc2ccccc2([nH]1))Cc3ncccc3
1.5: O=S(C1=NC2=CC=CC=C2N1)CC=3N=CC=CC3
Molecule:
1.4: O=S(c\1n\c\2c\c(OC)c\c\c\2([nH]1))Cc\3n\c\c(c(OC)c\3C)C
1.5: CC1=C(C(=C(CS@@C2=NC3=C(C=CC(=C3)OC)N2)N=C1)C)OC
I don't know whether these SMILES are equivalent or if the CDK 1.5 SMILES is more "correct" in any sense.
For the molecule SMILES, CDK 1.5 considers chirality, but it seems to be unable to detect the aromaticity (in contrast, CDK 1.4 does).
@Sven: can you retest this after my latest push [e88450] to the cdk-update branch and provide the new SMILES? Aromaticity was not correctly applied before.
Now I get the following SMILES:
Scaffold:
1.4: O=S(c1nc2ccccc2([nH]1))Cc3ncccc3
1.5: O=S(c1nc2ccccc2[nH]1)Cc3ncccc3
Molecule:
1.4: O=S(c\1n\c\2c\c(OC)c\c\c\2([nH]1))Cc\3n\c\c(c(OC)c\3C)C
1.5: Cc1cnc(CS@@c2[nH]c3ccc(cc3n2)OC)c(C)c1OC
Interestingly, the SMILES now look much more like the CDK 1.4 SMILES, without all these double bonds explicitly written down. The SVG still looks the same. I guess our current SVG implementation simply does not support chirality?
The generated SMILES now seem to be correct. What do you think, Lina?
Indeed our SVG implementation does not support chirality information. I will create a separate feature request for this. Switching to the new CDK depiction should solve the problem.
I overlooked that the SMILES for the molecule generated from CDK 1.5 was wrong both times. There is an oxygen atom missing. Therefore the chirality could not be correct either. In fact, without the oxygen there is no chirality at the sulfur atom.
I did not notice the missing atom. This has actually been caused by formatting issues. These are the SMILES posted above without formatting applied:
1.4:
O=S(c\1n\c\2c\c\(OC)c\c\c\2([nH]1))Cc\3n\c\c\(c\(OC)c\3C)C
1.5:
Cc1cnc(C[S@@](=O)c2[nH]c3ccc(cc3n2)OC)c(C)c1OC
Does this look fine for you now?
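A minimal sketch of how the formatting issue could have dropped the oxygen, assuming the tracker rendered `[S@@](=O)` as a Markdown-style inline link `[text](target)` and kept only the link text. The class and method names are hypothetical and this is not the tracker's actual renderer:

```java
import java.util.regex.Pattern;

class MarkdownLinkMangling {
    // Matches the Markdown inline-link shape [text](target).
    private static final Pattern INLINE_LINK =
            Pattern.compile("\\[([^\\]]*)\\]\\(([^)]*)\\)");

    // Assumption: a renderer keeps the link text and drops the target.
    static String renderLikeMarkdown(String raw) {
        return INLINE_LINK.matcher(raw).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Unformatted CDK 1.5 SMILES from the post above.
        String raw = "Cc1cnc(C[S@@](=O)c2[nH]c3ccc(cc3n2)OC)c(C)c1OC";
        // "[S@@](=O)" has the link shape, so the brackets and "(=O)" vanish.
        System.out.println(renderLikeMarkdown(raw));
    }
}
```

Under that assumption the output is the same string as the earlier 1.5 molecule SMILES with the missing oxygen, while `[nH]` survives because no parenthesis follows it directly.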
It is quite a puzzling issue. I think the SMILES lack a representation of the electron pair, but I also guess that there is a rule I don't know (something like the electron pair staying implicit at the end). Without such a rule it is not clear whether this is the S- or the R-enantiomer. To get the S-enantiomer, the electron pair has to stand in the SMILES between the oxygen (=O) and the benzimidazole (c2[nH]c3ccc(cc3n2)).
How do you create the canonical chiral SMILES? Maybe we can ask the person who created the SMILES generator for more information, or we can have a look into the algorithm. The question is: Where is the free electron pair implicitly represented in the SMILES?
I think the canonical SMILES were generated with CDK 1.5.11, the source code should be available. Unfortunately, I do not have the chemical background to fully understand the problem. I would propose the following: Sven could post a minimal code example using CDK to read the SDF and to generate the SMILES (just like this is done internally in Scaffold Hunter) and the result it produces. Then Lina could write an email to the CDK mailing list (or open a bug report for the CDK tracker) with the code and an explanation why the result is not correct.
I think having the code used to parse the SDF and to process the molecule is important to solve the issue.
The SMILES were created by the CDK. Unfortunately, the source code for the actual SMILES generation is not provided by the CDK, as they use a class named "Graph" from another library.
I have created a small example, which reads the exact sdf file and produces the above SMILES for the molecule. In order to run the code, the cdk library must be linked and the sdf file must be in the "resources" folder. Should I append a runnable jar, an eclipse project or just my code?
Just the code should be fine.
OK, here it is. But keep in mind, the code requires a properly set up project with CDK 1.5.11 in it and the Esomeprazole.sdf from the first post.
The SMILES is fine. There is a rule for the electron pair: it is implicitly represented between "@" and "]" in the SMILES (see CDK mailing list, November 2015).
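To make that rule concrete, here is a small sketch of the neighbor order it implies for `C[S@@](=O)c2...`. It assumes the lone pair takes the same slot an implicit hydrogen would (directly after the preceding atom) and uses the standard SMILES reading of `@` as anticlockwise and `@@` as clockwise. The class, method names, and atom labels are hypothetical and are not taken from the CDK:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImplicitLonePairOrder {
    // Builds the neighbor order around a chiral atom: the preceding atom,
    // then the implicit slot (here: the lone pair), then the explicit
    // neighbors in the order they are written after the bracket atom.
    static List<String> neighborOrder(String preceding, boolean hasImplicitSlot,
                                      List<String> following) {
        List<String> order = new ArrayList<>();
        order.add(preceding);
        if (hasImplicitSlot) {
            order.add("lone pair");
        }
        order.addAll(following);
        return order;
    }

    // "@" means anticlockwise and "@@" clockwise, viewed from the first neighbor.
    static String describe(String chiralTag, List<String> order) {
        String sense = chiralTag.equals("@@") ? "clockwise" : "anticlockwise";
        return "viewed from " + order.get(0) + ": "
                + String.join(" -> ", order.subList(1, order.size()))
                + " (" + sense + ")";
    }

    public static void main(String[] args) {
        // Fragment C[S@@](=O)c2... from the molecule SMILES above.
        List<String> order = neighborOrder("CH2", true,
                Arrays.asList("O", "benzimidazole C"));
        System.out.println(describe("@@", order));
    }
}
```

The sketch only spells out the ordering; it does not assign R or S, which still requires applying the CIP priorities to the four positions.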
The current cdk-depiction branch is able to generate an SVG with chirality visualization. Once the branches are merged, this bug will be resolved.