[Rdkit-discuss] bad inchi or parsing problem?
From: Jason B. <jas...@gm...> - 2017-09-14 16:53:03
    mol = Chem.inchi.MolFromInchi("InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-15)22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H")

gives sanitization errors about a four-valent nitrogen. Turning off
sanitization results in a bad structure, both visually and chemically.
It puts a negative charge on one of the nitrogens and a positive charge on the other:
    20 7 N chg: 1 deg: 3 exp: 4 imp: 0 hyb: 0 arom?: 0 chi: 0
    21 7 N chg: -1 deg: 3 exp: 4 imp: 0 hyb: 0 arom?: 0 chi: 0
    22 8 O chg: 0 deg: 1 exp: 2 imp: 0 hyb: 0 arom?: 0 chi: 0
    23 8 O chg: 0 deg: 1 exp: 2 imp: 0 hyb: 0 arom?: 0 chi: 0
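To reproduce this, here is a minimal sketch (assuming a current RDKit build with InChI support; results may differ between RDKit versions). It reads the InChI with sanitization turned off, lists the formal charge and degree of every N and O atom, and then sanitizes a copy with catchErrors=True, which, as I understand it, returns the failing sanitization step instead of raising. The variable names are my own.

```python
from rdkit import Chem
from rdkit.Chem import inchi

INCHI = ("InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-15)"
         "22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H")

# Read without sanitization so the problematic structure is still returned.
raw = inchi.MolFromInchi(INCHI, sanitize=False)

# Formal charge and graph degree of the heteroatoms (no valence calculation
# is needed for these, so they are safe to query on an unsanitized molecule).
for atom in raw.GetAtoms():
    if atom.GetSymbol() in ("N", "O"):
        print(atom.GetIdx(), atom.GetSymbol(),
              "chg:", atom.GetFormalCharge(), "deg:", atom.GetDegree())

# Sanitize a copy; with catchErrors=True the failing step is returned
# (SANITIZE_NONE means everything passed).
mol_copy = Chem.Mol(raw)
failed_step = Chem.SanitizeMol(mol_copy, catchErrors=True)
print("failed sanitization step:", failed_step)
```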
Is the structure written this way in the InChI string? I don't know how to parse an
InChI the way I can a SMILES (are they human-readable at all?)
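InChI is fairly readable once it is split on "/": after the version prefix come the formula, the /c connection layer (which atoms are bonded, by atom number) and the /h layer (which atoms carry H); a net charge would appear in a separate /q layer. As far as I know, bond orders are not stored at all, so every reader has to assign them itself. The rough reader below assumes that non-H atoms are numbered in formula order, carbons first, which matches the RDKit atom dump (0-based atoms 20/21 are N, 22/23 are O). The helper names are hypothetical and the parser only handles simple InChIs like this one (no mobile-H groups or fixed-H layers).

```python
import re
from collections import defaultdict

INCHI = ("InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-15)"
         "22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H")


def split_layers(inchi):
    """Split an InChI into prefix, formula and a {tag: body} dict of later layers."""
    prefix, formula, *rest = inchi.split("/")
    # Each later layer starts with a one-letter tag: c = connections, h = hydrogens, q = charge...
    return prefix, formula, {layer[0]: layer[1:] for layer in rest}


def element_list(formula):
    """Expand a Hill formula into one element per non-H atom (simplified)."""
    elements = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol != "H":
            elements.extend([symbol] * int(count or 1))
    return elements


def parse_connections(c_layer):
    """Build an adjacency map from a /c layer; handles only '-', '(', ')' and ','."""
    neighbors = defaultdict(set)
    stack, prev = [], None
    for tok in re.findall(r"\d+|[(),-]", c_layer):
        if tok == "(":
            stack.append(prev)      # open a branch off the current atom
        elif tok == ",":
            prev = stack[-1]        # start another branch off the same atom
        elif tok == ")":
            prev = stack.pop()      # return to the branch point
        elif tok != "-":            # '-' only separates atoms along a chain
            atom = int(tok)
            if prev is not None:
                neighbors[prev].add(atom)
                neighbors[atom].add(prev)
            prev = atom
    return neighbors


def parse_hydrogens(h_layer):
    """Map atom number -> H count from a simple /h layer (no mobile-H groups)."""
    counts = {}
    for atoms, n in re.findall(r"([\d,-]+)H(\d*)", h_layer):
        for part in filter(None, atoms.split(",")):
            lo, _, hi = part.partition("-")
            for a in range(int(lo), int(hi or lo) + 1):
                counts[a] = int(n or 1)
    return counts


prefix, formula, layers = split_layers(INCHI)
elements = element_list(formula)        # atom k (1-based) is elements[k - 1]
bonds = parse_connections(layers["c"])
hcount = parse_hydrogens(layers["h"])

for atom in (21, 22, 23, 24):
    nbrs = [f"{elements[n - 1]}{n}" for n in sorted(bonds[atom])]
    print(atom, elements[atom - 1], "neighbors:", nbrs, "H:", hcount.get(atom, 0))
print("charge layer (/q) present:", "q" in layers)
```

If I read the layers correctly, N21 and N22 each have three heavy-atom neighbors (C, N, O) and no H, O23 and O24 are terminal with no H, and there is no /q layer, so the molecule is net neutral. A neutral terminal O needs a double bond to its N, which would give that N four bonds, so there is no all-neutral assignment with trivalent N: a reader has to either separate charges (N+/O-) or allow pentavalent N.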
But if I use cactus to convert the InChI to SMILES
(<https://cactus.nci.nih.gov/chemical/structure/InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-15)22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H/smiles>),
I get "[O-][N+](c1cnc2ccccc2n1)=[N+]([O-])c3cnc4ccccc4n3", which seems like
a more sensible way to divvy up the available electrons.
Is this a bad InChI string that cactus is quietly fixing for me? If I use
CDK to convert the InChI to SMILES, I get
"C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O", which doesn't seem
right since it has pentavalent nitrogens. But when RDKit reads that SMILES,
it somehow fixes it and gives the same structure as the cactus output.
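RDKit's SMILES parser sanitizes by default, and, as far as I know, the cleanup step of sanitization rewrites a neutral five-valent N that is double-bonded to O into the charge-separated [N+]/[O-] form. That would explain the "fix". The sketch below (assuming a current RDKit; variable names are my own) parses both SMILES strings, compares RDKit's canonical SMILES for them, and lists the charged atoms in the molecule built from the CDK string.

```python
from rdkit import Chem

CDK_SMILES = "C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O"
CACTUS_SMILES = "[O-][N+](c1cnc2ccccc2n1)=[N+]([O-])c3cnc4ccccc4n3"

# MolFromSmiles sanitizes by default, which includes the cleanup step.
cdk_mol = Chem.MolFromSmiles(CDK_SMILES)
cactus_mol = Chem.MolFromSmiles(CACTUS_SMILES)

# Canonical SMILES give a simple "same structure?" check.
cdk_canon = Chem.MolToSmiles(cdk_mol)
cactus_canon = Chem.MolToSmiles(cactus_mol)
print(cdk_canon)
print(cactus_canon)
print("same structure:", cdk_canon == cactus_canon)

# Where did the charges end up in the molecule parsed from the CDK SMILES?
charged = [(a.GetIdx(), a.GetSymbol(), a.GetFormalCharge())
           for a in cdk_mol.GetAtoms() if a.GetFormalCharge() != 0]
print(charged)
```

If the two canonical strings match, RDKit has turned both N(=O) groups into [N+][O-], the same charge-separated form cactus returns.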
Is this a bad InChI string? Is it ambiguous?
Jason