Re: [Rdkit-devel] SDF String Generation Include Stereo information
From: Greg L. <gre...@gm...> - 2020-07-29 11:56:19
Hi Emanuel,

The chirality bit doesn't have anything to do with double bond stereochemistry.[1] So that's not what's going on here.

The RDKit has the ability to pass the mol block provided directly to the InChI code without interpreting it. I believe that the ChEMBL team is using that to generate InChIs. In any case, when I use that to pass the molblock downloaded from ChEMBL (https://www.ebi.ac.uk/chembl/api/data/molecule/CHEMBL6223.sdf) to the InChI code, I get the same InChI that is found in ChEMBL.

In this particular case I believe the bug may be in the NAOMI code.

-greg

[1] According to the documentation, it tells you whether a molfile with specified atomic stereochemistry represents a single stereoisomer (the one drawn) or whether only the relative configurations of the specified stereocenters are known, so that the structure is either a single diastereomer or a mixture of the two stereoisomers.

On Wed, Jul 29, 2020 at 11:17 AM Emanuel Ehmki <ema...@gm...> wrote:
> Dear All,
>
> I am currently working with the RDKit-generated SDF string that is stored
> in the COMPOUND_STRUCTURES table of the ChEMBL database, release 26.
>
> My workflow is:
>
> - pull the SDF (V2000) from the SQL table
> - generate an internal molecule representation (NAOMI ChemBio toolkit, if
>   that means anything to you)
> - generate the InChI string and key from the molecule
> - compare with the InChI string and key that are stored in the ChEMBL
>   database
>
> When comparing the InChI strings for the molecule with the id CHEMBL6223, I
> get two differing strings due to different stereochemistry (last characters):
>
> ChEMBL InChI: InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10+
> NAOMI InChI:  InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2/b12-10-
>
> While researching why that happens, I realized that the SDF string doesn't
> make use of the chirality bit that can be set in the counts line.
> When digging deeper, I found the disabled block in the MolFileWriter.cpp ->
> MolToMolBlock function:
>
> https://github.com/rdkit/rdkit/blob/f14f8a60de0ecf4bf5294d73b177d19055e0096d/Code/GraphMol/FileParsers/MolFileWriter.cpp#L1395
>
> Do I understand correctly that RDKit does not store any information about
> chirality in V2000 and includes chiral information only in the V3000 SDF
> format?
>
> Does anyone know when ChEMBL might switch to that version?
>
> Kind regards,
> Emanuel
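A minimal, stdlib-only Python sketch of the two checks discussed in this thread: reading the chiral flag (the "chirality bit") from a V2000 counts line, and splitting two InChIs into layers to see which layer actually differs. The helpers `chiral_flag`, `inchi_layers` and `differing_layers` are hypothetical names for illustration, not RDKit or InChI API. The layer parser assumes a standard InChI without isotopic or fixed-H sections, so each layer prefix appears at most once. Applied to the two CHEMBL6223 InChIs quoted above, the only difference is in the /b layer, which encodes double-bond stereochemistry. That is consistent with the point that the chiral flag, which concerns stereocenters, is not what is going on here.

```python
def chiral_flag(molblock: str) -> int:
    """Return the chiral flag (the 'ccc' field) from a V2000 counts line.

    Assumes a standard V2000 header: three header lines, then the counts
    line, whose fields are 3 characters wide. The chiral flag is the fifth
    field (columns 13-15, 1-based).
    """
    lines = molblock.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 counts line on line 4")
    return int(lines[3][12:15])


def inchi_layers(inchi: str) -> dict:
    """Split a standard InChI into {layer prefix: layer content}.

    'version' and 'formula' hold the first two fields; later layers are
    keyed by their one-letter prefix (c, h, b, t, m, s, ...).
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if not part:
            continue
        prefix = part[0]
        if prefix in layers:
            # Sublayer prefixes can repeat after an isotopic (/i) or
            # fixed-H (/f) layer; that case is out of scope for this sketch.
            raise ValueError(f"repeated layer prefix {prefix!r}")
        layers[prefix] = part[1:]
    return layers


def differing_layers(inchi_a: str, inchi_b: str) -> dict:
    """Return {prefix: (value_a, value_b)} for every layer that differs."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    stem = ("InChI=1S/C16H13IO2/c17-10-12-8-9-15(16(18)19-12)"
            "14-7-3-5-11-4-1-2-6-13(11)14/h1-7,10,15H,8-9H2")
    chembl = stem + "/b12-10+"
    naomi = stem + "/b12-10-"
    print(differing_layers(chembl, naomi))  # {'b': ('12-10+', '12-10-')}
```

Comparing layer by layer rather than whole strings makes it immediately clear whether a mismatch is in connectivity, hydrogens, or one of the stereo layers.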