Thread: [Rdkit-discuss] Building a molecule from scratch
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Noel O'B. <bao...@gm...> - 2008-04-15 10:41:35
|
I am running into trouble building a molecule from scratch. Let's say the molecule represented by "F/C=C/F" (taken from the Daylight SMILES tutorial). Here's the code to build it (according to me): import Chem # Set some lookups _bondtypes = {1: Chem.BondType.SINGLE, 2: Chem.BondType.DOUBLE, 3: Chem.BondType.TRIPLE} _bondstereo = {0: Chem.rdchem.BondStereo.STEREONONE, 1: Chem.rdchem.BondStereo.STEREOE, 2: Chem.rdchem.BondStereo.STEREOZ} # Build F/C=C/F rdmol = Chem.Mol() rdedmol = Chem.EditableMol(rdmol) for atomnum in [9, 6, 6, 9]: rdatom = Chem.Atom(atomnum) rdedmol.AddAtom(rdatom) for bond in [(0,1,1), (1,2,2), (2,3,1)]: rdedmol.AddBond(bond[0], bond[1], _bondtypes[bond[2]]) rdmol = rdedmol.GetMol() for stereoID, newbond in zip([0, 1, 0], rdmol.GetBonds()): newbond.SetStereo(_bondstereo[stereoID]) Chem.SanitizeMol(rdmol) # Gets here fine # Print the result print Chem.MolToSmiles(rdmol, isomericSmiles=True) # RuntimeError: Pre-condition Violation What am I missing? It works fine if I use [0, 0, 0] for stereoID, so it's something to do with the way I handle the stereoisomer. Noel |
From: Greg L. <gre...@gm...> - 2008-04-15 17:25:42
|
On Tue, Apr 15, 2008 at 12:41 PM, Noel O'Boyle <bao...@gm...> wrote: > I am running into trouble building a molecule from scratch. Let's say > the molecule represented by "F/C=C/F" (taken from the Daylight SMILES > tutorial). Here's the code to build it (according to me): > > import Chem > > # Set some lookups > _bondtypes = {1: Chem.BondType.SINGLE, > 2: Chem.BondType.DOUBLE, > 3: Chem.BondType.TRIPLE} > _bondstereo = {0: Chem.rdchem.BondStereo.STEREONONE, > 1: Chem.rdchem.BondStereo.STEREOE, > 2: Chem.rdchem.BondStereo.STEREOZ} > > # Build F/C=C/F > rdmol = Chem.Mol() > rdedmol = Chem.EditableMol(rdmol) ah... my friend EditableMol. As you are discovering, this is a very good way to shoot yourself in the foot. (Particularly if there's no documentation telling you how to aim the gun) :-) > for atomnum in [9, 6, 6, 9]: > rdatom = Chem.Atom(atomnum) > rdedmol.AddAtom(rdatom) > > for bond in [(0,1,1), (1,2,2), (2,3,1)]: > rdedmol.AddBond(bond[0], > bond[1], > _bondtypes[bond[2]]) > > rdmol = rdedmol.GetMol() > for stereoID, newbond in zip([0, 1, 0], rdmol.GetBonds()): > newbond.SetStereo(_bondstereo[stereoID]) The problem is here. I shouldn't even have exposed SetStereo to Python. To indicate bond stereochemistry, you need to set the directionalities of the neighboring single bonds. So you'd do: [13] >>> rdmol.GetBondWithIdx(0).SetBondDir(Chem.BondDir.ENDUPRIGHT) [15] >>> rdmol.GetBondWithIdx(2).SetBondDir(Chem.BondDir.ENDUPRIGHT) Notice that this does not set the stereochemistry of the double bond: [16] >>> rdmol.GetBondWithIdx(1).GetStereo() Out[16]: Chem.rdchem.BondStereo.STEREONONE The RDKit is, at the moment, lazy about stereochemistry : it doesn't do the perception until it's required. So if you ask for SMILES: [17] >>> Chem.MolToSmiles(rdmol,True) Out[17]: 'F/C=C/F' you get stereochemistry calculated: [18] >>> rdmol.GetBondWithIdx(1).GetStereo() Out[18]: Chem.rdchem.BondStereo.STEREOE If you want the stereochem information earlier, you can call: Chem.AssignBondStereoCodes(rdmol) > Chem.SanitizeMol(rdmol) > # Gets here fine > > # Print the result > print Chem.MolToSmiles(rdmol, isomericSmiles=True) > # RuntimeError: Pre-condition Violation There actually is an error message associated with this, but it doesn't make it through the C++ / Python wrapper. That's something for me to look into. > What am I missing? It works fine if I use [0, 0, 0] for stereoID, so > it's something to do with the way I handle the stereoisomer. If you're interested in the actual reason behind the error, I'm happy to explain, but I'd understand if you just want to know how to make it go away. :-) -greg |
From: Noel O'B. <bao...@gm...> - 2008-04-15 21:47:11
|
On 15/04/2008, Greg Landrum <gre...@gm...> wrote: > The problem is here. I shouldn't even have exposed SetStereo to > Python. To indicate bond stereochemistry, you need to set the > directionalities of the neighboring single bonds. So you'd do: > [13] >>> rdmol.GetBondWithIdx(0).SetBondDir(Chem.BondDir.ENDUPRIGHT) > [15] >>> rdmol.GetBondWithIdx(2).SetBondDir(Chem.BondDir.ENDUPRIGHT) > > Notice that this does not set the stereochemistry of the double bond: > > [16] >>> rdmol.GetBondWithIdx(1).GetStereo() > Out[16]: Chem.rdchem.BondStereo.STEREONONE > > The RDKit is, at the moment, lazy about stereochemistry : it doesn't > do the perception until it's required. So if you ask for SMILES: > > [17] >>> Chem.MolToSmiles(rdmol,True) > Out[17]: 'F/C=C/F' > > you get stereochemistry calculated: > [18] >>> rdmol.GetBondWithIdx(1).GetStereo() > Out[18]: Chem.rdchem.BondStereo.STEREOE > > If you want the stereochem information earlier, you can call: > Chem.AssignBondStereoCodes(rdmol) I'm afraid that the chunks are still too big - you need to break it down a bit more... What exactly do UPRIGHT and DOWNRIGHT mean and how exactly do they indicate the stereochemistry of the double bond? Do they correspond to the SMILES usage of slashes? Noel |
From: Greg L. <gre...@gm...> - 2008-04-16 03:55:26
|
On Tue, Apr 15, 2008 at 11:47 PM, Noel O'Boyle <bao...@gm...> wrote: > On 15/04/2008, Greg Landrum <gre...@gm...> wrote: > > The problem is here. I shouldn't even have exposed SetStereo to > > Python. To indicate bond stereochemistry, you need to set the > > directionalities of the neighboring single bonds. So you'd do: > > [13] >>> rdmol.GetBondWithIdx(0).SetBondDir(Chem.BondDir.ENDUPRIGHT) > > [15] >>> rdmol.GetBondWithIdx(2).SetBondDir(Chem.BondDir.ENDUPRIGHT) > > > > Notice that this does not set the stereochemistry of the double bond: > > > > [16] >>> rdmol.GetBondWithIdx(1).GetStereo() > > Out[16]: Chem.rdchem.BondStereo.STEREONONE > > > > The RDKit is, at the moment, lazy about stereochemistry : it doesn't > > do the perception until it's required. So if you ask for SMILES: > > > > [17] >>> Chem.MolToSmiles(rdmol,True) > > Out[17]: 'F/C=C/F' > > > > you get stereochemistry calculated: > > [18] >>> rdmol.GetBondWithIdx(1).GetStereo() > > Out[18]: Chem.rdchem.BondStereo.STEREOE > > > > If you want the stereochem information earlier, you can call: > > Chem.AssignBondStereoCodes(rdmol) > > I'm afraid that the chunks are still too big - you need to break it > down a bit more... > > What exactly do UPRIGHT and DOWNRIGHT mean and how exactly do they > indicate the stereochemistry of the double bond? Do they correspond to > the SMILES usage of slashes? Exactly. Since Z/E labels are only defined when you have the whole molecule built (because you need to be able to assign priorities to the atoms), that's done "at the end". Stereochemistry about the double bond is defined at construction time using the SMILES approach of indicating the directionality of the neighboring single bonds. The two possible labels for this are ENDUPRIGHT (forwards slash: "/") and ENDDOWNRIGHT (backwards slash: "\"). The stereochemistry perception done by Chem.AssignBondStereoCodes uses these directionality indicators and the atomic CIP ranks to determine the absolute (Z/E) stereochemistry of the double bond. Note that, as in SMILES, you only need to provide the directionality of one neighbor bond on each side of the double bond. So "C/C(Cl)=C(/C)" is fine, you don't need to provide a direction on the C1-Cl2 bond (and probably shouldn't, since that just increases the chance of errors). Does that help? -greg |