Re: [Rdkit-discuss] Generating all stereochem possibilities from smile
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: James J. <tot...@gm...> - 2016-12-11 21:23:21
|
Thank you all again for the great responses! :) There seems to be a consensus between Andrew and Pavel that the solution is with SetBondDir(). I've tried Pavel's functions get_unspec_double_bonds() and set_double_bond_stereo() in python 2 (couldn't figure out how to get RDkit set up in python 3, but it seems like python 3 is only used for read_input). In python 2 get_unspec_double_bonds() worked, but set_double_bond_stereo() did not change the stereo of the bond. I've also tried the SetBondDir() function myself with no avail. However, Brian's answer makes the logical sense with setStereo(). I did not find a function like that in python (It is in the C++ code however: http://www.rdkit.org/C++_Docs/Bond_8h_source.html#l00278). Oddly enough there is the GetStereo() function for python. I'm wondering if maybe I could use SetProp() to do this? I couldn't find how exactly GetStereo() works, if it is a property that is set or calculated. But I would assume if it was a property I could do something like the following: SetProp("d_stereo","STEREOE"). However, I did not find the right property name to do this. Again, thank you all for the help. It is very much appreciated! - James On Fri, Dec 9, 2016 at 11:46 PM, Pavel Polishchuk <pav...@uk...> wrote: > I just want to share my script, which I use for enumeration of > stereoisomers. Enumeration of double bonds was added quite recently and > thus I didn't test it extensively. > I put it on github: <https://github.com/DrrDom/rdk> > https://github.com/DrrDom/rdk > It seems to work well on quite complex queries like > CC1CCC(CC1)C1CC=C(C\C(=C/I)C(=CF)C1C1C[C@@H](C)CC=C1)C1CC[C@H](O)C(Br)C1 > > to generate all possible stereoisomers: > gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d > > to discard compounds with more than 4 undefined centers/double bonds: > gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d -u 4 > > Try it on your own risk :) > If you will find mistakes let me know. > > Pavel. > > > On 12/10/2016 03:44 AM, Brian Cole wrote: > > So I may need a little guidance on this one from someone with a little > more historical knowledge of RDKit. > > I found the findPotentialStereoBonds function inside Chirality.cpp that > appears to do what we want, detect double bonds with unspecified E/Z stereo > chemistry. That's at least what the comment in the file leads me to > believe: > > // Find bonds than can be cis/trans in a molecule and mark them as "any" > ... > void findPotentialStereoBonds(ROMol &mol, bool cleanIt) { > > > But when you look at the code, it's not setting bonds to Bond::STEREOANY, > it's setting them to Bond::STEREONONE here: > > ... > // proceed only if we either want to clean the stereocode on this > bond > // or if none is set on it yet > > if (cleanIt || dblBond->getStereo() == Bond::STEREONONE) { > dblBond->setStereo(Bond::STEREONONE); > ... > > Seems odd to me check that the stereo is Bond::STEREONONE, and then set it > to that value. > > While findPotentialStereoBonds isn't exposed in Python, there is some Java > docs written for it in the Java wrappers: > %javamethodmodifiers RDKit::MolOps::findPotentialStereoBonds ( > ROMol & mol, bool cleanIt = false ) > " > /** > <p> > finds bonds that could be cis/trans in a molecule and mark them as > Bond::STEREONONE > <p> > > > So that documentation is actually correct. It does find bonds that could > be cis/trans and mark them as Bond::STEREONONE. That just isn't very useful > since all bonds are marked Bond::STEREONONE by default. > > The only direct test case of findPotentialStereoBonds I can find is the > following: > > MolOps::findPotentialStereoBonds(*m, false); > TEST_ASSERT(m->getNumAtoms() == 15); > TEST_ASSERT(m->getBondWithIdx(1)->getBondType() == Bond::DOUBLE); > TEST_ASSERT(m->getBondWithIdx(1)->getStereoAtoms().size() == 2); > TEST_ASSERT(m->getBondWithIdx(3)->getBondType() == Bond::DOUBLE); > TEST_ASSERT(m->getBondWithIdx(3)->getStereoAtoms().size() == 2); > > That doesn't even test the getStereo() return value. Though it does test > getStereoAtoms() return value. So it looks like the proper thing to do to > detect unspecified bond stereo is to check for a non-empty getStereoAtoms() > vector? > > So ideally, I would like to write an bond stereo enumerator that tests if > a double bond has unspecified stereo (testing the size of getStereoAtoms() > would work) and then sets the stereo, that is: > > bond.SetStereo(Bond::STEREOE); > or > bond.SetStereo(Bond::STEREOZ); > > And then hopefully the Chem.MolToSmiles will "do the right thing". > However, this commented out code in the wrapper has given me pause: > > // this is no longer exposed because it requires that stereo atoms > // be set. This is a task that is tricky and "dangerous". > //.def("SetStereo",&Bond::setStereo, > // "Set the CIP-classification of the bond as a > BondStereo\n") > > So apparently I shouldn't expose an "easy" way to do this. What is the > trickiness and dangerousness of this API? And could we make an easy way to > enumerate bond stereo? > > Thanks! > > > > On Fri, Dec 9, 2016 at 5:44 PM, Brian Cole <co...@gm...> wrote: > >> This has me quite curious now, how do we detect unspecified bond stereo >> chemistry in RDKit? >> >> m = Chem.MolFromSmiles("FC=CF") >> assert m.HasProp("_StereochemDone") >> for bond in m.GetBonds(): >> print(bond.GetBondDir(), bond.GetStereo()) >> >> Yields: >> >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> >> Trying to force the issue: >> >> Chem.AssignStereochemistry(m, cleanIt=True, force=True, flagPossibleStereoCenters=True) >> >> assert m.HasProp("_StereochemDone") >> for bond in m.GetBonds(): >> print(bond.GetBondDir(), bond.GetStereo()) >> >> Still yields: >> >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> (rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE) >> >> I can see the setStereo(Bond::STEREOANY) in the code here: >> >> if ((*bondIt)->getBondDir() == Bond::EITHERDOUBLE) { >> (*bondIt)->setStereo(Bond::STEREOANY); >> >> Guess I have to figure out how to detect EITHERDOUBLE? >> >> -Brian >> >> On Fri, Dec 9, 2016 at 5:03 PM, Andrew Dalke <da...@da...> >> wrote: >>> >>> On Dec 9, 2016, at 9:50 PM, Brian Kelley wrote: > >>> from rdkit import >>> Chem > >>> m = Chem.MolFromSmiles("F/C=C/F") > >>> for bond in >>> m.GetBonds(): > ... print bond.GetStereo() > ... > STEREONONE > STEREOE >>> > STEREONONE > > However, setting bond stereo doesn't appear to be exposed. I >>> thought I knew a way to change it, via GetBondDir()/SetBondDir() on the >>> directional bonds. It doesn't seem to work. First, get the existing bond >>> directions: >>> from rdkit import Chem >>> mol = >>> Chem.MolFromSmiles("F/C=C/F") >>> for bond in mol.GetBonds(): ... >>> print(bond.GetBondDir()) ... ENDUPRIGHT NONE ENDUPRIGHT I'll change the >>> first "/" to a "\" (That's "\\" when escaped for a normal Python string.) >>> >>> mol.GetBondWithIdx(0) <rdkit.Chem.rdchem.Bond object at 0x10fe26600> >>> >>> mol.GetBondWithIdx(0).GetBondDir() rdkit.Chem.rdchem.BondDir.ENDUPRIGHT >>> >>> mol.GetBondWithIdx(0).SetBondDir(Chem.BondDir.ENDDOWNRIGHT) If I >>> generate the SMILES I see the old "F/C=C/F" instead of the new direction. I >>> need to clear internal computed properties when I make structure edits: >>> >>> Chem.MolToSmiles(mol, isomericSmiles=True) 'F/C=C/F' >>> >>> mol.ClearComputedProps() >>> Chem.MolToSmiles(mol, isomericSmiles=True) >>> 'F/C=C\\F' I'll set the directions to "NONE" and clear computed properties. >>> The SMILES is correct: >>> mol.GetBondWithIdx(0).SetBondDir(Chem.BondDir.NONE) >>> >>> mol.GetBondWithIdx(2).SetBondDir(Chem.BondDir.NONE) >>> >>> mol.ClearComputedProps() >>> Chem.MolToSmiles(mol, isomericSmiles=True) >>> 'FC=CF' although the stereo setting is unchanged: >>> for bond in >>> mol.GetBonds(): ... print(bond.GetStereo()) ... STEREONONE STEREOE >>> STEREONONE I expected it to give me all STEREONONEs, as if it were the same >>> as the following: >>> mol2 = Chem.MolFromSmiles("FC=CF") >>> for bond in >>> mol2.GetBonds(): ... print(bond.GetStereo()) ... STEREONONE STEREONONE >>> STEREONONE Andrew >>> da...@da... >>> ------------------------------------------------------------------------------ >>> Developer Access Program for Intel Xeon Phi Processors Access to Intel Xeon >>> Phi processor-based developer platforms. With one year of Intel Parallel >>> Studio XE. Training and support from Colfax. Order your platform today. >>> http://sdm.link/xeonphi _______________________________________________ >>> Rdkit-discuss mailing list Rdk...@li... >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >>> >> ------------------------------------------------------------------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today.http://sdm.link/xeonphi > > _______________________________________________ > Rdkit-discuss mailing lis...@li...https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > > > ------------------------------------------------------------ > ------------------ > Developer Access Program for Intel Xeon Phi Processors > Access to Intel Xeon Phi processor-based developer platforms. > With one year of Intel Parallel Studio XE. > Training and support from Colfax. > Order your platform today.http://sdm.link/xeonphi > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > > |