Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??
From: Marshall L. <mar...@em...> - 2009-03-07 04:26:53
Greg-

Your suggestion fixed my problems with the file I was working with. I am a novice with both RDKit and Python. I have a new question for you:

I am using the model script you provided, but with a different set of compounds.

##### SCRIPT to embed and optimize #########
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('usdrug.700.sdf')

mols = []
for i,mol in enumerate(suppl):
    num = mol.GetNumAtoms()
    print 'embed:',i
    AllChem.EmbedMolecule(mol)
    print 'optimize:',i
    AllChem.UFFOptimizeMolecule(mol,500)
    mols.append(mol)

w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)
##########################

###### OUTPUT error ########
embed: 529
optimize: 529
embed: 530
optimize: 530
embed: 531
optimize: 531
embed: 532
optimize: 532
Traceback (most recent call last):
  File "2-3D.py", line 12, in <module>
    AllChem.UFFOptimizeMolecule(mol,500)
ValueError: Bad Conformer Id
###############################

I think I narrowed the molecule down to either of these two (with the first looking more likely to be the problem):

http://www.emolecules.com/cgi-bin/more?vid=5852064
http://www.emolecules.com/cgi-bin/more?vid=1985607

AND THEN, I tried to stop using the UFFOptimizeMolecule function and use only EmbedMolecule and I received this error:

############
embed: 613
embed: 614
embed: 615
embed: 616
embed: 617
embed: 618
[20:03:38] Explicit valence for atom # 0 N greater than permitted
[20:03:38] Unexpected error hit on line 38532
[20:03:38] ERROR: moving to the begining of the next molecule
embed: 619
Traceback (most recent call last):
  File "2-3D.py", line 10, in <module>
    AllChem.EmbedMolecule(mol)
Boost.Python.ArgumentError: Python argument types in
    Chem.rdDistGeom.EmbedMolecule(NoneType)
did not match C++ signature:
    EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int maxAttempts=0,
    int randomSeed=-1, bool clearConfs=True, bool useRandomCoords=False,
    double boxSizeMult=2.0, bool randNegEig=True,
    unsigned int numZeroFail=1, boost::python::dict {lvalue}
    coordMap={})
######################

I think this is the #618 and #619 compound(s) that caused the problem:

http://www.emolecules.com/cgi-bin/more?vid=5753883
http://www.emolecules.com/cgi-bin/more?vid=15920212

If there are valence problems, I can just record those errors and throw out those molecules. But I do not understand why the FileParseException does not succeed in proceeding to the next molecule. I am assuming it has to do with the way the for-loop is constructed, but I wasn't able to figure it out.

Thank you for your great support. I really appreciate it.

-Marshall

On Mar 4, 2009, at 9:35 PM, Greg Landrum wrote:

> Dear Marshall,
>
> On Wed, Mar 4, 2009 at 7:01 PM, Marshall Levesque
> <mar...@em...> wrote:
>>
>> I have written a script to automate converting compounds from a 2D SD file
>> into a 3D, optimized SD file. With the compounds I'm testing with, I get a
>> hangup in python that has to be killed. I'm wondering if it has anything to
>> do with this bug you listed:
>>
>> - UFF optimization not terminating when atoms are on top of each other
>>   (issue 2378119)
>>
>> Here is the code I'm trying to run:
>>
>> ############
>> import Chem
>> from Chem import AllChem
>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>>
>> max = len(suppl)
>>
>> for i in range(0, max):
>>     AllChem.EmbedMolecule(suppl[i])
>>     AllChem.UFFOptimizeMolecule(suppl[i])
>>     print i
>>
>> w = Chem.SDWriter('output.sdf')
>> for mol in suppl: w.write(mol)
>> ##########
>
> There are two things going on here.
>
> The first is, indeed, a bug in the optimizer that causes it to get
> stuck in some circumstances. I will track this down.
>
> However, even if this bug were not present, the code above would not
> behave the way you expect.
> When you ask an SDMolSupplier for a particular object, it parses the
> corresponding part of the SD file and returns a *new* object, so the
> calls to EmbedMolecule(), UFFOptimizeMolecule(), and w.write() in your
> code above are all working on different objects. Here's a simple
> demonstration:
>
> [4]>>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>
> [5]>>> ma = suppl[0]
>
> [6]>>> mb = suppl[0]
>
> [7]>>> ma is mb
> Out[7] False
>
> [8]>>> ma
> Out[8] <rdkit.Chem.rdchem.Mol object at 0xa72b02c>
>
> [9]>>> mb
> Out[9] <rdkit.Chem.rdchem.Mol object at 0xa72b09c>
>
> notice that ma and mb are pointing to different pieces of memory
> (that's what the "at 0x...." things indicate).
>
> Here's a version of your script that does what you want, and that
> doesn't hang:
>
> ##########
> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>
> mols = []
> for i,mol in enumerate(suppl):
>     print 'embed: ',i
>     AllChem.EmbedMolecule(mol)
>     print 'optimize: ',i
>     AllChem.UFFOptimizeMolecule(mol)
>     mols.append(mol)
>
> w = Chem.SDWriter('output.sdf')
> for mol in mols: w.write(mol)
> ##########
>
> Regards,
> -greg
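
A note on the two tracebacks in this message, offered as a minimal sketch and not a definitive fix. It relies on two behaviors of current RDKit releases: SDMolSupplier yields None for a record it cannot parse (which is what reaches EmbedMolecule as NoneType above), and EmbedMolecule returns -1 when it cannot generate a conformer, which would be consistent with the later "Bad Conformer Id" from UFFOptimizeMolecule. The helper names embed_and_optimize and run_with_rdkit are hypothetical, and the "from rdkit import ..." form is the import path of newer releases, not the bare "import Chem" used in this thread.

```python
def embed_and_optimize(supplier, embed, optimize):
    """Embed and optimize every usable molecule; record the ones skipped.

    supplier: iterable of molecules, with None for unparsable records
    embed:    callable(mol) -> conformer id, or a negative value on failure
    optimize: callable(mol), run only after a successful embedding
    """
    kept, skipped = [], []
    for i, mol in enumerate(supplier):
        if mol is None:
            # The record could not be parsed (e.g. a valence error).
            skipped.append((i, 'parse failed'))
            continue
        if embed(mol) < 0:
            # No conformer was generated, so there is nothing to optimize.
            skipped.append((i, 'embedding failed'))
            continue
        optimize(mol)
        kept.append(mol)
    return kept, skipped


def run_with_rdkit(in_path, out_path, max_iters=500):
    """Hypothetical driver wiring the loop above to RDKit calls."""
    # Assumption: newer RDKit import path; this thread uses 'import Chem'.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    kept, skipped = embed_and_optimize(
        Chem.SDMolSupplier(in_path),
        AllChem.EmbedMolecule,
        lambda mol: AllChem.UFFOptimizeMolecule(mol, maxIters=max_iters),
    )
    writer = Chem.SDWriter(out_path)
    for mol in kept:
        writer.write(mol)
    writer.close()
    return skipped
```

Under these assumptions, run_with_rdkit('usdrug.700.sdf', 'output.sdf') would write only the molecules that were embedded and optimized, and return (index, reason) pairs for the records that were thrown out, so they can be logged instead of stopping the loop.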