Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Greg-

Your suggestion fixed my problems with the file I was working with.  I  
am a novice with both to RDKit and Python.

I have a new question for you:  I am using the model script you  
provided, but with a different set of compounds.

##### SCRIPT to embed and optimize  #########
import Chem
from Chem import AllChem
suppl = Chem.SDMolSupplier('usdrug.700.sdf')

mols = []
for i,mol in enumerate(suppl):
        num = mol.GetNumAtoms()
         print 'embed:',i
         AllChem.EmbedMolecule(mol)
        print 'optimize:',i
        AllChem.UFFOptimizeMolecule(mol,500)
         mols.append(mol)

w = Chem.SDWriter('output.sdf')
for mol in mols: w.write(mol)

##########################

###### OUTPUT error ########
embed: 529
optimize: 529
embed: 530
optimize: 530
embed: 531
optimize: 531
embed: 532
optimize: 532
Traceback (most recent call last):
   File "2-3D.py", line 12, in <module>
     AllChem.UFFOptimizeMolecule(mol,500)
ValueError: Bad Conformer Id

###############################

I think I narrowed the molecule down to either of these two (with the  
first looking more likely to be the problem):
http://www.emolecules.com/cgi-bin/more?vid=5852064
http://www.emolecules.com/cgi-bin/more?vid=1985607

AND THEN, I tried to stop using the UFFOptimizeMolecule function and  
use only EmbedMolecule and I received this error:

############
embed: 613
embed: 614
embed: 615
embed: 616
embed: 617
embed: 618
[20:03:38] Explicit valence for atom # 0 N greater than permitted
[20:03:38] Unexpected error hit on line 38532
[20:03:38] ERROR: moving to the begining of the next molecule
embed: 619
Traceback (most recent call last):
   File "2-3D.py", line 10, in <module>
     AllChem.EmbedMolecule(mol)
Boost.Python.ArgumentError: Python argument types in
     Chem.rdDistGeom.EmbedMolecule(NoneType)
did not match C++ signature:
     EmbedMolecule(RDKit::ROMol {lvalue} mol, unsigned int  
maxAttempts=0, int randomSeed=-1, bool clearConfs=True, bool  
useRandomCoords=False, double boxSizeMult=2.0, bool randNegEig=True,  
unsigned int numZeroFail=1, boost::python::dict {lvalue} coordMap={})

######################

I think this is the #618 and #619 compound(s) that caused the problem:
http://www.emolecules.com/cgi-bin/more?vid=5753883
http://www.emolecules.com/cgi-bin/more?vid=15920212

If there are valence problems, I can just record those errors and  
throw out those molecules.  But I do not understand why the  
FileParseException does not succeed in proceeding to the next  
molecule.  I am assuming it has to do with the way the for-loop is  
constructed, but I wasn't able to figure it out.

Thank you for your great support.  I really appreciate it.

-Marshall

On Mar 4, 2009, at 9:35 PM, Greg Landrum wrote:

> Dear Marshall,
>
> On Wed, Mar 4, 2009 at 7:01 PM, Marshall Levesque
> <mar...@em...> wrote:
>>
>> I have written a script to automate converting compounds from a 2D  
>> SD file
>> into a 3D, optimized SD file. With the compounds I'm testing with,  
>> I get a
>> hangup in python that has to be killed. I'm wondering if it has  
>> anything to
>> do with this bug you listed:
>>
>> - UFF optimization not terminating when atoms are on top of each  
>> other
>> (issue 2378119)
>>
>> Here is the code I'm trying to run:
>>
>> ############
>> import Chem
>> from Chem import AllChem
>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>>
>> max = len(suppl)
>>
>> for i in range(0, max):
>>        AllChem.EmbedMolecule(suppl[i])
>>        AllChem.UFFOptimizeMolecule(suppl[i])
>>        print i
>>
>> w = Chem.SDWriter('output.sdf')
>> for mol in suppl: w.write(mol)
>> ##########
>
> There are two things going on here.
>
> The first is, indeed, a bug in the optimizer that causes it to get
> stuck in some circumstances. I will track this down.
>
> However, even if this bug were not present, the code above would not
> behave the way you expect. When you ask an SDMolSupplier for a
> particular object, it parses the corresponding part of the SD file and
> returns a *new* object, so the calls to EmbedMolecule(),
> UFFOptimizeMolecule(), and w.write() in your code above are all
> working on different objects. Here's a simple demonstration:
>
> [4]>>> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>
> [5]>>> ma = suppl[0]
>
> [6]>>> mb = suppl[0]
>
> [7]>>> ma is mb
> Out[7] False
>
> [8]>>> ma
> Out[8] <rdkit.Chem.rdchem.Mol object at 0xa72b02c>
>
> [9]>>> mb
> Out[9] <rdkit.Chem.rdchem.Mol object at 0xa72b09c>
>
> notice that ma and mb are pointing to different pieces of memory
> (that's what the "at 0x...." things indicate).
>
> Here's a version of your script that does what you want, and that  
> doesn't hang:
>
> ##########
> suppl = Chem.SDMolSupplier('original65cmpds.sdf')
>
> mols = []
> for i,mol in enumerate(suppl):
>       print 'embed: ',i
>       AllChem.EmbedMolecule(mol)
>       print 'optimize: ',i
>       AllChem.UFFOptimizeMolecule(mol)
>       mols.append(mol)
>
> w = Chem.SDWriter('output.sdf')
> for mol in mols: w.write(mol)
> ##########
>
> Regards,
> -greg

Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??

Open-Source Cheminformatics and Machine Learning

Re: [Rdkit-discuss] Output from UFFOptimizeMolecules(m)??