Re: [Rdkit-discuss] protonating proper tertiary amines
From: Bennion, B. <ben...@ll...> - 2016-08-31 23:01:56
While I figure out how to implement your test suggestion in the most realistic fashion, I just want to clarify that the test output in the previous email came from a single machine, for testing purposes. On one compute node with one thread, two reagents are combined in a synthesis function, and then the tertNitrogenProt function is called. The current molecule is passed to it to be searched for the tertiary nitrogen, which is protonated if found. What I am not clear on is whether the properties are passed properly from the synthesis function to the tertNitrogenProt function.

Question: what are the outputs of UpdatePropertyCache()? When I test for output, only the word _none_ is printed. I don't know whether that means there were no properties present, or that they did not need to be updated.

Thank you for your suggestions.
Brian

________________________________
From: Greg Landrum [gre...@gm...]
Sent: Wednesday, August 31, 2016 11:01 AM
To: Bennion, Brian
Cc: rdk...@li...
Subject: Re: [Rdkit-discuss] protonating proper tertiary amines

hmm, perplexing.

How about we try something simple. Instead of using real molecules that may be proprietary, how about constructing a simple input that has 10 copies of CCN(CC)CC and running that? Then you can safely send the output.

It would also help if you could just run the function on one machine (not using the PP stuff) to see if you can reproduce the problem there.

-greg

On Wed, Aug 31, 2016 at 6:24 AM, Bennion, Brian <ben...@ll...> wrote:

Hello Greg,

The source that I am using is shown below. Also, I need to clarify that all this code is wrapped in the ParallelPython job control code, which allows me to send each reaction to a separate CPU on my large clusters. I have been able to use the steps in your email to check my RDKit install from the Python interpreter.
Next, I manually input my compound as a SMILES string, ran your set of commands, and things worked as expected. However, when the same steps are wrapped in the PP code, UpdatePropertyCache() has no effect. My only thought is that I have not properly passed the molecule between Python modules (not sure if that makes any sense).

This is the log output for one cycle of the code. The SMILES string has been clipped so as not to reveal proprietary data. The important thing here is that the formal charge is correctly assigned but the implicit hydrogen atoms are not updated.

LOG
Tertiary nitrogen found in oxime: ((5, 6, 7, 8),)
This is the symbol and charge for the tertiary nitrogen before: N 0 C(=O)N([H])C([H])([H])C([H])([H])C1(C([H])([H])N(C([H])([H])[H])C([H])([H])
This is the symbol and charge for the tertiary nitrogen after: N 1
test3-10: SANITIZE_NONE C14H27N3O3 C14H27N3O3+ C14H27N3O3+ 3

import rdkit.Chem
import rdkit.Chem.AllChem

def tertNitrogenProt(molecule,molName1,w_sdf,w_smi):
    patt=rdkit.Chem.MolFromSmarts('[#6]-[#7]([#6])-[#6]')
    matches=molecule.GetSubstructMatches(patt)
    tertNHnum=0
    if matches:
        print "Tertiary nitrogen found in: ", matches
        for i in matches:
            moleculeStrings=rdkit.Chem.MolToSmiles(molecule,isomericSmiles=True)
            atomSymbol9=molecule.GetAtomWithIdx(i[1]).GetSymbol()
            formalCharge9=molecule.GetAtomWithIdx(i[1]).GetFormalCharge()
            print "This is the symbol and charge for the tertiary nitrogen before: ",atomSymbol9,formalCharge9,moleculeStrings
            # set the formal charge on the protonated tertiary nitrogen to +1
            test7=rdkit.Chem.AllChem.CalcMolFormula(molecule)
            molecule.GetAtomWithIdx(i[1]).SetFormalCharge(1)
            atomSymbol9=molecule.GetAtomWithIdx(i[1]).GetSymbol()
            formalCharge9=molecule.GetAtomWithIdx(i[1]).GetFormalCharge()
            test8=rdkit.Chem.AllChem.CalcMolFormula(molecule)
            print "This is the symbol and charge for the tertiary nitrogen after: ",atomSymbol9,formalCharge9
            # update property cache and check for nonsense
            molecule.UpdatePropertyCache()
            moleculeH=rdkit.Chem.AddHs(molecule)
            test3=rdkit.Chem.SanitizeMol(moleculeH)
            test9=rdkit.Chem.AllChem.CalcMolFormula(moleculeH)
            test10=moleculeH.GetAtomWithIdx(i[1]).GetDegree()
            print "test3-10: ",test3,test7,test8,test9,test10
            # generate 3D coordinates and optimize the conformation
            rdkit.Chem.AllChem.EmbedMolecule(moleculeH)
            rdkit.Chem.AllChem.UFFOptimizeMolecule(moleculeH,1500)
            molName6=molName1+'NH+_'+str(tertNHnum)+'_XOH'
            # find molecular formal charge
            moleculeCharge=rdkit.Chem.GetFormalCharge(moleculeH)
            moleculeH.SetProp('i_user_TOTAL_CHARGE',repr(moleculeCharge))
            moleculeH.SetProp('_Name',molName6)
            w_sdf.write(moleculeH)
            w_smi.write(moleculeH)
            molName3=molName1+'NH+_'+str(tertNHnum)+'_XO'
            # oximeSubStructSearch() is defined elsewhere (not shown)
            totalMolecules=oximeSubStructSearch(moleculeH,molName3,w_sdf,w_smi)
            tertNHnum += 1
    else:
        print "No tertiary nitrogen matches"
        return(molecule,tertNHnum)
    return (moleculeH,tertNHnum)

________________________________
From: Greg Landrum [gre...@gm...]
Sent: Monday, August 29, 2016 10:41 PM
To: Bennion, Brian
Cc: rdk...@li...
Subject: Re: [Rdkit-discuss] protonating proper tertiary amines

Hi Brian,

On Tue, Aug 30, 2016 at 6:41 AM, Bennion, Brian <ben...@ll...> wrote:

> I seem to have hit a wall with what seems like a simple task. First, I have ~9800 compounds that have a primary amine for a reaction that I am completing in RDKit. About 250 of those compounds have a tertiary alkylamine that is most likely protonated at pH 7.4. The dataset is a set of SMILES strings in which the tertiary amine is not protonated.
> I thought this would be easy enough to fix: just use a SMARTS substructure search, set the formal charge on any hits to one, and then AddHs, sanitize, embed, and minimize. Well, what I get is [N+] with explicit hydrogen atoms on all the other carbons in the resulting SMILES files, and if output to SDF I get a positively charged diradical positioned at the tertiary nitrogen.

Yes, what's happening here is that AddHs() uses the implicit valence on the N atoms to determine how many Hs to add. Since the implicit valence is not recomputed when you set the formal charge, you end up with the wrong number of Hs attached to the N. A call to UpdatePropertyCache() will fix this:

In [16]: m = Chem.MolFromSmiles('CN')
In [17]: AllChem.CalcMolFormula(m)
Out[17]: 'CH5N'
In [18]: m.GetAtomWithIdx(1).SetFormalCharge(1)
In [19]: AllChem.CalcMolFormula(m)
Out[19]: 'CH5N+'
In [20]: m.UpdatePropertyCache()
In [21]: AllChem.CalcMolFormula(m)
Out[21]: 'CH6N+'
In [22]: mh = Chem.AddHs(m)
In [24]: mh.GetAtomWithIdx(1).GetDegree()
Out[24]: 4

> Thank you for such a great tool

You're welcome! Thanks for saying thanks. :-)

Hope this helps,
-greg
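A minimal sketch of Greg's suggested test, assuming a current RDKit release under Python 3 (the code in this thread is Python 2). It runs the SetFormalCharge() / UpdatePropertyCache() / AddHs() sequence on 10 copies of CCN(CC)CC on a single machine, with no ParallelPython job control, using the same SMARTS as tertNitrogenProt(). The helper name `protonate_tertiary_amines` is hypothetical, not an RDKit function. The sketch also shows that UpdatePropertyCache() returns None: it updates the atoms in place, so printing its return value only ever shows None, and its effect has to be checked on the molecule itself (formula, formal charge, degree).

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Same SMARTS as Brian's tertNitrogenProt(): C-N(C)-C, with the N at match[1].
TERT_AMINE = Chem.MolFromSmarts('[#6]-[#7]([#6])-[#6]')


def protonate_tertiary_amines(mol):
    """Hypothetical helper: put +1 on each matched N, refresh the cache, add Hs.

    Returns (molecule with explicit Hs, sorted list of protonated N indices).
    """
    mol = Chem.Mol(mol)  # work on a copy so the input molecule is not modified
    n_indices = sorted({match[1] for match in mol.GetSubstructMatches(TERT_AMINE)})
    for idx in n_indices:
        mol.GetAtomWithIdx(idx).SetFormalCharge(1)
    # Recompute implicit valences in place so AddHs() sees one implicit H on N+.
    mol.UpdatePropertyCache()
    return Chem.AddHs(mol), n_indices


# UpdatePropertyCache() returns nothing; the update happens on the molecule itself.
print(Chem.MolFromSmiles('C').UpdatePropertyCache())  # prints None

# Greg's suggested non-proprietary input: 10 copies of triethylamine.
results = [protonate_tertiary_amines(Chem.MolFromSmiles('CCN(CC)CC'))
           for _ in range(10)]
for molH, n_indices in results:
    n_atom = molH.GetAtomWithIdx(n_indices[0])  # AddHs keeps heavy-atom indices
    print(rdMolDescriptors.CalcMolFormula(molH),
          n_atom.GetFormalCharge(), n_atom.GetDegree())
```

Each of the ten result lines should read `C6H16N+ 1 4`: the nitrogen carries +1 and has four neighbors (three carbons plus the added H). Copying the input with Chem.Mol() is a choice made for this sketch only, so that the caller's unprotonated molecule stays available.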