Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
|
From: He, A. <he...@bu...> - 2024-02-06 20:05:16
|
Hi Greg,
Thanks so much for your kind suggestions. That also helped with our projects!
Emre, great to hear from you. I also just wanted to say that not all entries in ZINC can be transformed into 3D structures. We encountered a couple of instances where the annotated stereo is nonphysical, especially at closely-spaced stereo centers in small cycles. I had thought to design a check to capture these instances, but eventually I just gave up and discarded the entries because no other available methods (including ETKDG) or software can build 3D structures for 3D calculations. It would still be helpful to consider a check even for 2D calculations.
Best regards,
Amy
From: Greg Landrum <gre...@gm...>
Date: Tuesday, February 6, 2024 at 6:44 AM
To: Emre Apaydın <emr...@gm...>
Cc: He, Amy <he...@bu...>, rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
Hi Emre, Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i. e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug)
Hi Emre,
Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i.e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) but this is already a big enough problem.
The easiest thing to do with these compounds (or things which have this kind of problem) is to just disable the stereochemistry entirely by removing all instances of the @ symbol from the input strings:
In [26]: ps = rdDistGeom.ETKDGv3()
In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCO
...: CC2)cc1OC'.replace('@','')))
In [28]: rdDistGeom.EmbedMolecule(m,ps)
Out[28]: 0
In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)
...: N[C@@H](CCCCN)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))
In [30]: rdDistGeom.EmbedMolecule(m,ps)
Out[30]: 0
-greg
On Tue, Feb 6, 2024 at 9:42 AM Emre Apaydın <emr...@gm...<mailto:emr...@gm...>> wrote:
Thank you so much for your help. I managed to convert most of the molecules from 2D to 3D, but no matter which ETKDG version, which embedding parameter I try, I cannot convert these two molecules; ZINC000101210593, ZINC000196058327. Is there an alternative feature or method I can try? I would be grateful if you could help me. Thank you!
He, Amy <he...@bu...<mailto:he...@bu...>>, 11 Oca 2024 Per, 01:58 tarihinde şunu yazdı:
Hi Emre!
You can get more detailed info on failed conformer generations through rdDistGeom.EmbedFailureCauses, see:
https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html<https://urldefense.com/v3/__https:/greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html__;!!KGKeukY!zC3dbTgBTpt6IY3K9mQhnSMqzjaX5IpXP3eSNuuiYvDoiIL42h3Xzv5cqkiPUC-E9jJku-W-Wyl54FstHYVR9uYkJGCMiB5D0Q$>
Bests,
--
Amy He
Hadad Lab @ OSU
He...@os...<mailto:He...@os...>
From: Emre Apaydın <emr...@gm...<mailto:emr...@gm...>>
Date: Wednesday, January 10, 2024 at 8:47 AM
To: rdk...@li...<mailto:rdk...@li...> <rdk...@li...<mailto:rdk...@li...>>
Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
Hello, I want to convert the 2D ligands I downloaded as sdf format from the ZINC library to 3D, but almost half of them are not converted to 3D. Some of them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands are not converted
Hello,
I want to convert the 2D ligands I downloaded as sdf format from the ZINC library to 3D, but almost half of them are not converted to 3D. Some of them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands are not converted to 3D in this way. When I run the script, I do not get any warning or error in IDE. When I look at the output of my Try, Except commands, I see "ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed" It outputs like this for ligands that are not translated to 3D. When I try different methods, the ligands are converted to 3D. I wonder if there is something missing or wrong with my script. I would be grateful if you can help me.
Thank you!
```
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdForceFieldHelpers
from rdkit.Chem import rdPartialCharges
import os
ligands_dir = "ligands"
output_dir = "new_ligands"
status_file = "process_status.txt"
if not os.path.exists(output_dir):
os.makedirs(output_dir)
sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]
with open(status_file, 'w') as status:
for sdf_file in sdf_files:
input_path = os.path.join(ligands_dir, sdf_file)
output_path = os.path.join(output_dir, sdf_file)
mol = Chem.MolFromMolFile(input_path)
# Add hydrogens
try:
mol = Chem.AddHs(mol, addCoords=True)
except:
status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\\n")
continue
# 3D embedding
etkdgv3 = rdDistGeom.ETKDGv3()
embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
if embed_status == -1:
status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\\n")
# Compute Gasteiger charges
try:
rdPartialCharges.ComputeGasteigerCharges(mol)
except:
status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\\n")
# UFF energy minimization
try:
rdForceFieldHelpers.UFFOptimizeMolecule(mol)
except:
status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\\n")
Chem.MolToMolFile(mol, output_path)
```
_______________________________________________
Rdkit-discuss mailing list
Rdk...@li...<mailto:Rdk...@li...>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss<https://urldefense.com/v3/__https:/lists.sourceforge.net/lists/listinfo/rdkit-discuss__;!!KGKeukY!zC3dbTgBTpt6IY3K9mQhnSMqzjaX5IpXP3eSNuuiYvDoiIL42h3Xzv5cqkiPUC-E9jJku-W-Wyl54FstHYVR9uYkJGBfYAHZhw$>
|