rdkit-discuss Mailing List for RDKit (Page 3)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Eduardo M. <edu...@gm...> - 2024-02-18 12:11:06
|
Hello,

Are you passionate about computational chemistry, eager to explore chemical space, thrilled by deep learning, and fascinated by aromatic molecules? If so, we have the perfect opportunity for you!

The Poranne Research Group, at the Technion - Israel Institute of Technology, is currently recruiting graduate students to join our dynamic team. As part of our team, you'll delve into cutting-edge research, collaborate with experts in the field, and contribute to groundbreaking discoveries at the interface of physical organic chemistry, organic electronics, and machine learning.

Don't miss out on this opportunity to be part of our vibrant research group. Visit our group website <https://poranne-group.github.io/> to learn more about our research and how you can apply.

Best,
Eduardo |
From: Santiago F. <san...@me...> - 2024-02-14 14:11:35
|
Good morning,

I am trying to convert some reactions in SMILES format to RXN and MRV format, but I am not able to obtain a good representation. The reaction components are really messed up in the final output.

For example, using this reaction SMILES, these are the steps I am following:

    string reaction = "C.CO>cno.NN>coc.[H][H]";
    RDKit::ChemicalReaction* result = RDKit::RxnSmartsToChemicalReaction(reaction, nullptr, true, true);
    RDDepict::compute2DCoordsForReaction(*result);
    string mrvReaction = RDKit::ChemicalReactionToMrvBlock(*result, false);

or

    string rxnReaction = RDKit::ChemicalReactionToRxnBlock(*result, true, true);

I am attaching both results.

Best regards,
Santiago

Santiago Fraga
Software Developer
Mestrelab Research S.L. <http://www.mestrelab.com> |
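One way to narrow a problem like this down is to check whether the components survive the conversion itself, separately from the 2D layout step. The following is a minimal Python sketch of that idea, not the C++ calls from the message: it uses a simpler, hypothetical esterification reaction, writes an RXN block, reads it back, and compares the component counts. The helper name `rxn_roundtrip` is an assumption made for illustration.

```python
from rdkit.Chem import rdChemReactions

# Hypothetical example reaction (not the one from the message above).
RXN_SMILES = "CC(=O)O.OCC>>CC(=O)OCC.O"


def rxn_roundtrip(rxn_smiles):
    """Parse reaction SMILES, write an RXN block, and parse the block back."""
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smiles, useSmiles=True)
    block = rdChemReactions.ReactionToRxnBlock(rxn)
    back = rdChemReactions.ReactionFromRxnBlock(block)
    return rxn, block, back


rxn, block, back = rxn_roundtrip(RXN_SMILES)

# First line of the RXN block, then reactant and product counts
# before and after the round trip.
print(block.splitlines()[0])
print(rxn.GetNumReactantTemplates(), back.GetNumReactantTemplates())
print(rxn.GetNumProductTemplates(), back.GetNumProductTemplates())
```

If the counts agree after the round trip, the scrambling is more likely to come from the coordinate generation or from the viewer than from the format conversion.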
From: Greg L. <gre...@gm...> - 2024-02-07 05:43:06
|
Hi Amy,

On Tue, Feb 6, 2024 at 8:20 PM He, Amy <he...@bu...> wrote:

> Emre, great to hear from you. I also just wanted to say that not all
> entries in ZINC can be transformed into 3D structures. We encountered a
> couple of instances where the annotated stereo is nonphysical, especially
> at closely-spaced stereo centers in small cycles. I had thought to design a
> check to capture these instances, but eventually I just gave up and
> discarded the entries because no other available methods (including ETKDG)
> or software can build 3D structures for 3D calculations. It would still be
> helpful to consider a check even for 2D calculations.

Yeah, this is a difficult one. It would be nice to have a check to catch at least the "simple" cases like these cage structures, where only one relative stereo arrangement is possible, but I haven't managed to find it yet.

-greg |
From: He, A. <he...@bu...> - 2024-02-06 20:05:16
|
Hi Greg,

Thanks so much for your kind suggestions. That also helped with our projects!

Emre, great to hear from you. I also just wanted to say that not all entries in ZINC can be transformed into 3D structures. We encountered a couple of instances where the annotated stereo is nonphysical, especially at closely-spaced stereo centers in small cycles. I had thought to design a check to capture these instances, but eventually I just gave up and discarded the entries because no other available methods (including ETKDG) or software can build 3D structures for 3D calculations. It would still be helpful to consider a check even for 2D calculations.

Best regards,
Amy

From: Greg Landrum <gre...@gm...>
Date: Tuesday, February 6, 2024 at 6:44 AM
To: Emre Apaydın <emr...@gm...>
Cc: He, Amy <he...@bu...>, rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D

Hi Emre,

Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i.e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) but this is already a big enough problem.

The easiest thing to do with these compounds (or things which have this kind of problem) is to just disable the stereochemistry entirely by removing all instances of the @ symbol from the input strings:

    In [26]: ps = rdDistGeom.ETKDGv3()
    In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCOCC2)cc1OC'.replace('@','')))
    In [28]: rdDistGeom.EmbedMolecule(m,ps)
    Out[28]: 0
    In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))
    In [30]: rdDistGeom.EmbedMolecule(m,ps)
    Out[30]: 0

-greg |
From: Greg L. <gre...@gm...> - 2024-02-06 11:44:40
|
Hi Emre,

Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i.e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) but this is already a big enough problem.

The easiest thing to do with these compounds (or things which have this kind of problem) is to just disable the stereochemistry entirely by removing all instances of the @ symbol from the input strings:

    In [26]: ps = rdDistGeom.ETKDGv3()
    In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCOCC2)cc1OC'.replace('@','')))
    In [28]: rdDistGeom.EmbedMolecule(m,ps)
    Out[28]: 0
    In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))
    In [30]: rdDistGeom.EmbedMolecule(m,ps)
    Out[30]: 0

-greg

On Tue, Feb 6, 2024 at 9:42 AM Emre Apaydın <emr...@gm...> wrote:

> Thank you so much for your help. I managed to convert most of the
> molecules from 2D to 3D, but no matter which ETKDG version, which embedding
> parameter I try, I cannot convert these two molecules; ZINC000101210593,
> ZINC000196058327. Is there an alternative feature or method I can try? I
> would be grateful if you could help me. Thank you! |
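The workaround Greg describes can be wrapped in a small helper that only discards stereochemistry when the first embedding attempt fails. This is a sketch under stated assumptions, not code from the thread: the function name `embed_with_stereo_fallback` and the seed are hypothetical, and it uses `Chem.RemoveStereochemistry` rather than deleting `@` from the SMILES, which also clears double-bond stereo and therefore drops slightly more information than the string replacement.

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom


def embed_with_stereo_fallback(smiles, seed=42):
    """Embed a molecule; if that fails, retry without stereochemistry.

    Returns (mol_with_hydrogens_or_None, stereo_dropped).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None, False

    ps = rdDistGeom.ETKDGv3()
    ps.randomSeed = seed  # fixed seed so repeated runs behave the same

    # First attempt: keep the stereochemistry as specified.
    mol_h = Chem.AddHs(mol)
    if rdDistGeom.EmbedMolecule(mol_h, ps) >= 0:
        return mol_h, False

    # Second attempt: clear all stereo flags on a copy and try again.
    stripped = Chem.Mol(mol)
    Chem.RemoveStereochemistry(stripped)
    stripped_h = Chem.AddHs(stripped)
    if rdDistGeom.EmbedMolecule(stripped_h, ps) >= 0:
        return stripped_h, True

    return None, True


mol3d, dropped = embed_with_stereo_fallback("C[C@H](N)C(=O)O")
print(mol3d is not None, dropped)
```

The `stereo_dropped` flag makes it possible to list the entries whose specified stereo could not be embedded, so they can be reviewed instead of silently changed.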
From: Emre A. <emr...@gm...> - 2024-02-06 08:40:36
|
Thank you so much for your help. I managed to convert most of the molecules from 2D to 3D, but no matter which ETKDG version or embedding parameter I try, I cannot convert these two molecules: ZINC000101210593 and ZINC000196058327. Is there an alternative feature or method I can try? I would be grateful if you could help me.

Thank you!

On Thu, Jan 11, 2024 at 01:58, He, Amy <he...@bu...> wrote:

> Hi Emre!
>
> You can get more detailed info on failed conformer generations through
> rdDistGeom.EmbedFailureCauses, see:
>
> https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html
>
> Bests,
>
> --
> Amy He
> Hadad Lab @ OSU
> He...@os...
>
> From: Emre Apaydın <emr...@gm...>
> Date: Wednesday, January 10, 2024 at 8:47 AM
> To: rdk...@li... <rdk...@li...>
> Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
>
> Hello,
>
> I want to convert the 2D ligands I downloaded in SDF format from the ZINC
> library to 3D, but almost half of them are not converted to 3D. Some of
> them are: ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands
> are not converted to 3D in this way. When I run the script, I do not get
> any warning or error in the IDE. When I look at the output of my try/except
> blocks, I see
>
>     ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
>     ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed
>
> It outputs like this for ligands that are not converted to 3D. When I try
> different methods, the ligands are converted to 3D. I wonder if there is
> something missing or wrong with my script. I would be grateful if you can
> help me.
>
> Thank you!
>
> ```
> from rdkit import Chem
> from rdkit.Chem import rdDistGeom
> from rdkit.Chem import rdForceFieldHelpers
> from rdkit.Chem import rdPartialCharges
> import os
>
> ligands_dir = "ligands"
> output_dir = "new_ligands"
> status_file = "process_status.txt"
>
> if not os.path.exists(output_dir):
>     os.makedirs(output_dir)
>
> sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]
>
> with open(status_file, 'w') as status:
>     for sdf_file in sdf_files:
>         input_path = os.path.join(ligands_dir, sdf_file)
>         output_path = os.path.join(output_dir, sdf_file)
>         mol = Chem.MolFromMolFile(input_path)
>
>         # Add hydrogens
>         try:
>             mol = Chem.AddHs(mol, addCoords=True)
>         except:
>             status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
>             continue
>
>         # 3D embedding
>         etkdgv3 = rdDistGeom.ETKDGv3()
>         embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
>         if embed_status == -1:
>             status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")
>
>         # Compute Gasteiger charges
>         try:
>             rdPartialCharges.ComputeGasteigerCharges(mol)
>         except:
>             status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")
>
>         # UFF energy minimization
>         try:
>             rdForceFieldHelpers.UFFOptimizeMolecule(mol)
>         except:
>             status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")
>
>         Chem.MolToMolFile(mol, output_path)
> ```
|
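In the quoted script, a failed embedding is logged but the loop carries on, so the later UFF step also fails because there is no 3D conformer to optimize; that would explain the paired "Failed" lines. The sketch below restructures the per-molecule steps so that later stages are skipped after a failure and the failure causes suggested by Amy are reported. It is an illustration only: the function name `to_3d` and the status strings are hypothetical, and `trackFailures` / `GetFailureCounts` are assumed to be available, which requires a reasonably recent RDKit release (the linked blog post dates from 2023).

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom, rdForceFieldHelpers, rdPartialCharges


def to_3d(mol, seed=42):
    """Return (mol_3d_or_None, status_message) for one input molecule."""
    if mol is None:
        # e.g. Chem.MolFromMolFile could not parse the input file
        return None, "parse failed"

    mol = Chem.AddHs(mol, addCoords=True)

    ps = rdDistGeom.ETKDGv3()
    ps.randomSeed = seed
    ps.trackFailures = True  # assumption: supported by the installed RDKit

    if rdDistGeom.EmbedMolecule(mol, ps) < 0:
        # Map each failure-cause name to how often it was hit.
        counts = ps.GetFailureCounts()
        causes = {
            name: counts[int(value)]
            for name, value in rdDistGeom.EmbedFailureCauses.names.items()
            if int(value) < len(counts) and counts[int(value)] > 0
        }
        # Stop here: the remaining steps need a 3D conformer.
        return None, f"embedding failed: {causes}"

    rdPartialCharges.ComputeGasteigerCharges(mol)
    if rdForceFieldHelpers.UFFHasAllMoleculeParams(mol):
        rdForceFieldHelpers.UFFOptimizeMolecule(mol)
    return mol, "ok"


mol3d, status = to_3d(Chem.MolFromSmiles("CCO"))
print(status, mol3d.GetNumConformers())
```

In a batch loop, the returned status string can be written to the status file and the output molecule written only when the first element is not `None`.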
From: Lewis M. <lew...@gm...> - 2024-02-05 22:20:04
|
Good catch, thank you Diogo!

Recognising the difficulties of tautomer enumeration: for my own purposes, the ideal behaviour would be to get the set of all three plausible tautomers of 'mol1' no matter what the input SMILES. Looks like there's already a GitHub issue up (https://github.com/rdkit/rdkit/issues/5937), but I can add this if it has a different cause.

Thanks all,
Lewis

On Tue, Feb 6, 2024 at 7:23 AM Diogo Martins <dio...@gm...> wrote:

> Hello,
>
> I think it's a bug because the tautomers depend on how the input SMILES is
> written. Both represent mol1:
>
> Sc1ncc2c(c1)cccc2
> Sc1cc2ccccc2cn1
>
> However the resulting tautomers differ depending on which is used as input.
>
> Best regards,
> Diogo |
From: Diogo M. <dio...@gm...> - 2024-02-05 20:23:24
|
Hello,

I think it's a bug because the tautomers depend on how the input SMILES is written. Both represent mol1:

    Sc1ncc2c(c1)cccc2
    Sc1cc2ccccc2cn1

However the resulting tautomers differ depending on which is used as input.

Best regards,
Diogo

On Mon, 5 Feb 2024 at 11:38, Lewis Martin <lew...@gm...> wrote:

> Thank you very much for the detective work, Wim! This is helpful.
>
> It looks like the _reverse_ transition is possible, though. If I start by
> generating tautomers of "mol2", then "mol1" is recovered, which indicates
> this is an allowed transform. Is it possible that one direction is allowed
> but not the reverse?
>
> Failing a solution there, does anyone know if it is possible to add SMIRKS
> to the allowed tautomers through the python interface?
> Thanks,
> Lewis |
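The two observations in this thread (one tautomer is reachable from the other but not the reverse, and the result depends on how the input SMILES is written) can be checked programmatically by comparing canonical SMILES sets. The following is a minimal sketch with default enumerator settings; the helper names `tautomer_smiles`, `canonical`, and `compare` are hypothetical, and the printed results depend on the RDKit version, so none are asserted here.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical(smiles):
    """Canonical SMILES of a single input structure."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


def tautomer_smiles(smiles):
    """Set of canonical SMILES for all tautomers RDKit enumerates."""
    enumerator = rdMolStandardize.TautomerEnumerator()
    mol = Chem.MolFromSmiles(smiles)
    return {Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)}


def compare(smi_a, smi_b):
    """Check whether each structure is found among the other's tautomers."""
    tauts_a = tautomer_smiles(smi_a)
    tauts_b = tautomer_smiles(smi_b)
    return {
        "b_in_tautomers_of_a": canonical(smi_b) in tauts_a,
        "a_in_tautomers_of_b": canonical(smi_a) in tauts_b,
        "same_set": tauts_a == tauts_b,
    }


# The pair from Lewis's message: thiol form vs. thione form.
print(compare('Sc1cc2ccccc2cn1', 'S=c1cc2ccccc2c[nH]1'))

# Diogo's two spellings of the same molecule (mol1).
print(tautomer_smiles('Sc1ncc2c(c1)cccc2') == tautomer_smiles('Sc1cc2ccccc2cn1'))
```

A `False` from the second print would reproduce the input-order dependence described above, which is the behaviour worth attaching to the existing GitHub issue.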
From: Lewis M. <lew...@gm...> - 2024-02-05 19:35:02
|
Thank you very much for the detective work, Wim! This is helpful.

It looks like the _reverse_ transition is possible, though. If I start by generating tautomers of "mol2", then "mol1" is recovered, which indicates this is an allowed transform. Is it possible that one direction is allowed but not the reverse?

Failing a solution there, does anyone know if it is possible to add SMIRKS to the allowed tautomers through the Python interface?

Thanks,
Lewis

On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen <wim...@gm...> wrote:

> hi lewis,
> if i am not mistaken this is because the tautomer transform "1,3 aromatic
> heteroatom H shift" does not account for other chalcogens than oxygen, so
> no selenium, tellurium or sulfur.
> you can find the list of transforms here:
> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
> (pointing to the line with the relevant transform).
> best wishes
> wim |
From: Wim D. <wim...@gm...> - 2024-02-05 10:53:04
|
hi lewis,
if i am not mistaken this is because the tautomer transform "1,3 aromatic heteroatom H shift" does not account for chalcogens other than oxygen, so no selenium, tellurium or sulfur.
you can find the list of transforms here:
https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
(pointing to the line with the relevant transform).
best wishes
wim
|
From: Lewis M. <lew...@gm...> - 2024-02-05 02:24:15
|
Hi all,

I'm looking at scoring tautomers, and using the 'tautobase' dataset used by Wieder et al. at:

https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt

This dataset has pairs of tautomers with experimental logK values to determine the preferred tautomer.

In at least one case, depending on which tautomer you use as the 'entry' point, the tautomers enumerated by RDKit either do or don't include both of the pair of input molecules. *I'm hoping there's a way to uniquely recover the full set of possible tautomers from any input tautomer.*

Here's a code example:

    from rdkit import Chem
    from rdkit.Chem import Draw
    from rdkit.Chem.Draw import IPythonConsole
    IPythonConsole.drawOptions.addStereoAnnotation = True
    from rdkit.Chem.MolStandardize import rdMolStandardize

    # same result if you don't set any of these params.
    tautomer_params = rdMolStandardize.CleanupParameters()
    tautomer_params.tautomerRemoveSp3Stereo = False
    tautomer_params.tautomerRemoveBondStereo = False
    tautomer_params.tautomerRemoveIsotopicHs = False
    tautomer_params.tautomerReassignStereo = False
    tautomer_params.doCanonical = True

    enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)

    smi1 = 'Sc1cc2ccccc2cn1'
    smi2 = 'S=c1cc2ccccc2c[nH]1'
    mol1 = Chem.MolFromSmiles(smi1)
    mol2 = Chem.MolFromSmiles(smi2)

    # choose mol1 or mol2 to be the source of tautomers:
    # choose mol1, and look at the tautomers. Note that mol2 isn't present!
    tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m))
             for m in enumerator.Enumerate(mol1)]

    Draw.MolsToGridImage([mol1, mol2] + tauts,
                         legends=['mol1', 'mol2 (not present in tauts!)']
                                 + [f'taut{i}' for i in range(len(tauts))],
                         molsPerRow=4)

And a picture of this in a notebook for an at-a-glance view:
https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03

Does anyone know a way to recover "mol2" within tautomers of "mol1"?

Thank you!
Lewis
|
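The direction dependence described in this message can be checked without drawing anything. A minimal sketch, assuming only the `TautomerEnumerator` calls already used in the example; the helper names `tautomer_smiles` and `reaches` are made up for illustration, and the answer for any particular pair may differ between RDKit releases:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def tautomer_smiles(smiles):
    """Return the canonical SMILES of every tautomer enumerated from `smiles`."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()
    return {Chem.MolToSmiles(taut) for taut in enumerator.Enumerate(mol)}


def reaches(source, target):
    """True if `target` is among the tautomers enumerated from `source`."""
    target_mol = Chem.MolFromSmiles(target)
    if target_mol is None:
        raise ValueError(f"could not parse SMILES: {target!r}")
    return Chem.MolToSmiles(target_mol) in tautomer_smiles(source)


if __name__ == "__main__":
    smi1 = "Sc1cc2ccccc2cn1"
    smi2 = "S=c1cc2ccccc2c[nH]1"
    # Report both directions so an asymmetry is visible at a glance.
    print("mol1 -> mol2:", reaches(smi1, smi2))
    print("mol2 -> mol1:", reaches(smi2, smi1))
```

Comparing canonical SMILES rather than molecule objects keeps the check independent of atom ordering, which makes it easy to run over every pair in a dataset.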
From: David C. <dav...@gm...> - 2024-02-04 17:02:40
|
Thanks Joel,

That's really helpful. Armed with that information, I was able to dig further into the source code and find some unit tests that do something very similar to what you have set up, but within a context manager. Hopefully the mail client won't mangle the formatting.

Cheers,
Dave

    #!/usr/bin/env python

    import logging
    from contextlib import contextmanager
    from io import StringIO

    from rdkit import rdBase, Chem


    @contextmanager
    def log_to_python(level=None):
        """
        Temporarily redirect logging to Python streams, optionally
        setting a specific log level.
        """
        rdBase.LogToPythonLogger()
        pylog = logging.getLogger("rdkit")
        if level is not None:
            original_level = pylog.level
            pylog.setLevel(level)
        try:
            yield pylog
        finally:
            # restore the previous state even if the body raised
            if level is not None:
                pylog.setLevel(original_level)
            rdBase.LogToCppStreams()


    @contextmanager
    def capture_logging(level=None):
        """
        Temporarily redirect logging to a Python StringIO, optionally
        setting a specific log level.
        """
        log_stream = StringIO()
        stream_handler = logging.StreamHandler(stream=log_stream)
        with log_to_python(level) as pylog:
            pylog.addHandler(stream_handler)
            try:
                yield log_stream
            finally:
                pylog.removeHandler(stream_handler)


    with capture_logging(logging.WARNING) as log_stream:
        mol = Chem.MolFromSmiles('c1ccccc1(C)(C)')
        print(f'and the message is {log_stream.getvalue()}')
        # this clears the log message.  If you want to keep all the messages
        # as one long string, don't do this.
        log_stream.truncate(0)
        # truncate() does not move the write position, so rewind as well
        log_stream.seek(0)
        mol = Chem.MolFromSmiles('C(C)(C)(C)(C)C')
        print(f'and now the message is {log_stream.getvalue()}')

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
|
From: Joel D. <jo...@d2...> - 2024-02-01 22:10:42
|
I've changed my strategy a few times. I'll share my most recent approach, quite possibly this is flawed, but for now it works for me. Fixes and corrections welcome.
I think these are all the relevant bits below...

    ...
    ... other stuff, import rdkit, etc...
    ...

    # fancy foot work to capture RDKIT messages
    from io import StringIO
    from rdkit import rdBase
    rdBase.LogToPythonLogger()

    import logging
    logger = logging.getLogger('rdkit')
    logger.setLevel(logging.WARNING)  # make explicit though this is the default, currently

    ... more code...
    ... I'll show some of the context of how I'm using it.. reading an SDF file here, and want to trap every message

    f_in = open_sdf(INPUT_FILE, 'rb')
    # please don't change the molecule while reading it in
    suppl = Chem.ForwardSDMolSupplier(f_in, sanitize=False, removeHs=False, strictParsing=False)

    with StringIO() as log_stream:
        log_handler = logging.StreamHandler(log_stream)
        myLogger = logging.getLogger('rdkit')
        myLogger.addHandler(log_handler)

        # usually don't pull from a generator this way, but we're trying
        # to catch log messages for each molecule individually
        while True:
            try:
                # Python 3
                log_stream.truncate(0)
                log_stream.seek(0)
                curmol = next(suppl)
            except StopIteration:
                break

            ... capture everything in the stream from reading the molecule with log_stream.getvalue()
            re.sub(r'\[[0-9]{2}:[0-9]{2}:[0-9]{2}[^]]*\]', '', log_stream.getvalue())  # remove the timestamps [12:14:19] (maybe could turn them off?)

    ... after closing SDF file I'm removing the handler
    # remove our log handler
    myLogger.handlers.clear()

--
Regards,
Joel

----
Joel L. Duerksen jo...@d2...
Innovative Machine Learning and Data Science Solutions
|
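The timestamp clean-up at the end of this message can be pulled out into a small reusable helper. A sketch using the same regular expression; the function name `strip_rdkit_timestamps` and the sample log lines are invented for illustration and are not real RDKit output:

```python
import re

# Same pattern as in the message above: a stamp such as "[12:14:19]".
TIMESTAMP = re.compile(r"\[[0-9]{2}:[0-9]{2}:[0-9]{2}[^]]*\]")


def strip_rdkit_timestamps(text):
    """Remove "[HH:MM:SS]" stamps from each line and drop lines left empty."""
    cleaned = (TIMESTAMP.sub("", line).strip() for line in text.splitlines())
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    sample = "[12:14:19] first made-up message\n[12:14:20] second made-up message\n"
    print(strip_rdkit_timestamps(sample))
```

Compiling the pattern once matters when the helper runs for every molecule in a large SDF file.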
From: David C. <dav...@gm...> - 2024-02-01 17:48:52
|
Hi,
I'd like to be able to redirect the various logging streams to files from within the code. I know that I can turn them off:

    RDLogger.DisableLog('rdApp.*')

but that's more extreme than I want.
I have found the function RDLogger.AttachFileToLog() but can't work out how to use it. The naive

    RDLogger.AttachFileToLog('rdApp.*', 'logging.file', 1)

didn't produce a file anywhere I could see. I have not been able to find an example of its use.

    find . -name \*.py -exec grep AttachFileToLog {} \; -print

from the top of the source tree produces

    from rdkit.rdBase import AttachFileToLog, DisableLog, EnableLog, LogMessage
    ./rdkit/RDLogger.py

but the functions don't seem to be used within that file.

Any pointers gratefully received.
Dave

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
|
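One possible route to a log file, based on the `rdBase.LogToPythonLogger()` approach from the replies in this thread rather than on `AttachFileToLog`, is to attach a standard `logging.FileHandler` to the Python logger named `rdkit`. A minimal sketch under that assumption; the helper name `log_to_file`, the log format and the file name are illustrative choices only:

```python
import logging
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def log_to_file(path, logger_name="rdkit", level=logging.WARNING):
    """Append records from the named Python logger to `path` inside the block."""
    logger = logging.getLogger(logger_name)
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    original_level = logger.level
    logger.setLevel(level)
    logger.addHandler(handler)
    try:
        yield logger
    finally:
        # Detach and close the handler so the file is flushed and released.
        logger.removeHandler(handler)
        handler.close()
        logger.setLevel(original_level)


if __name__ == "__main__":
    # With RDKit installed, forward its C++ log messages to Python logging
    # first; without it the sketch still runs on hand-written records.
    try:
        from rdkit import rdBase
        rdBase.LogToPythonLogger()
    except ImportError:
        pass

    log_path = os.path.join(tempfile.mkdtemp(), "rdkit_messages.log")
    with log_to_file(log_path) as log:
        log.warning("example record written by hand")
    print("log written to", log_path)
```

Because the handler is removed on exit, different parts of a program can send messages to different files by nesting or sequencing the context manager.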
From: Tomkinson, N. <nic...@as...> - 2024-01-31 15:17:58
|
Thanks for that Greg.

Nick

________________________________
AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.
This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com
|
From: Greg L. <gre...@gm...> - 2024-01-31 14:48:53
|
Thanks for that example Nick.

We can't handle this automatically since there are multiple interpretations of what the chiral flag means, but I think some relatively straightforward post-processing can do what you're looking for.

https://gist.github.com/greglandrum/f85097a8489ba4a5825b0981b1fd2408

If people think it's useful, this is something which we could add to the RDKit itself.

-greg
|
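The kind of post-processing mentioned in this reply might be sketched as follows. This is not the code from the linked gist, which remains the reference; it is an illustration under one stated interpretation of the chiral flag (flag 1: all defined centres are absolute; flag 0: they form a single racemic "AND" group). The function name `add_group_from_chiral_flag` is invented, and the sketch assumes the mol file parser stores the flag in the `_MolFileChiralFlag` property when it is set:

```python
from rdkit import Chem


def add_group_from_chiral_flag(mol, chiral_flag=None):
    """Return a copy of `mol` with one enhanced stereo group covering all
    atoms that carry a chiral tag, chosen from the V2000 chiral flag."""
    if chiral_flag is None:
        # Assumption: a missing property means the flag was 0.
        has_flag = mol.HasProp("_MolFileChiralFlag")
        chiral_flag = mol.GetIntProp("_MolFileChiralFlag") if has_flag else 0

    centres = [atom.GetIdx() for atom in mol.GetAtoms()
               if atom.GetChiralTag() != Chem.ChiralType.CHI_UNSPECIFIED]

    rwmol = Chem.RWMol(mol)
    # Leave molecules alone if they have no centres or already have groups.
    if centres and len(mol.GetStereoGroups()) == 0:
        group_type = (Chem.StereoGroupType.STEREO_ABSOLUTE if chiral_flag
                      else Chem.StereoGroupType.STEREO_AND)
        group = Chem.CreateStereoGroup(group_type, rwmol, centres)
        rwmol.SetStereoGroups([group])
    return rwmol


if __name__ == "__main__":
    # Built from SMILES to keep the demo short; with a real file one would
    # start from Chem.MolFromMolBlock(...) and let the flag be read from it.
    mol = Chem.MolFromSmiles("C[C@H]1C[C@@H](C)CCN1")
    print(Chem.MolToV3KMolBlock(add_group_from_chiral_flag(mol, chiral_flag=0)))
```

Other interpretations of flag 0 (for example "relative" rather than "racemic") would only change the group type chosen here, which is why the choice is better left to the caller than hard-wired into the file reader.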
From: Giovanni T. <gio...@gl...> - 2024-01-31 14:26:05
|
Hello Nick,

We faced a (seemingly) related problem a while ago. In our case we were trying to convert V2000 CTABs to CXSMILES, and we were expecting that the V2000 chirality flag would translate to an enhanced stereo string in the CXSMILES. That is not so, by design. See my question, and the answer it got, here:

V2000 chiral flag does not seem to be read by Chem.MolFromMolBlock() (Issue #6062, rdkit/rdkit)
https://github.com/rdkit/rdkit/issues/6062

I imagine that the reason why the V2000 to V3000 conversion does not use the V2000 chirality flag is conceptually the same, but it is indeed worth checking.

FYI, the practical solution for our workflow was:

* create a function 'chiral_flag_from_molblock' that detects if a CTAB is V2000 or V3000; if V2000, it reads the flag (by simple text parsing) and returns it (0 or 1); if V3000, it returns -1
* create a function 'CTAB_to_CXSMILES' that calls the above:
  - for V3000, the rdkit-generated CXSMILES is (or usually is) already correct;
  - for V2000, if the flag is 1, the SMILES is identical to the CXSMILES;
  - for V2000, if the flag is 0, the function loops through all atoms, identifies those that have tetrahedral stereochemistry, and uses their indices to put together an '&1' enhanced stereo group string, which is then appended to the SMILES. (A V2000 CTAB with chirality flag 0 can only represent a racemic mixture where all configurations are inverted together, so it only needs one '&' group. Of course this comes with all the exceptions and issues you can imagine: meso stereoisomers or moieties, etc.)

Probably not ideal, but lacking any suggestion or a better 'native' solution, that's what we went for, and it seems to have worked so far.
[I'll mention for completeness that we also run a further standardisation function on CXSMILES, which takes care of removing the enhanced stereo flags from meso moieties].

I hope this helps.

Regards

Giovanni Tricarico
Principal Scientist Chemoinformatics
Galapagos NV

This e-mail and its attachment(s) (if any) may contain confidential and/or proprietary information and is intended for its addressee(s) only. Any unauthorized use of the information contained herein (including, but not limited to, alteration, reproduction, communication, distribution or any other form of dissemination) is strictly prohibited. If you are not the intended addressee, please notify the originator promptly and delete this e-mail and its attachment(s) (if any) subsequently. Neither Galapagos nor any of its affiliates shall be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message (by a third party) or as a result of a virus being passed on.
|
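The first helper described in this message, the text-parsing one, might look roughly like the sketch below. It is not the code used in the workflow described above, only an illustration of the idea; it relies on the fixed-width V2000 counts line layout (`aaabbblllfffccc...`), in which the chiral flag is the fifth three-character field, and it assumes the molblock still has its three header lines:

```python
def chiral_flag_from_molblock(molblock):
    """Return the V2000 chiral flag (0 or 1), or -1 for a V3000 CTAB."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("molblock is too short to contain a counts line")

    # The counts line is the fourth line, after name, program and comment.
    counts = lines[3]
    if "V3000" in counts:
        return -1
    if "V2000" not in counts:
        raise ValueError("counts line carries no V2000/V3000 version tag")

    # Columns 13-15 (0-based slice 12:15) hold the chiral flag.
    return 1 if counts[12:15].strip() == "1" else 0


if __name__ == "__main__":
    header = "\n  SketchTool\n\n"  # blank name, made-up program line, blank comment
    v2000_chiral = header + "  8  8  0  0  1  0  0  0  0  0999 V2000\nM  END\n"
    v2000_racemic = header + "  8  8  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    v3000 = header + "  0  0  0     0  0            999 V3000\nM  END\n"
    for block in (v2000_chiral, v2000_racemic, v3000):
        print(chiral_flag_from_molblock(block))
```

Note that stripping leading whitespace from a molblock before calling this would remove the blank name line and shift the counts line, so the text should be passed through unchanged.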
From: Tomkinson, N. <nic...@as...> - 2024-01-31 13:53:21
|
Hi Greg - sure. So - if I have a V2000 with or without the chiral flag:

  ACCLDraw01312413482D

  8  8  0  0  1  0  0  0  0  0999 V2000
    4.6334   -6.5969    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.6563   -6.0064    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    6.6791   -6.5969    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
    6.6791   -7.7781    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.6563   -8.3686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.6334   -7.7781    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    5.6563   -4.8257    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6109   -8.3684    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  3  2  1  0  0  0  0
  4  3  1  0  0  0  0
  5  4  1  0  0  0  0
  1  6  1  0  0  0  0
  6  5  1  0  0  0  0
  2  7  1  1  0  0  0
  6  8  1  1  0  0  0
M  END

  ACCLDraw01312413492D

  8  8  0  0  0  0  0  0  0  0999 V2000
    4.6334   -6.5969    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.6563   -6.0064    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    6.6791   -6.5969    0.0000 N   0  0  3  0  0  0  0  0  0  0  0  0
    6.6791   -7.7781    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.6563   -8.3686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.6334   -7.7781    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    5.6563   -4.8257    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6109   -8.3684    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  3  2  1  0  0  0  0
  4  3  1  0  0  0  0
  5  4  1  0  0  0  0
  1  6  1  0  0  0  0
  6  5  1  0  0  0  0
  2  7  1  1  0  0  0
  6  8  1  1  0  0  0
M  END

I’d expect the enhanced collections to be output in V3000 format. In this case the chiral flag is also set but that’s not a biggy for me. (I wish the chiral flag didn’t exist in V3000.)

  ACCLDraw01312413472D

  0  0  0  0  0               999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 8 8 0 0 1
M  V30 BEGIN ATOM
M  V30 1 C 4.6334 -6.5969 0 0
M  V30 2 C 5.6563 -6.0064 0 0 CFG=2
M  V30 3 N 6.6791 -6.5969 0 0 CFG=3
M  V30 4 C 6.6791 -7.7781 0 0
M  V30 5 C 5.6563 -8.3686 0 0
M  V30 6 C 4.6334 -7.7781 0 0 CFG=1
M  V30 7 C 5.6563 -4.8257 0 0
M  V30 8 C 3.6109 -8.3684 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 3 2
M  V30 3 1 4 3
M  V30 4 1 5 4
M  V30 5 1 1 6
M  V30 6 1 6 5
M  V30 7 1 2 7 CFG=1
M  V30 8 1 6 8 CFG=1
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(2 2 6)
M  V30 END COLLECTION
M  V30 END CTAB
M  END

  ACCLDraw01312413492D

  0  0  0  0  0               999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 8 8 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 4.6334 -6.5969 0 0
M  V30 2 C 5.6563 -6.0064 0 0 CFG=2
M  V30 3 N 6.6791 -6.5969 0 0 CFG=3
M  V30 4 C 6.6791 -7.7781 0 0
M  V30 5 C 5.6563 -8.3686 0 0
M  V30 6 C 4.6334 -7.7781 0 0 CFG=1
M  V30 7 C 5.6563 -4.8257 0 0
M  V30 8 C 3.6109 -8.3684 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 3 2
M  V30 3 1 4 3
M  V30 4 1 5 4
M  V30 5 1 1 6
M  V30 6 1 6 5
M  V30 7 1 2 7 CFG=1
M  V30 8 1 6 8 CFG=1
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(2 6 2)
M  V30 END COLLECTION
M  V30 END CTAB
M  END

Cheers
Nick |
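The output Nick is asking for can be approximated by post-processing the V3000 text after conversion. The sketch below is a plain-text workaround, not an RDKit feature. It assumes the V3000 block already carries CFG=1 or CFG=2 parities on its stereo atoms (as in the example above), that chiral flag 1 should map to one MDLV30/STEABS collection and chiral flag 0 to a single MDLV30/STERAC1 collection, and that atoms marked CFG=3 are left out. The helper name `add_enhanced_stereo_collection` is made up for illustration.

```python
import re

# Captures the atom index and CFG parity from a V3000 atom line such as
# "M  V30 2 C 5.6563 -6.0064 0 0 CFG=2". Only parities 1 and 2 are matched.
_ATOM_CFG = re.compile(r"^M\s+V30\s+(\d+)\s+\S+\s.*\bCFG=([12])\b")


def add_enhanced_stereo_collection(v3000_block: str) -> str:
    """Insert an STEABS or STERAC1 collection based on the chiral flag.

    Hypothetical helper: it works on text only and does no chemistry,
    so meso forms and partially defined stereo are not handled.
    """
    if "BEGIN COLLECTION" in v3000_block:
        # Leave blocks that already have a collection untouched.
        return v3000_block

    lines = v3000_block.splitlines()
    chiral_flag = 0
    stereo_atoms = []
    in_atoms = False
    for line in lines:
        fields = line.split()
        if fields[:3] == ["M", "V30", "COUNTS"] and len(fields) > 7:
            # COUNTS na nb nsg n3d chiral
            chiral_flag = int(fields[7])
        elif fields[:4] == ["M", "V30", "BEGIN", "ATOM"]:
            in_atoms = True
        elif fields[:4] == ["M", "V30", "END", "ATOM"]:
            in_atoms = False
        elif in_atoms:
            # Bond lines can also carry CFG=, so only look inside the atom block.
            match = _ATOM_CFG.match(line)
            if match:
                stereo_atoms.append(match.group(1))

    if not stereo_atoms:
        return v3000_block

    group = "MDLV30/STEABS" if chiral_flag == 1 else "MDLV30/STERAC1"
    collection = [
        "M  V30 BEGIN COLLECTION",
        f"M  V30 {group} ATOMS=({len(stereo_atoms)} {' '.join(stereo_atoms)})",
        "M  V30 END COLLECTION",
    ]

    out = []
    for line in lines:
        if line.split()[:4] == ["M", "V30", "END", "CTAB"]:
            out.extend(collection)
        out.append(line)
    return "\n".join(out) + "\n"
```

Whether a chiral flag of 0 should really be read as one racemic group is a policy decision for the receiving application, so the mapping is worth checking against a few known structures before relying on it.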
From: Greg L. <gre...@gm...> - 2024-01-31 13:45:54
|
Hi Nick,

Can you provide an example of exactly what you would like to have happen?

-greg |
From: Tomkinson, N. <nic...@as...> - 2024-01-31 12:03:03
|
Thanks Tricarico - I was afraid this might be the answer, but thanks for your suggestion.

I'm not entirely sure I understand how adding an enhanced stereo collection reflecting the status of the chiral flag when going from V2000 to V3000 is a problem; it would be good to see some examples. I know the chiral flag is a nightmare when reading V3000, but when reading V2000, if it's not set correctly then the file is broken, and setting the enhanced collection doesn't make it more broken. It would be nice if creating an enhanced collection from the chiral flag (when reading V2000 only) were available as an option.

Cheers
Nick

From: Giovanni Tricarico <gio...@gl...>
Sent: Wednesday, January 31, 2024 9:55 AM
To: Tomkinson, Nicholas <nic...@as...>; rdk...@li...
Subject: RE: V2000 to V3000 enhanced stereo question

Hello Nick,

We faced a (seemingly) related problem a while ago. In our case we were trying to convert V2000 CTABs to CXSMILES, and we were expecting that the V2000 chirality flag would translate to an enhanced stereo string in the CXSMILES. That is not so, by design. See my question, and the answer it got, here: "V2000 chiral flag does not seem to be read by Chem.MolFromMolBlock()", Issue #6062, rdkit/rdkit, GitHub <https://github.com/rdkit/rdkit/issues/6062>

I imagine that the reason why the V2000 to V3000 conversion does not use the V2000 chirality flag is conceptually the same, but indeed worth checking.

FYI, the practical solution for our workflow was:

* create a function 'chiral_flag_from_molblock' that detects if a CTAB is V2000 or V3000; if V2000, it reads the flag (by simple text parsing) and returns it (0 or 1); if V3000, it returns -1
* create a function 'CTAB_to_CXSMILES' that calls the above: for V3000, the RDKit-generated CXSMILES is (or usually is) already correct; for V2000 with flag 1, the CXSMILES is simply the SMILES; for V2000 with flag 0, the function loops through all atoms, identifies those that have tetrahedral stereochemistry, and uses their indices to put together an '&1' enhanced stereo group string, which is then appended to the SMILES. (A V2000 CTAB with chirality flag 0 can only represent a racemic mixture where all configurations are inverted together, so it only needs one '&' group - of course with all the exceptions and issues you can imagine: meso stereoisomers or moieties, etc.)

Probably not ideal, but lacking any suggestion or a better 'native' solution, that's what we went for, and it seems to have worked so far. [I'll mention for completeness that we also run a further standardisation function on CXSMILES, which takes care of removing the enhanced stereo flags from meso moieties.]

I hope this helps.

Regards
Giovanni Tricarico
Principal Scientist Chemoinformatics
Galapagos NV, Generaal De Wittelaan L11 A3, 2800 Mechelen, Belgium |
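The first helper in Giovanni's list can be sketched with plain text parsing. This is a guess at what such a function might look like, not the Galapagos code. It assumes a well-formed molblock whose counts line is the fourth line and reads the chiral flag from columns 13-15 of the V2000 counts line. The second function, `racemic_cxsmiles`, is a hypothetical name: it only formats the '&1' group and expects the caller to supply the stereocentre indices in SMILES atom order (for example, taken from RDKit).

```python
def chiral_flag_from_molblock(molblock: str) -> int:
    """Return the V2000 chiral flag (0 or 1), or -1 for a V3000 block."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("molblock has no counts line")
    counts = lines[3]
    if "V3000" in counts:
        return -1
    # V2000 counts line layout: aaabbblllfffccc...
    # The fifth 3-character field (columns 13-15) is the chiral flag.
    flag_field = counts[12:15].strip()
    return int(flag_field) if flag_field else 0


def racemic_cxsmiles(smiles: str, stereo_atom_indices) -> str:
    """Append a single '&1' enhanced stereo group to a SMILES string.

    Illustrative only: indices must refer to atom positions in `smiles`.
    """
    if not stereo_atom_indices:
        return smiles
    atoms = ",".join(str(i) for i in stereo_atom_indices)
    return f"{smiles} |&1:{atoms}|"
```

Because the flag is read by column position, a molblock whose leading blank title line has been stripped would be misread, so it is worth validating the input before calling the parser.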
From: Tomkinson, N. <nic...@as...> - 2024-01-30 16:44:32
|
I am trying to convert a simple V2000 molfile, with or without the chiral flag, into a V3000 molfile, but this does not create an enhanced stereo collection in the V3000 molfile. This is a requirement for another application that does not handle V2000/V3000 mixtures well. Is there any way of forcing the writing of the enhanced collection in this context?

Thanks
Nick |
From: He, A. <he...@bu...> - 2024-01-10 23:34:29
|
Hi Emre!

You can get more detailed info on failed conformer generations through rdDistGeom.EmbedFailureCauses, see: https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html

Bests,
--
Amy He
Hadad Lab @ OSU
He...@os... |
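A short sketch of how the failure tracking described in that blog post might be wired into an embedding call. It assumes an RDKit release recent enough to expose `trackFailures` and `GetFailureCounts()` on the embedding parameters; the helper name `embed_with_diagnostics`, the seed, and the ethanol test molecule are made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom


def embed_with_diagnostics(mol, seed=0xF00D):
    """Embed one conformer and report how often each failure cause occurred."""
    params = rdDistGeom.ETKDGv3()
    params.randomSeed = seed
    params.trackFailures = True  # ask the embedder to record why attempts fail
    conf_id = rdDistGeom.EmbedMolecule(mol, params)

    counts = params.GetFailureCounts()
    causes = {}
    for name, value in rdDistGeom.EmbedFailureCauses.names.items():
        idx = int(value)
        # Skip any enum entry that has no matching slot in the counts.
        if idx < len(counts):
            causes[name] = counts[idx]
    return conf_id, causes


mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
conf_id, causes = embed_with_diagnostics(mol)
print("conformer id:", conf_id)  # -1 would mean that embedding failed
for name, count in causes.items():
    if count:
        print(name, count)
```

For a ligand that fails to embed, a return value of -1 together with a large count for one cause points at the stage that keeps rejecting the structure.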
From: Emre A. <emr...@gm...> - 2024-01-10 13:46:58
|
Hello,

I want to convert the 2D ligands I downloaded in SDF format from the ZINC library to 3D, but almost half of them are not converted. Some examples are ZINC000008214373, ZINC000001530666 and ZINC000085545180; 208 ligands fail in this way. When I run the script, I do not get any warning or error in the IDE. When I look at the output of my try/except blocks, I see:

"ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"

The output looks like this for every ligand that is not converted to 3D. When I try different methods, the ligands are converted to 3D. I wonder if there is something missing or wrong with my script. I would be grateful if you can help me. Thank you!

```
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdForceFieldHelpers
from rdkit.Chem import rdPartialCharges
import os

ligands_dir = "ligands"
output_dir = "new_ligands"
status_file = "process_status.txt"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]

with open(status_file, 'w') as status:
    for sdf_file in sdf_files:
        input_path = os.path.join(ligands_dir, sdf_file)
        output_path = os.path.join(output_dir, sdf_file)
        mol = Chem.MolFromMolFile(input_path)

        # Add hydrogens
        try:
            mol = Chem.AddHs(mol, addCoords=True)
        except Exception:
            status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
            continue

        # 3D embedding
        etkdgv3 = rdDistGeom.ETKDGv3()
        embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
        if embed_status == -1:
            status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")

        # Compute Gasteiger charges
        try:
            rdPartialCharges.ComputeGasteigerCharges(mol)
        except Exception:
            status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")

        # UFF energy minimization
        try:
            rdForceFieldHelpers.UFFOptimizeMolecule(mol)
        except Exception:
            status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")

        Chem.MolToMolFile(mol, output_path)
```
|
From: Andrew D. <da...@da...> - 2024-01-10 10:36:32
|
Hi everyone,

We have released mmpdb 3.1, which you can get from https://github.com/rdkit/mmpdb .

mmpdb 3.0, released May 2023, merged three development tracks:

- create and query 1-cut med chem transformations as described in Awale et al., The Playbooks of Medicinal Chemistry Design Moves, J. Chem. Inf. Model. 2021, 61, 2, 729–742
- support indexing large datasets on a distributed cluster
- replace the hash-based fingerprint environment with a SMARTS/pseudo-SMILES description

Version 3.1 adds support for the 2- and 3-cut med chem transformations described by Awale et al. There are also a few feature improvements, some performance tuning, and bug fixes. See the CHANGELOG for details.

Andrew
da...@da... |
From: Marawan H. <mar...@ya...> - 2023-12-21 17:55:55
|
Hi,

I am trying to use RDKit to replace matched SMARTS patterns in a molecule with a wildcard (*), and return a SMARTS string such that the original molecule is an instance of the returned SMARTS. I tried the following:

########
from rdkit import Chem

def generate_modified_smarts(smiles, smarts_patterns, num_patterns_to_replace):
    molecule = Chem.MolFromSmiles(smiles)
    patterns_replaced = 0
    for smarts in smarts_patterns:
        if patterns_replaced >= num_patterns_to_replace:
            break
        pattern = Chem.MolFromSmarts(smarts)
        while molecule.HasSubstructMatch(pattern) and patterns_replaced < num_patterns_to_replace:
            match_indices = molecule.GetSubstructMatch(pattern)
            # Extract segments before and after the match
            before_match, after_match = "", ""
            if match_indices[0] > 0:
                before_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[0])))
            if match_indices[-1] < molecule.GetNumAtoms() - 1:
                after_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[-1] + 1, molecule.GetNumAtoms())))
            # Combine parts with a wildcard
            modified_smarts = before_match + '*' + after_match
            molecule = Chem.MolFromSmarts(modified_smarts)
            patterns_replaced += 1
    return Chem.MolToSmarts(molecule)

example_smiles = "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)NC(=O)C=CCN(C)C"
smarts_patterns = ["C=O", "C#N"]
num_patterns_to_replace = 2
modified_smarts = generate_modified_smarts(example_smiles, smarts_patterns, num_patterns_to_replace)
print(f"Modified molecule SMARTS pattern: {modified_smarts}")
#######

While it seems to work for C=O, it does not work for C#N: the connectivity is messed up for C#N even if I use it alone, i.e. without the carbonyl. The matched patterns could be anywhere in the molecule and could be more complex than this, but I just tried some simple cases to see how robust this approach is. It worked for "CCO", but did not work when I tried "Cl".

I am wondering if this is something you can help with,
Marawan |