rdkit-discuss Mailing List for RDKit (Page 4)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Greg L. <gre...@gm...> - 2024-01-31 13:45:54
|
Hi Nick,
Can you provide an example of exactly what you would like to have happen?
-greg
On Tue, Jan 30, 2024 at 5:46 PM Tomkinson, Nicholas <nic...@as...> wrote:
> I am trying to convert a simple V2000 molfile with or without the chiral
> flag into a V3000 molfile but this does not create an enhanced stereo
> collection in the V3000 molfile. This is a requirement for another
> application that does not handle V2000/V3000 mixtures well. Is there any way
> of forcing the writing of the enhanced collection in this context?
>
> Thanks
>
> Nick
>
> ------------------------------
>
> AstraZeneca UK Limited is a company incorporated in England and Wales with
> registered number:03674842 and its registered office at 1 Francis Crick
> Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.
>
> This e-mail and its attachments are intended for the above named recipient
> only and may contain confidential and privileged information. If they have
> come to you in error, you must not copy or show them to anyone; instead,
> please reply to this e-mail, highlighting the error to the sender and then
> immediately delete the message. For information about how AstraZeneca UK
> Limited and its affiliates may process information, personal data and
> monitor communications, please see our privacy notice at
> www.astrazeneca.com
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
|
|
From: Tomkinson, N. <nic...@as...> - 2024-01-31 12:03:03
|
Thanks Tricarico - I was afraid this might be the answer, but thanks for your suggestion.
I'm not entirely sure I understand how adding an enhanced stereo collection reflecting the status of the chiral flag when going from V2000 to V3000 is a problem; it would be good to see some examples. I know the chiral flag is a nightmare when reading V3000, but when reading V2000, if it's not set correctly then the file is broken, and setting the enhanced collection doesn't make it more broken.
It would be nice if creating an enhanced collection from the chiral flag (when reading V2000 only) was available as an option.
Cheers
Nick
From: Giovanni Tricarico <gio...@gl...>
Sent: Wednesday, January 31, 2024 9:55 AM
To: Tomkinson, Nicholas <nic...@as...>; rdk...@li...
Subject: RE: V2000 to V3000 enhanced stereo question
Hello Nick,
We faced a (seemingly) related problem a while ago.
In our case we were trying to convert V2000 CTABs to CXSMILES, and we were expecting that the V2000 chirality flag would translate to an enhanced stereo string in the CXSMILES.
That is not so, by design. See my question, and the answer it got, here:
V2000 chiral flag does not seem to be read by Chem.MolFromMolBlock() (Issue #6062, rdkit/rdkit, GitHub): https://github.com/rdkit/rdkit/issues/6062
I imagine that the reason why the V2000 to V3000 conversion does not use the V2000 chirality flag is conceptually the same, but indeed worth checking.
FYI, the practical solution for our workflow was:
* create a function 'chiral_flag_from_molblock' that detects if a CTAB is V2000 or V3000; if V2000, it reads the flag (by simple text parsing) and returns it (0 or 1); if V3000, it returns -1
* create a function 'CTAB_to_CXSMILES' that calls the above; for V3000, the rdkit-generated CXSMILES is (or usually is) already correct; for V2000, if the flag is 1, the SMILES is identical to the CXSMILES; if the flag is 0, the function loops through all atoms, identifies those that have tetrahedral stereochemistry, and uses their indices to put together an '&1' enhanced stereo group string, which is then appended to the SMILES (as a V2000 CTAB with chirality flag 0 can only represent a racemic mixture where all configurations are inverted together, so it only needs one '&' group - of course with all the exceptions and issues you can imagine: meso stereoisomers or moieties, etc)
Probably not ideal, but lacking any suggestion or a better 'native' solution, that's what we went for, and it seems to have worked so far.
[I'll mention for completeness that we also run a further standardisation function on CXSMILES, which takes care of removing the enhanced stereo flags from meso moieties].
I hope this helps.
Regards
Giovanni Tricarico
Principal Scientist Chemoinformatics
+32 15 6514 30
gio...@gl...
Galapagos NV
Generaal De Wittelaan L11 A3
2800 Mechelen, Belgium
This e-mail and its attachment(s) (if any) may contain confidential and/or proprietary information and is intended for its addressee(s) only. Any unauthorized use of the information contained herein (including, but not limited to, alteration, reproduction, communication, distribution or any other form of dissemination) is strictly prohibited. If you are not the intended addressee, please notify the originator promptly and delete this e-mail and its attachment(s) (if any) subsequently. Neither Galapagos nor any of its affiliates shall be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message (by a third party) or as a result of a virus being passed on.
|
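The workaround Giovanni describes can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in his workflow: it assumes the chiral flag sits in columns 13-15 of the fixed-width V2000 counts line (the fourth line of the molblock), reuses the name `chiral_flag_from_molblock` from his description, and adds a hypothetical helper, `and_group_suffix`, that formats a single `&1` group. Finding the stereocentre indices (for example with RDKit) and handling meso cases are left out.

```python
def chiral_flag_from_molblock(molblock: str) -> int:
    """Return the V2000 chiral flag (0 or 1), or -1 for a V3000 block."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("molblock is too short to contain a counts line")
    counts_line = lines[3]  # three header lines, then the counts line
    if "V3000" in counts_line:
        return -1
    # Assumption: fixed-width V2000 counts line, chiral flag in columns 13-15.
    field = counts_line[12:15].strip()
    return int(field) if field else 0


def and_group_suffix(stereo_atom_indices) -> str:
    """Hypothetical helper: format one '&1' CXSMILES enhanced stereo group."""
    indices = sorted(stereo_atom_indices)
    if not indices:
        return ""
    return " |&1:" + ",".join(str(i) for i in indices) + "|"


if __name__ == "__main__":
    v2000 = "\n  example\n\n  1  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    if chiral_flag_from_molblock(v2000) == 0:
        # Atom 1 is the stereocentre in this SMILES string.
        print("C[C@H](F)Cl" + and_group_suffix([1]))
```

Note that the indices passed to `and_group_suffix` have to follow the atom order of the SMILES string being written, which is not necessarily the atom order of the molblock.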
|
From: Tomkinson, N. <nic...@as...> - 2024-01-30 16:44:32
|
I am trying to convert a simple V2000 molfile with or without the chiral flag into a V3000 molfile but this does not create an enhanced stereo collection in the V3000 molfile. This is a requirement for another application that does not handle V2000/V3000 mixtures well. Is there any way of forcing the writing of the enhanced collection in this context?
Thanks
Nick
|
|
From: He, A. <he...@bu...> - 2024-01-10 23:34:29
|
Hi Emre!
You can get more detailed info on failed conformer generations through rdDistGeom.EmbedFailureCauses, see:
https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html
Bests,
--
Amy He
Hadad Lab @ OSU
He...@os...
From: Emre Apaydın <emr...@gm...>
Date: Wednesday, January 10, 2024 at 8:47 AM
To: rdk...@li... <rdk...@li...>
Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
Hello,
I want to convert the 2D ligands I downloaded as sdf format from the ZINC library to 3D, but almost half of them are not converted to 3D. Some of them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands are not converted to 3D in this way. When I run the script, I do not get any warning or error in IDE. When I look at the output of my Try, Except commands, I see
"ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"
It outputs like this for ligands that are not translated to 3D. When I try different methods, the ligands are converted to 3D. I wonder if there is something missing or wrong with my script. I would be grateful if you can help me.
Thank you!
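As a rough sketch of what Amy's pointer could look like in practice: the function below turns on failure tracking before embedding and reports which checks failed. It assumes an RDKit release that provides `trackFailures`, `GetFailureCounts()` and `rdDistGeom.EmbedFailureCauses` as described in the linked blog post; the function names `embed_with_diagnostics` and `format_failure_report` are made up for this illustration.

```python
def format_failure_report(cause_counts):
    """Return 'name: count' lines for every cause that occurred at least once."""
    return [f"{name}: {count}" for name, count in cause_counts if count > 0]


def embed_with_diagnostics(mol, seed=42):
    """Try to embed `mol` in 3D; return (conformer_id, report_lines)."""
    # Imported here so the helper above stays usable without RDKit installed.
    from rdkit.Chem import rdDistGeom

    params = rdDistGeom.ETKDGv3()
    params.randomSeed = seed
    params.trackFailures = True  # ask RDKit to count reasons for failed attempts
    conf_id = rdDistGeom.EmbedMolecule(mol, params)  # -1 means embedding failed
    counts = params.GetFailureCounts()
    causes = rdDistGeom.EmbedFailureCauses.names  # maps cause name -> enum value
    cause_counts = [(name, counts[int(value)]) for name, value in causes.items()]
    return conf_id, format_failure_report(cause_counts)
```

In a loop like Emre's, one could call `embed_with_diagnostics(mol)` after `Chem.AddHs`, write the report lines to the status file, and skip the charge calculation and UFF optimisation when the returned id is -1. Checking that `Chem.MolFromMolFile` did not return `None` before adding hydrogens is also worthwhile.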
|
From: Emre A. <emr...@gm...> - 2024-01-10 13:46:58
|
Hello,
I want to convert the 2D ligands I downloaded in SDF format from the ZINC
library to 3D, but almost half of them are not converted to 3D. Some
examples are ZINC000008214373, ZINC000001530666 and ZINC000085545180. 208
ligands are not converted to 3D in this way. When I run the script, I do not
get any warning or error in the IDE. When I look at the output of my
try/except blocks, I see:
"ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"
This is the output for every ligand that is not converted to 3D.
When I try different methods, the ligands are converted to 3D. I wonder if
there is something missing or wrong with my script. I would be grateful if
you can help me.
Thank you!
```
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdForceFieldHelpers
from rdkit.Chem import rdPartialCharges
import os

ligands_dir = "ligands"
output_dir = "new_ligands"
status_file = "process_status.txt"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]

with open(status_file, 'w') as status:
    for sdf_file in sdf_files:
        input_path = os.path.join(ligands_dir, sdf_file)
        output_path = os.path.join(output_dir, sdf_file)
        mol = Chem.MolFromMolFile(input_path)

        # Add hydrogens
        try:
            mol = Chem.AddHs(mol, addCoords=True)
        except Exception:
            status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
            continue

        # 3D embedding
        etkdgv3 = rdDistGeom.ETKDGv3()
        embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
        if embed_status == -1:
            # Note: there is no `continue` here, so the steps below still run
            # even when embedding failed.
            status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")

        # Compute Gasteiger charges
        try:
            rdPartialCharges.ComputeGasteigerCharges(mol)
        except Exception:
            status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")

        # UFF energy minimization
        try:
            rdForceFieldHelpers.UFFOptimizeMolecule(mol)
        except Exception:
            status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")

        Chem.MolToMolFile(mol, output_path)
```
|
|
From: Andrew D. <da...@da...> - 2024-01-10 10:36:32
|
Hi everyone,
We have released mmpdb 3.1, which you can get from https://github.com/rdkit/mmpdb .
mmpdb 3.0, released May 2023, merged three development tracks:
- create and query 1-cut med chem transformations as described in Awale et al., The Playbooks of Medicinal Chemistry Design Moves, J. Chem. Inf. Model. 2021, 61, 2, 729–742
- support indexing large datasets on a distributed cluster
- replace the hash-based fingerprint environment with a SMARTS/pseudo-SMILES description
Version 3.1 adds support for the 2- and 3-cut med chem transformations described by Awale et al.
There are also a few feature improvements, some performance tuning, and bug fixes. See the CHANGELOG for details.
Andrew
da...@da...
|
|
From: Marawan H. <mar...@ya...> - 2023-12-21 17:55:55
|
Hi,
I am trying to use RDKit to replace matched SMARTS patterns in a molecule with a wildcard (*) and return a SMARTS string that the original molecule still matches.
I tried the following:
########
from rdkit import Chem

def generate_modified_smarts(smiles, smarts_patterns, num_patterns_to_replace):
    molecule = Chem.MolFromSmiles(smiles)
    patterns_replaced = 0

    for smarts in smarts_patterns:
        if patterns_replaced >= num_patterns_to_replace:
            break

        pattern = Chem.MolFromSmarts(smarts)
        while molecule.HasSubstructMatch(pattern) and patterns_replaced < num_patterns_to_replace:
            match_indices = molecule.GetSubstructMatch(pattern)

            # Extract segments before and after the match
            before_match, after_match = "", ""
            if match_indices[0] > 0:
                before_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[0])))
            if match_indices[-1] < molecule.GetNumAtoms() - 1:
                after_match = Chem.MolFragmentToSmarts(molecule, atomsToUse=list(range(match_indices[-1] + 1, molecule.GetNumAtoms())))

            # Combine parts with a wildcard
            modified_smarts = before_match + '*' + after_match
            molecule = Chem.MolFromSmarts(modified_smarts)
            patterns_replaced += 1

    return Chem.MolToSmarts(molecule)

example_smiles = "CCOC1=C(C=C2C(=C1)N=CC(=C2NC3=CC(=C(C=C3)OCC4=CC=CC=N4)Cl)C#N)NC(=O)C=CCN(C)C"
smarts_patterns = ["C=O", "C#N"]
num_patterns_to_replace = 2

modified_smarts = generate_modified_smarts(example_smiles, smarts_patterns, num_patterns_to_replace)
print(f"Modified molecule SMARTS pattern: {modified_smarts}")
#######
While it seems to work for C=O, it does not work for C#N: the connectivity is messed up, even if I use C#N alone, i.e. without the carbonyl. The matched patterns could be anywhere in the molecule and could be more complex than this, but I tried some simple cases first to see how robust this approach is. It worked for "CCO", but did not work when I tried "Cl".
I am wondering if this is something you can help with,
Marawan |
|
From: Greg L. <gre...@gm...> - 2023-12-19 03:38:22
|
Hi Marawan,
We don't currently support CSRML. It is certainly an interesting and flexible format, so it would be cool to have, but it would be a fair amount of work to implement.
-greg
On Tue, Dec 19, 2023 at 4:19 AM Marawan Hussien via Rdkit-discuss <rdk...@li...> wrote:
> Hello,
>
> I am wondering if standard rdkit supports CSRML, I would like to encode
> the toxprint chemotypes as binary fingerprints for a bunch of molecules to
> train on,
>
> Thanks,
> Marawan
|
|
From: Marawan H. <mar...@ya...> - 2023-12-19 03:16:26
|
Hello,
I am wondering if standard RDKit supports CSRML. I would like to encode the ToxPrint chemotypes as binary fingerprints for a bunch of molecules to train on.
Thanks,
Marawan
|
|
From: Pavel P. <pav...@uk...> - 2023-12-15 08:15:26
|
Dear colleagues,
we are happy to invite you to the 7th Advanced In Silico Drug Design workshop, which will be held 29 January - 02 February 2024 in Olomouc. This year we cover topics on:
- PDBe database services
- virtual screening
- machine learning and AI
- structure- and ligand-based drug design tools
- pharmacophore modeling
- molecular docking and dynamics
- de novo design
- frequent hitters
- natural compounds
and many more.
Lectures and tutorials will be provided by 12 experts in the field from Austria, Great Britain, Italy and Czech Republic. On the final day participants may win prizes in a competition where they can apply the knowledge they have gained and demonstrate their skills.
The workshop website is https://www.kfc.upol.cz/7add
Welcome to Olomouc!
Kind regards,
Pavel
|
|
From: Mandar K. <man...@gm...> - 2023-12-14 03:41:26
|
Hello Everyone,
Thanks a lot for your detailed replies and suggestions. Thanks, Andrew, for
the code block; that also helped me understand what was happening.
I just realized my replies are not sent to the rdkit-discussion group. So,
I am summarizing the origin of the warning and solution below for future
readers:
The output SDFs from the docking program do not have a second line
mentioning 2D or 3D structure, so RDKit SDMolSupplier raised a warning but
treated a structure as 3D.
I processed these SDFs with obabel to add hydrogens, and then the second
line is present, mentioning 3D structure as indicated by Jan Jensen in
previous thread.
Best,
Mandar Kulkarni
On Wed, Dec 13, 2023 at 4:28 PM Andrew Dalke <da...@da...>
wrote:
> > On Dec 13, 2023, at 06:46, Mandar Kulkarni <
> man...@gm...> wrote:
> > Thanks a lot for the detailed answer. In my case, this line is blank,
> probably leading to warnings.
>
> Here's what RDKit does, from Code/GraphMol/FileParsers/MolFileParser.cpp
>
> If the dimension code is not 2D and not 3D it assumes 2D:
>
>   if (tempStr.length() >= 22) {
>     std::string dimLabel = tempStr.substr(20, 2);
>     // Unless labelled as 3D we assume 2D
>     if (dimLabel == "3d" || dimLabel == "3D") {
>       res->setProp(common_properties::_3DConf, 1);
>     }
>   }
>
>
> Then, if there are non-zero Z coordinates and it's not marked as 3D, it
> issues the warning:
>
>   bool nonzeroZ = hasNonZeroZCoords(conf);
>
>   if (!nonzeroZ && marked3d == 1) {
>     .. I'm skipping the code for this case ..
>   } else if (marked3d == 0 && nonzeroZ) {
>     BOOST_LOG(rdWarningLog)
>         << "Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. "
>            "Marking the mol as 3D."
>         << std::endl;
>     return true;
>   }
>
> Andrew
> da...@da...
>
>
>
|
|
From: Jan H. J. <ja...@bi...> - 2023-12-13 07:22:04
|
You can also cross-check with standard InChI to see if this is an RDKit issue or a more general InChI issue.
To convert InChI strings (and optionally AuxInfo) to SDF format with the standard inchi-1 executable, put the InChI string and AuxInfo into a text file and convert it like this.
P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt
InChI=1/Ca.2H
AuxInfo=1/0/N:1;2;3/rA:3Ca0H0H0/rB:;;/rC:;;;
P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>inchi-1.exe /InChI2Struct /OutputSDF test.txt
InChI version 1, Software v. 1.06 (inchi-1 executable)
Windows 32-bit Build (MS VS 2015) of Dec 18 2020 20:45:14
Opened log file 'test.txt.log'
Opened input file 'test.txt'
Opened output file 'test.txt.txt'
Opened problem file 'test.txt.prb'
The command line used:
"inchi-1.exe /InChI2Struct /OutputSDF test.txt"
Converting InChI(s) to structure(s) in MOL format
Output SDfile only without stereochemical information and atom coordinates
Input format: InChI (plain identifier)
Output format: SDfile only (without stereochemical info and atom coordinates)
Timeout per structure: 60000 msec
Up to 1024 atoms per structure
Finished processing 1 structure: 0 errors, processing time 0:00:00.00
Elapsed walltime: 15 msec.
P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt.txt
Structure #1.
InChIV10
3 0 0 0 0 0 0 0 0 0 1 V2000
0.0000 0.0000 0.0000 Ca 0 0 0 0 15 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 15 0 0 0 0
0.0000 0.0000 0.0000 H 0 0 0 0 15 0 0 0 0
M END
$$$$
P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>
Cheers
-- Jan
On 2023-12-12 07:59, S Joshua Swamidass wrote:
> Perhaps provide some examples where this failure happens.
>
> Sent from Gmail Mobile
>
> On Tue, Nov 28, 2023 at 7:35 PM 李大舟 <lid...@sy...> wrote:
>
> > Dear RDKit Developers and Maintainers,
> >
> > I hope this email finds you well. My name is Dr. Dazhou Li, and I am a researcher working on the development of a tool for extracting chemical compound structures recognized by OCR (Optical Character Recognition) technology. I have been using the RDKit library for a crucial step in this process, specifically the rdkit.Chem.inchi.MolFromInchi() function, to convert InChI-format strings into Mol format representations.
> >
> > Firstly, I would like to express my gratitude for the excellent work you have done in developing and maintaining the RDKit library, which has been an invaluable resource in my research. The library has consistently delivered high-quality results in various aspects of chemical informatics, and I appreciate your dedication to its development.
> >
> > However, I have encountered a specific issue with the rdkit.Chem.inchi.MolFromInchi() function that I hope you can help me understand and resolve. When attempting to convert InChI-format strings generated by my tool, some of them fail with an error message reporting "NaN." Since the rdkit.Chem.inchi.MolFromInchi() function calls C++ code, I am unable to directly inspect its execution or source code to diagnose the issue.
> >
> > My primary request is for assistance in understanding the internal workings of the rdkit.Chem.inchi.MolFromInchi() function, specifically the checking process or generation step that leads to the "NaN" error when certain InChI-format strings are processed. It is crucial for my research to determine at which point in the execution of this function my generated InChI-formatted strings are considered unreasonable, as this information will help me refine my tool's output to be compatible with RDKit.
> >
> > I understand that the RDKit library is a complex and comprehensive toolkit, and I appreciate the complexity involved in diagnosing such issues. However, any insights or guidance you can provide regarding the problematic cases and the internal processes of the rdkit.Chem.inchi.MolFromInchi() function would be immensely valuable to me and would help me ensure the compatibility of my tool with RDKit.
> >
> > If possible, I would be grateful for access to relevant documentation or insights into the specific error conditions that may lead to the "NaN" result. Additionally, any suggestions or best practices for generating InChI-format strings that are more likely to be successfully processed by RDKit would be greatly appreciated.
> >
> > Thank you for your time and consideration. I look forward to your response and hope that we can collaborate to resolve this issue and enhance the compatibility of my tool with the RDKit library.
> >
> > Please feel free to reach out to me if you require any additional information or if there are specific details about my tool or the InChI-format strings that would aid in diagnosing the issue.
> >
> > Best regards,
> >
> > Dr. Dazhou Li
> > Shenyang University of Chemical Technology
|
|
From: Jan H. J. <ja...@bi...> - 2023-12-13 07:21:25
|
> I could not figure out how Rdkit is guessing it as 2D structure, as there is no such information in SDF.
Ah, but there is, just a little hidden :-). The source-and-timestamp line of each molfile in the SDF contains that information. The line is the second line of the molfile, and you can find the 2D/3D tag in columns 21-22 of that line (its last two characters when the optional fields after it are empty).
An example:
:MOLFILE_BEGINS:

  Mol2Comp06180618072D

  1  0  0  0  0  0  0  0  0  0999 V2000
   10.2967   -1.5283    0.0000 Ar  0  0  0  0  0  0  0  0  0  0  0  0
M  END
:MOLFILE_ENDS:
Cheers
-- Jan
On 2023-12-13 03:39, Mandar Kulkarni wrote:
> Hello,
>
> I am using RDKit 2023.9.2's SDMolSupplier to read docked SDF files
> (V2000 formats, suppl = Chem.SDMolSupplier(sdf_file);) and getting a
> warning as:
>
> Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
>
> I could not figure out how Rdkit is guessing it as 2D structure, as
> there is no such information in SDF.
> Is there any more information need to provide SDMolSupplier to make it
> understand it's a 3D molecule?
> I kindly look forward to hearing to suggestions.
> Thanks in advance,
> Mandar Kulkarni
|
|
From: Andrew D. <da...@da...> - 2023-12-13 05:55:29
|
Hi Mandar,
> On Dec 13, 2023, at 03:39, Mandar Kulkarni <man...@gm...> wrote:
> I could not figure out how Rdkit is guessing it as 2D structure, as there is no such information in SDF.
Line 2 of the SDF record looks something like:
     RDKit          2D
This line has the format (quoting from the documentation):
IIPPPPPPPPMMDDYYHHmmddSSssssssssssEEEEEEEEEEEERRRRRR
A2<--A8--><---A10-->A2I2<--F10.5-><---F12.5--><-I6->
where "User's first and last initials (I), program name (P), date/time (M/D/Y,H:m), dimensional codes (d) such as 2D or 3D, scaling factors (S, s), energy (E) if modeling program input, internal registry number (R)"
In the example I gave, most of these fields are blank, except for the program name ("RDKit") and the dimension code "2D" in columns 21-22 (if I counted right). The dimension code indicates the structure is expected to be in 2D.
> Is there any more information need to provide SDMolSupplier to make it understand it's a 3D molecule?
It is only a warning. RDKit interprets the molecule as 3D despite the warning.
The file format documentation also says: "The “dimensional code” is maintained explicitly. Thus “3D” really means 3D, although “2D” will be interpreted as 3D if any non-zero Z-coordinates are found."
Best regards,
Andrew
da...@da...
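The column layout described above can be checked, and the header patched, with a few lines of plain Python. This is a minimal sketch for a single molfile record, assuming the dimension code lives in columns 21-22 of the second line as quoted from the format documentation; the function names `get_dimension_code` and `set_dimension_code` are invented for this example and are not RDKit functions.

```python
def get_dimension_code(molblock: str) -> str:
    """Return the dimension code from columns 21-22 of line 2 ('' if absent)."""
    lines = molblock.splitlines()
    if len(lines) < 2:
        return ""
    return lines[1][20:22].strip().upper()


def set_dimension_code(molblock: str, code: str = "3D") -> str:
    """Return a copy of the molblock whose header line carries `code`."""
    if code not in ("2D", "3D"):
        raise ValueError("code must be '2D' or '3D'")
    lines = molblock.split("\n")
    if len(lines) < 2:
        raise ValueError("molblock has no header line")
    # Pad short or blank header lines so that columns 21-22 exist.
    header = lines[1].rstrip("\r").ljust(22)
    lines[1] = header[:20] + code + header[22:]
    return "\n".join(lines)


if __name__ == "__main__":
    docked = "ligand\n\n\n  1  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    print(repr(get_dimension_code(docked)))
    print(repr(get_dimension_code(set_dimension_code(docked, "3D"))))
```

For a multi-record SDF the same patch would have to be applied to each record separately. Since RDKit already treats such molecules as 3D, this only serves to silence the warning or to satisfy other tools that read the tag.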
|
|
From: Mandar K. <man...@gm...> - 2023-12-13 02:40:31
|
Hello,
I am using RDKit 2023.9.2's SDMolSupplier to read docked SDF files (V2000 format, suppl = Chem.SDMolSupplier(sdf_file)) and getting this warning:
Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
I could not figure out how RDKit is guessing that it is a 2D structure, as there is no such information in the SDF.
Is there any more information I need to provide to SDMolSupplier to make it understand it's a 3D molecule?
I look forward to hearing your suggestions.
Thanks in advance,
Mandar Kulkarni
|
|
From: S J. S. <swa...@gm...> - 2023-12-12 06:59:28
|
Perhaps provide some examples where this failure happens.
|
|
From: Greg L. <gre...@gm...> - 2023-12-10 06:37:42
|
Dear all,
The 2024 RDKit UGM will take place from 11 to 13 September in Zurich, Switzerland.
We'll post more information and open registration in Q1 of next year.
Best regards,
-greg |
|
From: Pavel P. <pav...@uk...> - 2023-12-01 06:47:38
|
Dear colleagues,
we are happy to invite you to the 7th Advanced In Silico Drug Design workshop, which will take place from 29 January to 2 February 2024 in Olomouc.
This year we cover topics on:
- PDBe database services
- virtual screening
- machine learning and AI
- structure- and ligand-based drug design tools
- pharmacophore modeling
- molecular docking and dynamics
- de novo design
- frequent hitters
- natural compounds
and many more.
Lectures and tutorials will be given by 12 experts in the field from Austria, Great Britain, Italy and the Czech Republic. On the final day, participants can win prizes in a competition where they can apply the knowledge they have gained and demonstrate their skills.
The workshop website: https://www.kfc.upol.cz/7add
Welcome to Olomouc!
Kind regards,
Pavel |
|
From: <lid...@sy...> - 2023-11-29 01:33:09
|
Dear RDKit Developers and Maintainers,

I hope this email finds you well. My name is Dr. Dazhou Li, and I am a researcher working on the development of a tool for extracting chemical compound structures recognized by OCR (Optical Character Recognition) technology. I have been using the RDKit library for a crucial step in this process, specifically the rdkit.Chem.inchi.MolFromInchi() function, to convert InChI-format strings into Mol format representations.

Firstly, I would like to express my gratitude for the excellent work you have done in developing and maintaining the RDKit library, which has been an invaluable resource in my research. The library has consistently delivered high-quality results in various aspects of chemical informatics, and I appreciate your dedication to its development.

However, I have encountered a specific issue with the rdkit.Chem.inchi.MolFromInchi() function that I hope you can help me understand and resolve. When attempting to convert InChI-format strings generated by my tool, some of them fail with an error message reporting "NaN." Since the rdkit.Chem.inchi.MolFromInchi() function calls C++ code, I am unable to directly inspect its execution or source code to diagnose the issue.

My primary request is for assistance in understanding the internal workings of the rdkit.Chem.inchi.MolFromInchi() function, specifically the checking process or generation step that leads to the "NaN" error when certain InChI-format strings are processed. It is crucial for my research to determine at which point in the execution of this function my generated InChI-formatted strings are considered unreasonable, as this information will help me refine my tool's output to be compatible with RDKit.

I understand that the RDKit library is a complex and comprehensive toolkit, and I appreciate the complexity involved in diagnosing such issues. However, any insights or guidance you can provide regarding the problematic cases and the internal processes of the rdkit.Chem.inchi.MolFromInchi() function would be immensely valuable to me and would help me ensure the compatibility of my tool with RDKit.

If possible, I would be grateful for access to relevant documentation or insights into the specific error conditions that may lead to the "NaN" result. Additionally, any suggestions or best practices for generating InChI-format strings that are more likely to be successfully processed by RDKit would be greatly appreciated.

Thank you for your time and consideration. I look forward to your response and hope that we can collaborate to resolve this issue and enhance the compatibility of my tool with the RDKit library.

Please feel free to reach out to me if you require any additional information or if there are specific details about my tool or the InChI-format strings that would aid in diagnosing the issue.

Best regards,
Dr. Dazhou Li
Shenyang University of Chemical Technology |
|
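One way to narrow down which generated strings fail, and with what message, is to run each one through MolFromInchi with treatWarningAsError=True, which makes RDKit raise InchiReadWriteError instead of only logging. The sketch below is a minimal triage helper under that assumption. The name `triage_inchi` and its status labels are hypothetical, and the non-string guard is only a precaution, not a confirmed explanation of the "NaN" report.

```python
from rdkit.Chem import inchi as rd_inchi


def triage_inchi(text):
    """Return (status, mol, message) for one candidate InChI string.

    Hypothetical helper; the status labels are made up for this sketch.
    """
    # Guard against non-string input (for example a missing value) before
    # anything is handed to the C++ layer.
    if not isinstance(text, str):
        return ("not_a_string", None, repr(text))
    if not text.startswith("InChI="):
        return ("missing_prefix", None, text[:20])
    try:
        # treatWarningAsError=True turns InChI warnings and errors into an
        # exception, so the library's message can be captured per string.
        mol = rd_inchi.MolFromInchi(text, treatWarningAsError=True)
    except rd_inchi.InchiReadWriteError as err:
        # The exception is raised with (mol, message); the molecule may
        # still be present when the problem was only a warning.
        partial, message = err.args
        status = "warning" if partial is not None else "error"
        return (status, partial, message)
    if mol is None:
        return ("error", None, "MolFromInchi returned None")
    return ("ok", mol, "")


# Example: ethanol should parse; the second entry lacks the InChI prefix.
for candidate in ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "C2H6O"]:
    status, mol, message = triage_inchi(candidate)
    print(candidate, "->", status, message)
```

Collecting the strings that end up with status "error", together with their messages, would also give the list readers the concrete examples asked for in the reply.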
From: Ling C. <lin...@gm...> - 2023-11-20 05:23:22
|
Thank you Christian! This is good to know.
The meaning of the number after "OH" is not defined in the web page. I'll
take a look at the publication. But yes, I get the idea.
Ling
On Sat, Nov 18, 2023 at 12:14 PM, Christian Meyenburg <chr...@un...> wrote:
> Hi Ling,
>
> On 2023-11-17 18:40, Ling Chan wrote:
> > When I run MolToSmiles on a molecule with a 6-valenced sulfur, it
> > produced a problematic smiles. Seems it's a bug?
> >
> > [...]
> >
> > Running "Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf"))" on the
> > following sdf, I got
> > 'C[S@OH16](F)(F)(F)(F)F'
>
> The SMILES looks correct to me. Have a look at the SMILES documentation
> [0] regarding the Chiral Specification of octahedral structures (3.3.4).
>
> The OH16 does in fact *not* represent a Hydroxy group or something of
> the sort, but is a closer description of the octahedral geometry.
>
> Best,
> Chris
>
> [0] https://daylight.com/dayhtml/doc/theory/theory.smiles.html
|
|
From: Christian M. <chr...@un...> - 2023-11-18 20:11:46
|
Hi Ling,
On 2023-11-17 18:40, Ling Chan wrote:
> When I run MolToSmiles on a molecule with a 6-valenced sulfur, it
> produced a problematic smiles. Seems it's a bug?
>
> [...]
>
> Running "Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf"))" on the
> following sdf, I got
> 'C[S@OH16](F)(F)(F)(F)F'
The SMILES looks correct to me. Have a look at the SMILES documentation
[0] regarding the Chiral Specification of octahedral structures (3.3.4).
The OH16 does in fact *not* represent a Hydroxy group or something of
the sort, but is a closer description of the octahedral geometry.
Best,
Chris
[0] https://daylight.com/dayhtml/doc/theory/theory.smiles.html
--
Christian Meyenburg
ZBH - Zentrum für Bioinformatik Hamburg
Universität Hamburg
Bundesstrasse 43
D-20146 Hamburg
Germany
Tel.: +49 40 42838 7353
Fax.: +49 40 23951-2291
e-Mail: chr...@un... |
|
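To make the point about "OH16" concrete: in the Daylight chiral specification, a bracket atom may carry a chirality class after the "@" sign (TH, AL, SP, TB or OH) followed by a permutation number, with OH covering permutations 1 to 30. The following sketch pulls such tags out of a SMILES string with a regular expression. It is a plain-text illustration, not how RDKit parses SMILES, and the name `chiral_class_tags` is hypothetical.

```python
import re

# Chirality classes and the number of permutations each one allows,
# as listed in the Daylight SMILES theory manual.
CHIRAL_CLASSES = {
    "TH": ("tetrahedral", 2),
    "AL": ("allene-like", 2),
    "SP": ("square planar", 3),
    "TB": ("trigonal bipyramidal", 20),
    "OH": ("octahedral", 30),
}

# A bracket atom containing "@" followed by a class code and a number.
_TAG = re.compile(r"\[[^\]]*?@(TH|AL|SP|TB|OH)(\d+)[^\]]*\]")


def chiral_class_tags(smiles):
    """Return (code, class name, permutation, in_range) for each tagged atom."""
    found = []
    for match in _TAG.finditer(smiles):
        code = match.group(1)
        permutation = int(match.group(2))
        name, max_permutation = CHIRAL_CLASSES[code]
        found.append((code, name, permutation, 1 <= permutation <= max_permutation))
    return found


print(chiral_class_tags("C[S@OH16](F)(F)(F)(F)F"))
```

Read this way, "OH16" in the SMILES from the thread is the octahedral class with permutation 16, which describes how the six neighbours are arranged around the sulfur; no hydroxy group is involved.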
From: Ling C. <lin...@gm...> - 2023-11-17 17:40:31
|
Dear colleagues,
When I run MolToSmiles on a molecule with a hexavalent sulfur, it produces
a problematic SMILES. Is this a bug?
Thanks.
Ling
Running "Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf"))" on the following
sdf, I got
'C[S@OH16](F)(F)(F)(F)F'
-------------------------------------------------------------------------------------------------------------

     RDKit          3D

  7  6  0  0  1  0  0  0  0  0999 V2000
   -2.0677    2.3607    1.4012 F   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3050    2.6304    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    0.0161    3.0975    0.8090 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5423    2.9001   -1.4012 F   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8443    4.1559   -0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6261    2.1634   -0.8090 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7117    0.9522    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
  2  6  1  0
  2  7  1  0
M  END
$$$$
|
|
From: Noel O'B. <bao...@gm...> - 2023-11-10 10:06:27
|
Hi all, I'll keep it short: we're currently recruiting for a Cheminformatician at Sosei Heptares. We have no specific grade in mind - you could be just finishing your PhD, or be much more experienced. Here's the link for more details: https://cezanneondemand.intervieweb.it/heptares/jobs/cheminformatician-computational-chemistry-team-37704/en/ Happy to answer questions about the position - just email me off-list. Regards, Noel |
|
From: Chris S. <sw...@ma...> - 2023-11-05 10:49:29
|
Thanks
Chris
> On 5 Nov 2023, at 09:23, Wim Dehaen <wim...@gm...> wrote:
>
> how about:
> len(list(mol.GetAromaticAtoms()))
>
> best wishes
> wim
>
> On Sun, 5 Nov 2023, 08:41 Chris Swain via Rdkit-discuss, <rdk...@li...> wrote:
>> Hi,
>>
>> Perhaps I’m missing something obvious, but is there a way to calculate the number of aromatic atoms in a molecule?
>>
>> Cheers
>>
>> Chris |
|
From: Wim D. <wim...@gm...> - 2023-11-05 09:23:58
|
how about:
len(list(mol.GetAromaticAtoms()))
best wishes
wim
On Sun, 5 Nov 2023, 08:41 Chris Swain via Rdkit-discuss, <rdk...@li...> wrote:
> Hi,
>
> Perhaps I’m missing something obvious, but is there a way to calculate the
> number of aromatic atoms in a molecule?
>
> Cheers
>
> Chris |
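For reference, the one-liner from this thread can be written out as a small runnable example. It shows the GetAromaticAtoms() approach next to an equivalent loop over GetAtoms() that checks GetIsAromatic(). The helper name `count_aromatic_atoms` is hypothetical, and toluene is just an illustrative input.

```python
from rdkit import Chem


def count_aromatic_atoms(mol):
    """Count atoms flagged as aromatic by RDKit's aromaticity perception."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


mol = Chem.MolFromSmiles("Cc1ccccc1")  # toluene: six ring carbons plus a methyl

# Both approaches should agree; the second is the suggestion from the thread.
print(count_aromatic_atoms(mol))
print(len(list(mol.GetAromaticAtoms())))
```

Either form counts heavy atoms only unless explicit hydrogens have been added, and the result depends on the aromaticity model applied when the molecule was sanitized.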