rdkit-discuss Mailing List for RDKit
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
Archive index (number of messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006 | | | | | 1 | | | | | | 1 | |
| 2007 | | | | | | | | 1 | 27 | 4 | 20 | 4 |
| 2008 | 12 | 2 | 23 | 40 | 30 | 6 | 35 | 60 | 31 | 33 | 35 | 3 |
| 2009 | 16 | 77 | 88 | 57 | 33 | 27 | 55 | 26 | 12 | 45 | 42 | 23 |
| 2010 | 64 | 17 | 30 | 55 | 30 | 65 | 112 | 26 | 67 | 20 | 67 | 23 |
| 2011 | 57 | 43 | 50 | 66 | 95 | 73 | 64 | 47 | 22 | 56 | 51 | 34 |
| 2012 | 64 | 45 | 65 | 85 | 76 | 47 | 75 | 72 | 31 | 77 | 61 | 41 |
| 2013 | 68 | 63 | 36 | 73 | 61 | 69 | 98 | 60 | 74 | 102 | 92 | 63 |
| 2014 | 112 | 84 | 72 | 59 | 96 | 54 | 91 | 54 | 38 | 47 | 33 | 39 |
| 2015 | 41 | 115 | 66 | 87 | 63 | 53 | 61 | 59 | 115 | 42 | 60 | 20 |
| 2016 | 52 | 72 | 100 | 125 | 61 | 106 | 62 | 74 | 151 | 151 | 117 | 148 |
| 2017 | 106 | 75 | 106 | 67 | 85 | 144 | 53 | 73 | 188 | 106 | 118 | 74 |
| 2018 | 96 | 43 | 40 | 111 | 77 | 112 | 64 | 85 | 73 | 117 | 97 | 47 |
| 2019 | 63 | 112 | 109 | 61 | 51 | 41 | 57 | 68 | 47 | 126 | 117 | 96 |
| 2020 | 84 | 82 | 80 | 100 | 78 | 68 | 76 | 69 | 76 | 73 | 69 | 42 |
| 2021 | 44 | 30 | 85 | 65 | 41 | 72 | 55 | 9 | 44 | 44 | 30 | 40 |
| 2022 | 35 | 29 | 55 | 30 | 31 | 27 | 49 | 15 | 17 | 25 | 15 | 40 |
| 2023 | 32 | 10 | 10 | 21 | 33 | 31 | 12 | 17 | 14 | 12 | 8 | 12 |
| 2024 | 10 | 18 | 7 | 4 | 6 | 4 | 5 | 6 | 8 | 1 | 1 | |
| 2025 | | | 3 | | | | | | | | | |
From: Pavel P. <pav...@uk...> - 2025-03-28 07:57:30

Thank you, Wim. It works. An even simpler solution could be to remove all atoms except the required ones. I should have guessed :)

However, this is a bug in recent RDKit versions: MolFragmentToSmiles works correctly in version 2023, but not in 2024.
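A minimal sketch of the "remove all atoms except the required ones" idea, assuming `RWMol.RemoveAtom`, `UpdatePropertyCache(strict=False)` and `MolToSmiles` behave as in recent RDKit releases. The helper name `fragment_smiles_by_deletion` is hypothetical, and the resulting fragment is deliberately left unsanitized, so its SMILES is only meant for comparing fragments with each other:

```python
from rdkit import Chem


def fragment_smiles_by_deletion(mol, frag_atoms):
    """Hypothetical helper: SMILES of the subgraph induced by frag_atoms,
    built by deleting every atom that is not part of the fragment."""
    keep = set(frag_atoms)
    rw = Chem.RWMol(mol)
    # Delete from the highest index down so the lower indices stay valid.
    for idx in sorted((i for i in range(mol.GetNumAtoms()) if i not in keep),
                      reverse=True):
        rw.RemoveAtom(idx)
    # The fragment is not sanitized (it can contain aromatic atoms outside any
    # ring), so refresh valences leniently and recompute ring membership.
    rw.UpdatePropertyCache(strict=False)
    Chem.GetSymmSSSR(rw)
    return Chem.MolToSmiles(rw)


mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12')
smi_a = fragment_smiles_by_deletion(mol, [1, 2, 3, 17])
smi_b = fragment_smiles_by_deletion(mol, [9, 10, 11, 12])
print(smi_a, smi_b)
```

After deletion both fragments are the same isolated four-atom graph (an N bonded to two aromatic carbons and one aliphatic carbon), so the canonicalization no longer sees the ring context that differed between them.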
ring > membership). For instance, if we break a saturated cycle (bond > 8-9), we get identical SMILES output. > > mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12') > > > print(Chem.MolFragmentToSmiles(mol, [1,2,3,17], canonical=True)) > print(Chem.MolFragmentToSmiles(mol, [9,10,11,12], canonical=True)) > > cN(C)c > cN(c)C > > So, the question is how to workaround this issue? We already > have millions of such patterns. So, it will work if we will be > able to canonicalize them. However, standard canonicalization does > not work, because we have disable sanitization during SMILES > parsing. It returns the same output as input SMILES. Any ideas are > appreciated. > > print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(C)c', sanitize=False))) > print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(c)C', sanitize=False))) > > cN(C)c > cN(c)C > > This issue actually came from the code of identification of > functional groups. > > Kind regards, > Pavel > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > |
From: Wim D. <wim...@gm...> - 2025-03-27 23:10:53

Pavel,

this is a bit hacky, but you can try the below:

```
from rdkit import Chem

def get_frag_smi(mol, frag_atoms):
    if len(frag_atoms) > 1:
        b2b = []  # bonds to break
        fsmi = ""  # fragment smiles
        # get bonds outside of fragment
        for b in mol.GetBonds():
            b_idx = b.GetBeginAtomIdx()
            e_idx = b.GetEndAtomIdx()
            if e_idx not in frag_atoms \
                    or b_idx not in frag_atoms:
                b2b.append(b.GetIdx())
        # break all bonds except those in fragments
        fmol = Chem.FragmentOnBonds(mol, b2b, addDummies=False)
        smis = Chem.MolToSmiles(fmol).split(".")
        # retain the only fragment with more than one atom in there
        while fsmi == "":
            smi = smis.pop(0)
            m = Chem.MolFromSmiles(smi, sanitize=False)
            if m.GetNumAtoms() > 1:
                fsmi = smi
    else:  # one atom, no canonicalization needed
        fsmi = Chem.MolFragmentToSmiles(mol, frag_atoms)
    return fsmi
```

It is based on the observation/assumption that FragmentOnBonds() followed by MolToSmiles() canonicalizes the fragments cleanly.

    print(get_frag_smi(mol,[1,2,3,17]))
    print(get_frag_smi(mol,[9,10,11,12]))

prints `cN(c)O` twice.

best wishes,
wim
> > mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12') > > > print(Chem.MolFragmentToSmiles(mol, [1,2,3,17], canonical=True)) > print(Chem.MolFragmentToSmiles(mol, [9,10,11,12], canonical=True)) > > cN(C)c > cN(c)C > > So, the question is how to workaround this issue? We already have > millions of such patterns. So, it will work if we will be able to > canonicalize them. However, standard canonicalization does not work, > because we have disable sanitization during SMILES parsing. It returns the > same output as input SMILES. Any ideas are appreciated. > > print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(C)c', sanitize=False))) > print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(c)C', sanitize=False))) > > cN(C)c > cN(c)C > > This issue actually came from the code of identification of functional > groups. > > Kind regards, > Pavel > _______________________________________________ > Rdkit-discuss mailing list > Rdk...@li... > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > |
From: Pavel P. <pav...@uk...> - 2025-03-27 11:19:34

Hello,

I encountered an issue with SMILES of fragments. Maybe someone can suggest a workaround. I attached the notebook, but will also reproduce some code here.

We have a structure with two Ns. For each N we take the N atom and its adjacent atoms to make a fragment SMILES, and we get different results, although the SMILES represent the same pattern (only the order of atoms differs). I guess this happens because the canonicalization algorithm takes into account some additional information that is missing from the output SMILES (e.g. ring membership). For instance, if we break the saturated ring (bond 8-9), we get identical SMILES output.

    from rdkit import Chem

    mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12')
    print(Chem.MolFragmentToSmiles(mol, [1,2,3,17], canonical=True))
    print(Chem.MolFragmentToSmiles(mol, [9,10,11,12], canonical=True))

    cN(C)c
    cN(c)C

So, the question is how to work around this issue. We already have millions of such patterns, so it would work if we were able to canonicalize them. However, standard canonicalization does not work, because we have to disable sanitization during SMILES parsing, and then it returns the same output as the input SMILES. Any ideas are appreciated.

    print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(C)c', sanitize=False)))
    print(Chem.MolToSmiles(Chem.MolFromSmiles('cN(c)C', sanitize=False)))

    cN(C)c
    cN(c)C

This issue actually came from code for identifying functional groups.

Kind regards,
Pavel
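To check the ring-membership guess, one can compare the ring flags of the fragment atoms in the parent molecule. This small sketch uses only `Atom.IsInRing()`; the helper `ring_flags` is hypothetical. Atom 1 is the acyclic ethyl CH2, while atom 9 belongs to the five-membered ring that is closed through bond 8-9:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CCn1c2cccc3CCn(c23)c2ccccc12')


def ring_flags(mol, atom_ids):
    """(index, element symbol, in-ring flag) for each atom of a fragment."""
    return [(i, mol.GetAtomWithIdx(i).GetSymbol(), mol.GetAtomWithIdx(i).IsInRing())
            for i in atom_ids]


frag_a = ring_flags(mol, [1, 2, 3, 17])
frag_b = ring_flags(mol, [9, 10, 11, 12])
print(frag_a)
print(frag_b)
```

The only difference between the two fragments is the ring flag of the aliphatic carbon. Breaking bond 8-9 makes atom 9 acyclic as well, which fits the observation that the two fragment SMILES then agree.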
From: Andrew D. <da...@da...> - 2024-11-04 16:35:34

Hi all,

I've spent the last while working on some techniques to improve the performance of SMARTS-based fingerprint generators. The project is called "talus" and is available at https://hg.sr.ht/~dalke/talus . It improves the performance of Klekota-Roth fingerprint generation by about a factor of 12.

These fingerprints have long been described as slow to generate. For example, "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints" (2010), at https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.21707 , says "The slowest algorithms are the Klekota-Roth fingerprint and Klekota-Roth fingerprint count because they are matching 4860 SMARTS patterns for each molecule.", which they timed as taking 14x the time of MACCS-key generation.

The fastest version is available at https://hg.sr.ht/~dalke/talus/browse/KlekotaRoth/kr_filtered_atomtypes.py?rev=tip . It takes a SMILES file and generates the fingerprints in chemfp's FPS format, and it is a standalone file which depends only on RDKit.

## How does it work?

This effort comes from looking at the Klekota-Roth fingerprints (defined in the supplementary data for "Chemical substructures that enrich for biological activity", doi: 10.1093/bioinformatics/btn479, https://academic.oup.com/bioinformatics/article/24/21/2518/192573 , and available with a few minor syntax changes in the CDK's Java sources), which contain 4,860 SMARTS strings, including

    [!#1][CH2][CH]([!#1])[!#1]

and

    OS(=O)(=O)c1ccc(NN=C(C=O)C=O)cc1

The direct translation into a set of if statements, conceptually like

    _pat123 = Chem.MolFromSmarts("[!#1][CH2][CH]([!#1])[!#1]")
    ...
    if mol.HasSubstructMatch(_pat123):
        fp.SetBit(123)

takes 417 seconds in my standard benchmark of about 30,000 SMILES strings, of which only 2.6 seconds is parsing the SMILES strings; the rest is the SMARTS matches.
I was able to speed this up by a factor of 12 using the following techniques:

1) Create a filter based on atom type counts, something like:

    _at1_pat = Chem.MolFromSmarts("[!#1]")
    _at2_pat = Chem.MolFromSmarts("[CH2]")
    _at3_pat = Chem.MolFromSmarts("[CH]")
    ...
    num_at1 = len(mol.GetSubstructMatches(_at1_pat))
    num_at2 = len(mol.GetSubstructMatches(_at2_pat))
    num_at3 = len(mol.GetSubstructMatches(_at3_pat))
    ...
    if (num_at1 >= 3 and num_at2 >= 1 and num_at3 >= 1 and
            mol.HasSubstructMatch(_pat123)):
        fp.SetBit(123)

2) Analyze the atom SMARTS to recognize that, for example, both "[CH2]" and "[CH]" will always match "[!#1]", so the minimum counts can be increased to:

    if (num_at1 >= 5 and num_at2 >= 1 and num_at3 >= 1 and
            mol.HasSubstructMatch(_pat123)):
        fp.SetBit(123)

3) Identify SMARTS prefixes, which provide a natural tree structure. For example, the last two SMARTS patterns in the Klekota-Roth keys are:

    SCCS
    SCCS(=O)=O

There is no reason to test for "SCCS(=O)=O" if "SCCS" does not pass, and when it does pass there is no need to repeat the check for "S" and "C" counts, resulting in something like:

    if (num_S >= 2 and num_C >= 2 and mol.HasSubstructMatch(_pat_4858)):
        fp.SetBit(4858)
        if (num_O >= 2 and mol.HasSubstructMatch(_pat_4859)):
            fp.SetBit(4859)

4) Improve the effectiveness of SMARTS prefixes.

The SMARTS patterns are generated by Daylight's canonicalization rules and then sorted ASCII-betically, but the SMARTS prefix method works better if the SMARTS starts with an unlikely chain terminal. For example, bit 3994 (key 3995) is "COc1cccc(C=NNC(=O)CO)c1", but "OCC(NN=Cc1cccc(OC)c1)=O" is an equivalent SMARTS with a longer initial chain.

5) Identify SMARTS prefixes which can be inserted as a filter.
Here's an example of what that looks like, with a bit number followed by the SMARTS pattern, where the "*" indicates that a pattern is only used for filtering:

    2949 Br
      3222 BrC
        * BrCC filter 7 patterns
          3332 BrC(C)(Br)Br
          3333 BrC(C)C(=O)N
          * BrCCC filter 3 patterns
            3430 BrCC(C)O
            2973 BrCCC=O
            4683 BrCCC(NC(O)=O)=O
          4227 BrC(C(N)NC=O)(Br)Br
          4692 BrC(C(O)NC=O)(Br)Br

This says that "Br" is one of the keys, so bits 3222, 3332, 3333, 3430, etc. will not be tested unless Br exists. It further notices that "BrCC" is a common prefix to 7 patterns, so, on the assumption that the overhead of one rejection test (which should be the usual case) saves the time needed to do 7 additional tests, it adds that extra filter. The "BrCCC" filter provides a further refinement.

All told, this brought Klekota-Roth fingerprint generation down to 33.5 seconds, of which 2.3 seconds (7%) was for SMILES processing, so another 10x performance gain may be possible.

## These gains are not necessarily portable

These impressive performance gains are possible because of how the Klekota-Roth keys were generated. For the subset of the PubChem keys which can be handled by "HasSubstructMatch" against a SMARTS pattern, the overall speedup is only 2x, not 12x.

## Possible future directions

A clear direction for future improvement would be to build a decision tree based on all reasonable SMARTS subgraphs, tuned by match statistics from a representative selection of molecules.

Another extension would be to handle minimum counts, like how "at least 2 rings of size 6" (expressed as "*~1~*~*~*~*~*~1" or "[R]@1@[R]@[R]@[R]@[R]@[R]@1") requires at least 7 ring atoms.

Anyone thinking further along these lines may be interested in "Efficient matching of multiple chemical subgraphs" at https://www.nextmovesoftware.com/talks/Sayle_MultipleSmarts_ICCS_201106.pdf . I wanted a system which could generate a Python module, rather than a C/C++/Java library, resulting in different trade-offs.
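The techniques described above can be sketched in a few dozen lines of Python. This is not the code talus generates; it is a hypothetical, hand-written illustration that combines the atom-type count filter (techniques 1 and 2), the prefix tree (technique 3) and filter-only nodes (technique 5). The bit numbers and SMARTS are copied from the post, but the nesting is a small subset and the minimum counts were worked out by hand:

```python
from rdkit import Chem

# Atom types whose match counts are computed once per molecule
# (hypothetical names; single-atom SMARTS taken from the examples above).
ATOM_TYPES = {"heavy": "[!#1]", "CH2": "[CH2]", "CH": "[CH]",
              "C": "C", "O": "O", "S": "S", "Br": "Br"}

# Each node is (smarts, bit, min_counts, children). bit=None marks a
# filter-only prefix (a "*" row in the listing). min_counts are absolute
# thresholds; a child omits thresholds its ancestors already enforce.
TREE = [
    ("[!#1][CH2][CH]([!#1])[!#1]", 123, {"heavy": 5, "CH2": 1, "CH": 1}, []),
    ("SCCS", 4858, {"S": 2, "C": 2}, [
        ("SCCS(=O)=O", 4859, {"O": 2}, []),
    ]),
    ("Br", 2949, {"Br": 1}, [
        ("BrC", 3222, {"C": 1}, [
            ("BrCC", None, {"C": 2}, [
                ("BrC(C)(Br)Br", 3332, {"Br": 3}, []),
                ("BrCCC", None, {"C": 3}, [
                    ("BrCCC=O", 2973, {"O": 1}, []),
                ]),
            ]),
        ]),
    ]),
]

_TYPE_PATS = {name: Chem.MolFromSmarts(s) for name, s in ATOM_TYPES.items()}


def _compile(nodes):
    """Parse every SMARTS in the tree once, up front."""
    return [(Chem.MolFromSmarts(s), bit, mins, _compile(kids))
            for s, bit, mins, kids in nodes]


COMPILED_TREE = _compile(TREE)


def atom_type_counts(mol):
    # A single-atom pattern yields one unique match per matching atom.
    return {name: len(mol.GetSubstructMatches(pat))
            for name, pat in _TYPE_PATS.items()}


def fingerprint_bits(mol, nodes=None, counts=None, bits=None):
    """Return the set of bits whose SMARTS match mol."""
    if nodes is None:
        nodes, counts, bits = COMPILED_TREE, atom_type_counts(mol), set()
    for pat, bit, mins, children in nodes:
        # Cheap count test first; the subgraph search only runs if it passes.
        if not all(counts[name] >= n for name, n in mins.items()):
            continue
        if not mol.HasSubstructMatch(pat):
            continue
        if bit is not None:
            bits.add(bit)
        # Children extend this pattern, so they can only match if it did.
        fingerprint_bits(mol, children, counts, bits)
    return bits


print(sorted(fingerprint_bits(Chem.MolFromSmiles("BrCCC=O"))))
```

Each count test is a dictionary lookup, so most molecules are rejected before any subgraph search runs, and a child pattern is only searched after its prefix has matched.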
## Methods to analyze atom and bond SMARTS terms

Developing this package required building a parser for the atom and bond SMARTS terms so I could tell if one atom SMARTS is a subset of another atom SMARTS. (I let RDKit handle the full SMARTS parsing, then use QueryAtom.GetSmarts() or QueryBond.GetSmarts() to get the actual SMARTS terms.)

I think it may be of broader interest for anyone working with SMARTS at a syntax level. For example, the test driver takes a SMARTS string and gives a breakdown of the different components, and where that information came from in the SMARTS term:

    % python smarts_parse.py '[#6a]=@[PH+]'
    Pattern SMARTS: [#6a]=@[PH+]
    atoms[0]: [#6&a] -> [c;R;!X0]
               ^^ ^     elements: [c]
                  ^     in_ring: [R]
                        connectivities: [!X0] + from SMARTS topology
    atoms[1]: [P&H1&+] -> [P;H1;h0,h1;+1;!X0]
               ^        elements: [P]
                 ^^     total_hcount: [H1]
                 ^^     implicit_hcount: [h0,h1]
                    ^   charges: [+1]
                        connectivities: [!X0] + from SMARTS topology
    bonds[0]: =&@ (between atoms 0 and 1) -> '=;@'
              ^         bondtypes: [=]
                ^       in_ring: [@]

This is able to figure out that "[#6a]" means it must be an aromatic carbon, which means it must be in a ring. It also knows from the SMARTS topology that there must be at least one bond (hence [!X0]). Were it a bit more clever, the "R" should tell it there are at least two bonds, both ring bonds, but that's for the future to fix.

It also adds some additional constraints (which I conjectured would be useful for atom typing), like how "H1" means the implicit hydrogen count must be only 0 or 1.

Some of this work dates back to a SMARTS regular-expression based tokenizer I contributed to Brian Kelley's FROWNS project back in 2001 or so! See https://frowns.sourceforge.net/ .

If you want to take this effort further, please contact me and I'll provide some help, thoughts, and advice!

Andrew
da...@da...
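The "let RDKit do the full parse, then read back the per-atom and per-bond terms" step can be sketched with `Atom.GetSmarts()` and `Bond.GetSmarts()` on a query molecule from `Chem.MolFromSmarts`. This only extracts the terms; the subset analysis described above is not shown, and the helper name `smarts_terms` is hypothetical:

```python
from rdkit import Chem


def smarts_terms(smarts):
    """Return the per-atom and per-bond SMARTS terms of a pattern,
    letting RDKit do the full parse."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"could not parse SMARTS: {smarts!r}")
    atom_terms = [atom.GetSmarts() for atom in query.GetAtoms()]
    bond_terms = [(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetSmarts())
                  for bond in query.GetBonds()]
    return atom_terms, bond_terms


atom_terms, bond_terms = smarts_terms("[#6a]=@[PH+]")
for i, term in enumerate(atom_terms):
    print(f"atoms[{i}]: {term}")
for begin, end, term in bond_terms:
    print(f"bond {begin}-{end}: {term}")
```

Each returned term is a self-contained atom or bond expression, which is the input a term-level parser like the one described above would work on.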
From: Pavel P. <pav...@uk...> - 2024-10-25 13:34:58

Dear colleagues,

we are glad to invite you to the 8th Advanced In Silico Drug Design workshop, which will take place on 27-31 January 2025 at Palacky University in Olomouc (Czech Republic).

This year we cover topics on:
- virtual screening
- machine learning and AI
- structure- and ligand-based drug design tools
- pharmacophore modeling
- molecular docking and dynamics
- de novo design
- chemical space visualization
and others.

Lectures and tutorials will be provided by experts in the field from Austria, France, Italy, Israel and the Czech Republic, in particular Prof. Thierry Langer, Prof. Alexandre Varnek, Prof. Johannes Kirchmair, Prof. Hanoch Senderowitz and Prof. Alexander Domling.

There is no fee. The website of the workshop is https://www.kfc.upol.cz/8add.

Kind regards,
Pavel
From: <ml...@li...> - 2024-09-27 07:10:03

Hello,

Recently, I updated some Python code using

    from rdkit.Chem.MolStandardize import Standardizer  # rdkit <= '2023.09.3'

to

    from rdkit.Chem.MolStandardize import rdMolStandardize  # rdkit >= '2024.03.5'

because after a fresh RDKit install, RDKit was updated and my former code stopped working.

My old code was this:

---
    standardizer = Standardizer()

    def standardize(preserve_stereo, preserve_taut, mol):
        if preserve_stereo or preserve_taut:
            s_mol = standardizer.standardize(mol)
            # We don't need to get fragment parent, because the charge parent is the largest fragment
            s_mol = standardizer.charge_parent(s_mol, skip_standardize=True)
            s_mol = standardizer.isotope_parent(s_mol, skip_standardize=True)
            if not preserve_stereo:
                s_mol = standardizer.stereo_parent(s_mol, skip_standardize=True)
            if not preserve_taut:
                s_mol = standardizer.tautomer_parent(s_mol, skip_standardize=True)
            return standardizer.standardize(s_mol)
        else:
            # standardizer.super_parent(mol): _NOT_ standardizer.standardize(mol)
            # which doesn't even unsalt the molecule...
            return standardizer.super_parent(mol)
---

And the new code is:

---
    def standardize(preserve_stereo, preserve_taut, mol):
        if preserve_stereo or preserve_taut:
            # We don't need to get fragment parent, because the charge parent is the largest fragment
            s_mol = rdMolStandardize.ChargeParent(mol, skipStandardize=False)
            s_mol = rdMolStandardize.IsotopeParent(s_mol, skipStandardize=True)
            if not preserve_stereo:
                s_mol = rdMolStandardize.StereoParent(s_mol, skipStandardize=False)
            if not preserve_taut:
                s_mol = rdMolStandardize.TautomerParent(s_mol, skipStandardize=False)
            return s_mol
        else:
            return rdMolStandardize.SuperParent(mol, skipStandardize=False)
---

which I hope is isofunctional.

The old Standardizer module had a "standardize" method. Is this method also present in rdMolStandardize? Has it changed name (e.g. to rdMolStandardize.Cleanup)?

Regards,
Francois.
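A minimal sketch, assuming (as the question guesses) that `rdMolStandardize.Cleanup` takes the place of the old `Standardizer.standardize` method. As far as we know it applies RDKit's default cleanup steps (removing Hs, disconnecting metals, normalizing and reionizing), after which the parent functions can be called with `skipStandardize=True`. The sodium benzoate input is only an example:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sodium benzoate: a salt, so its charge parent should be the neutral acid.
mol = Chem.MolFromSmiles("[Na+].[O-]C(=O)c1ccccc1")

# Assumption: Cleanup() plays the role of the old standardize() step.
clean = rdMolStandardize.Cleanup(mol)

# ChargeParent picks the largest fragment and neutralizes it; its own
# internal cleanup is skipped because Cleanup() already ran.
parent = rdMolStandardize.ChargeParent(clean, skipStandardize=True)
print(Chem.MolToSmiles(parent))
```

Comparing such outputs against the old `Standardizer` on a sample of your own molecules is the safest way to confirm the two versions really are isofunctional.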
From: <dd...@wp...> - 2024-09-18 17:34:37

Hi all,

The following survey aims to gather empirical data to better understand what users of data formats expect when comparing them. It should take no more than 10 minutes:

https://forms.gle/K9AR6gbyjCNCk4FL6

Your response would be greatly appreciated!

Best,
Dominik
From: Manish S. <ms...@sa...> - 2024-09-12 18:14:45

Hi Kurt,

You might find the following scripts helpful for enumerating compounds and aligning them to a reference molecule:

- RDKitEnumerateCompoundLibrary.py <http://www.mayachemtools.org/docs/scripts/html/RDKitEnumerateCompoundLibrary.html>
- RDKitPerformPositionalAnalogueScan.py <http://www.mayachemtools.org/docs/scripts/html/RDKitPerformPositionalAnalogueScan.html>
- RDKitGenerateConstrainedConformers.py <http://www.mayachemtools.org/docs/scripts/html/RDKitGenerateConstrainedConformers.html>
- RDKitPerformConstrainedMinimization.py <http://www.mayachemtools.org/docs/scripts/html/RDKitPerformConstrainedMinimization.html>

Let me know if you have any further questions.

Thanks,
Manish

From: Kurt Thorn <kur...@ar...>
Sent: Thursday, September 12, 2024 8:50 AM
To: rdk...@li...
Subject: [Rdkit-discuss] Impose conformation of molecule substructure?
From: Kurt T. <kur...@ar...> - 2024-09-12 17:26:29

Thanks Stephen! That code pointed me to the key "coordMap" parameter for fixing atom coordinates.

Kurt
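A minimal sketch of the `coordMap` approach, assuming `AllChem.EmbedMultipleConfs` accepts a dictionary that maps atom indices to `Point3D` positions. The toluene "core" and the 2-phenylethanol analogue are placeholders for the crystal-structure scaffold and a library member, and this is not the Vernalis node's implementation:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign

# Stand-in for the crystal-structure core: an embedded toluene. In practice
# the core coordinates would be read from the crystal structure file.
core = Chem.AddHs(Chem.MolFromSmiles("Cc1ccccc1"))
AllChem.EmbedMolecule(core, randomSeed=42)
core = Chem.RemoveHs(core)

# A hypothetical analogue carrying a new substituent on the same scaffold.
analog = Chem.AddHs(Chem.MolFromSmiles("OCCc1ccccc1"))
match = analog.GetSubstructMatch(core)  # analog atom index for each core atom

# coordMap: analog atom index -> fixed 3D position taken from the core.
core_conf = core.GetConformer()
coord_map = {a_idx: core_conf.GetAtomPosition(c_idx)
             for c_idx, a_idx in enumerate(match)}

cids = AllChem.EmbedMultipleConfs(analog, numConfs=5, coordMap=coord_map,
                                  randomSeed=0)

# coordMap constrains distances between the mapped atoms, not the absolute
# frame, so superimpose each conformer back onto the core.
atom_map = list(zip(match, range(len(match))))
rmsds = [rdMolAlign.AlignMol(analog, core, prbCid=cid, atomMap=atom_map)
         for cid in cids]
print(len(cids), [round(r, 3) for r in rmsds])
```

For a single conformer, `AllChem.ConstrainedEmbed` packages a similar recipe (embedding with a coordinate map, then aligning onto the core).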
From: Stephen R. <s.d...@go...> - 2024-09-12 16:40:53

Hi Kurt,

The Vernalis KNIME community contribution has a node "Templated Conformer Generator (RDKit)" (see https://hub.knime.com/n/wK3RJiystQYq5M9w ) which will do exactly this.

If you don't want to do it in KNIME, you can see the relevant bits of the Java source at:

https://github.com/vernalis/vernalis-knime-nodes/blob/d125b97ad2841133622150c168472168547c4ff3/com.vernalis.knime.chem.pmi/src/com/vernalis/knime/chem/pmi/nodes/confs/rdkitgenerate/RdkitConfgenNodeModel.java#L441-L465

and in particular at:

https://github.com/vernalis/vernalis-knime-nodes/blob/d125b97ad2841133622150c168472168547c4ff3/com.vernalis.knime.chem.pmi/src/com/vernalis/knime/chem/pmi/nodes/confs/rdkitgenerate/RdkitConfgenNodeModel.java#L688-L809

Steve
From: Kurt T. <kur...@ar...> - 2024-09-12 16:10:33

Hi All -

I would like to enumerate a virtual library of a compound family for which we have a crystal structure, where I want to model structures of multiple substituents at a single site. What I would like to do is enforce that the constant part of the molecule assume the conformation in the crystal structure, and enumerate conformers only for the newly added substituent. Does anyone have a suggestion for how to achieve this in RDKit?

Thanks,
Kurt

Dr Kurt Thorn
Chief Technology Officer
+1.609.423.1571 (US Office)
+1.415.298.3495 (US Mobile)
kur...@ar...
www.arrepath.com

ArrePath Inc.
303A College Road East
Princeton, NJ 08540
U.S.
From: Andrew D. <da...@da...> - 2024-09-11 12:47:44

Hi Srdjan,

> On Sep 11, 2024, at 11:12, Srdjan Pusara <srd...@ho...> wrote:
> I would like to ask is it possible to find source code how these [UFF] interaction terms were implemented?

The RDKit source code is available at https://github.com/rdkit/rdkit/tree/master . Use the green button labeled "<> Code" to get the source code, either through the git version control tool or as a zip file.

If you want to use the web interface, see https://github.com/rdkit/rdkit/tree/master/Code/ForceField/UFF

Best regards,

Andrew
da...@da...
From: Srdjan P. <srd...@ho...> - 2024-09-11 09:12:54

Hello,

I have seen that RDKit can return force field parameters between groups of atoms (bond_params = rdForceFieldHelpers.GetUFFBondStretchParams(mol, 6, 1), angle_params = rdForceFieldHelpers.GetUFFAngleBendParams(mol, 0, 1, 2), etc.). I would like to ask whether it is possible to find the source code showing how these interaction terms were implemented. I understand that these equations can be implemented by reading the original paper, but it would be helpful to access the RDKit source code where these interaction terms are already implemented. In addition, I have noticed that the original UFF paper has some small errors or typos, so having already-implemented source code would help.

Thanks for your help in advance.
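As a starting point for comparing the returned parameters with the equations, here is a small sketch that evaluates one bond-stretch term by hand. It assumes the harmonic form from the UFF paper, E = 0.5 * kb * (r - r0)^2; whether RDKit's UFF code uses exactly this expression should be confirmed against the sources under `Code/ForceField/UFF`:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdForceFieldHelpers, rdMolTransforms

# Ethanol with explicit hydrogens and one 3D conformer.
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=7)
conf = mol.GetConformer()

# UFF parameters for the C0-C1 bond: force constant kb and rest length r0.
kb, r0 = rdForceFieldHelpers.GetUFFBondStretchParams(mol, 0, 1)
r = rdMolTransforms.GetBondLength(conf, 0, 1)

# Assumed harmonic bond-stretch term: E = 0.5 * kb * (r - r0)^2.
e_stretch = 0.5 * kb * (r - r0) ** 2
print(f"kb={kb:.1f}, r0={r0:.3f}, r={r:.3f}, E={e_stretch:.4f}")
```

The same pattern works for the other helpers, for example `GetUFFAngleBendParams` with an angle measured by `rdMolTransforms.GetAngleRad`.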
From: Joe B. <Joe...@Sc...> - 2024-08-28 13:55:51

Hi all,

We are recruiting a full-time developer with Python, RDKit and chemistry experience to work on our Compliance Hub applications. These are used by many of the world's top pharmaceutical companies, CROs and specialist chemical suppliers to ensure compliance with complex and chemical regulations globally.

For more information and to apply, please see https://blog.scitegrity.com/news/blog-post-1-0-5-1-0-1

It's a remote role, although you do need to be UK based.

Best regards

Joe Bradley
CEO, Scitegrity Limited
From: Andrew D. <da...@da...> - 2024-08-27 13:57:15

On Aug 27, 2024, at 14:44, Ingvar Lagerstedt <in...@ne...> wrote:

> To me it would make sense if RDKit removed the aromatic flags for any atom that is no longer in a ring when deleting an aromatic atom/bond.
> Alternatively remove the aromatic flag on any non-ring atom when attempting to kekulize the structure rather than throwing an exception.

As Noel commented, the toolkit can't make that assumption. Different people will have different reasons for removing an atom. Some will want to remove multiple atoms, for example.

Here is one function to remove an atom. It converts the molecule to Kekulé form, updates the hydrogen counts on the neighboring atoms, and then removes the specified atom:

    from rdkit import Chem

    m = Chem.MolFromSmiles('c1ccccc1')
    mw = Chem.RWMol(m)
    Chem.Kekulize(mw, clearAromaticFlags=True)

    atom_idx = 0
    atom = mw.GetAtomWithIdx(atom_idx)
    for bond in atom.GetBonds():
        int_bondtype = int(bond.GetBondType())
        assert int_bondtype in (1, 2, 3), "unexpected bond type!"
        other_atom = bond.GetOtherAtom(atom)
        other_atom.SetNumExplicitHs(
            other_atom.GetNumExplicitHs() + int_bondtype)

    mw.RemoveAtom(atom_idx)
    Chem.SanitizeMol(mw)
    print(Chem.MolToSmiles(mw))

Bear in mind that there can be multiple possible Kekulé assignments, while Chem.Kekulize only picks one. In some cases (not this one, of course) you may need to apply the above removal method to all distinct assignments (for the given ring system) in order to get all the valid transformed molecules.

A more correct function should also take care when int_bondtype == 1 and the other_atom has a chiral tag and does not already have a hydrogen, because you may want to preserve the chiral indicator. Something like this may work (it's copied and pasted from another function, with a few tweaks to match the above naming scheme, but I haven't tested it).

    # Runs inside the loop above; `bond` is the bond to the atom being removed.
    num_hs = other_atom.GetTotalNumHs()
    if (not num_hs) and other_atom.GetChiralTag():
        for bond_i, b in enumerate(other_atom.GetBonds()):
            if b.GetIdx() == bond.GetIdx():
                break
        else:
            raise AssertionError("Could not find bond")
        want_invert = (bond_i % 2 == 0)
        if want_invert:
            if other_atom.GetChiralTag() == Chem.ChiralType.CHI_TETRAHEDRAL_CCW:
                other_atom.SetChiralTag(Chem.ChiralType.CHI_TETRAHEDRAL_CW)
            else:
                other_atom.SetChiralTag(Chem.ChiralType.CHI_TETRAHEDRAL_CCW)

Cheers,

Andrew
da...@da...
From: Noel O'B. <bao...@gm...> - 2024-08-27 13:27:05
There are other more subtle changes that can affect the aromaticity, e.g. changing a bond order, the charge, or the atomic number of an atom. IMO, the user needs to take responsibility for knowing if aromaticity might be invalidated, and perform the appropriate actions. The alternative is for the toolkit to take the responsibility, trigger a check on every edit and take a performance hit in the general case. Indeed, atom deletion could be treated specially, but slippery slope and confusion here we come! :-)

Regards,
Noel
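Noel argues that the caller, not the toolkit, should decide when an edit may invalidate aromaticity. One defensive pattern is sketched below. It assumes that working on a Kekulé copy is acceptable for your edit. The pattern clears the aromatic flags before an edit that touches the aromatic system, then lets SanitizeMol recompute implicit hydrogens and re-perceive aromaticity. `edit_kekulized` and the two edit functions are hypothetical names used only for illustration.

```python
from rdkit import Chem


def edit_kekulized(mol, edit):
    """Hypothetical helper: apply edit(rwmol) to a Kekulé copy, then re-sanitize."""
    rw = Chem.RWMol(mol)
    Chem.Kekulize(rw, clearAromaticFlags=True)
    edit(rw)
    Chem.SanitizeMol(rw)  # recomputes implicit Hs and aromaticity
    return rw.GetMol()


def carbon_to_nitrogen(rw):
    # Changing the atomic number: benzene becomes pyridine, still aromatic
    rw.GetAtomWithIdx(0).SetAtomicNum(7)


def saturate_one_double_bond(rw):
    # Changing a bond order: one C=C becomes C-C, so the ring is no longer aromatic
    double = next(b for b in rw.GetBonds()
                  if b.GetBondType() == Chem.BondType.DOUBLE)
    double.SetBondType(Chem.BondType.SINGLE)


benzene = Chem.MolFromSmiles("c1ccccc1")
print(Chem.MolToSmiles(edit_kekulized(benzene, carbon_to_nitrogen)))
print(Chem.MolToSmiles(edit_kekulized(benzene, saturate_one_double_bond)))
```

The check costs one kekulization per edit, and only where the caller asks for it. That is the trade-off Noel describes, as opposed to having the toolkit check after every edit.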
From: Ingvar L. <in...@ne...> - 2024-08-27 12:44:59
Hello,

When deleting an aromatic atom or bond, the ring information is removed, while any remaining atom in the broken aromatic ring is still labelled aromatic. When attempting to sanitize such a molecule I get an exception: "rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 0 marked aromatic"

To recreate:

    >>> from rdkit import Chem
    >>> m = Chem.MolFromSmiles('c1ccccc1')
    >>> mw = Chem.RWMol(m)
    >>> mw.RemoveAtom(0)
    >>> Chem.SanitizeMol(mw)
    [10:40:50] non-ring atom 0 marked aromatic
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    rdkit.Chem.rdchem.AtomKekulizeException: non-ring atom 0 marked aromatic

The example is simplistic, but there are reactions where an aromatic system can be broken, such as the Zincke-Koenig reaction or the Djerassi-Rylander oxidation. The exception makes it harder to describe such reactions.

I currently check whether an atom or bond is aromatic before deleting it, and if so, remove all aromatic flags in the molecule.

To me it would make sense if RDKit removed the aromatic flags for any atom that is no longer in a ring when deleting an aromatic atom/bond. Alternatively, it could remove the aromatic flag on any non-ring atom when attempting to kekulize the structure, rather than throwing an exception.

Compare this with a Birch reduction, where the ring stays intact. There, kekulization and the subsequent aromaticity perception rightly fail to find an aromatic ring, no exception is thrown, and the atoms are marked as non-aromatic.

Kind Regards,
Ingvar
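Below is a minimal sketch of the workaround Ingvar describes: check aromaticity before the deletion, and if needed drop all aromatic flags. It assumes the flags are removed with `Chem.Kekulize(..., clearAromaticFlags=True)`, which also assigns explicit single and double bonds, so SanitizeMol has a consistent structure to work with afterwards. The helper name is hypothetical. The sketch relies on RDKit recomputing implicit hydrogens, so atoms with a fixed hydrogen count (for example, bracket atoms) may still need the explicit-H bookkeeping from Andrew's reply.

```python
from rdkit import Chem


def remove_atom_dearomatized(mol, atom_idx):
    """Hypothetical helper: delete one atom, kekulizing first if it touches
    an aromatic atom or bond, then re-sanitize."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(atom_idx)
    touches_aromatic = atom.GetIsAromatic() or any(
        bond.GetIsAromatic() for bond in atom.GetBonds())
    if touches_aromatic:
        # Drop every aromatic flag in the molecule, leaving a Kekulé structure
        Chem.Kekulize(rw, clearAromaticFlags=True)
    rw.RemoveAtom(atom_idx)
    # Recomputes implicit Hs and re-perceives aromaticity in any intact ring
    Chem.SanitizeMol(rw)
    return rw.GetMol()


opened = remove_atom_dearomatized(Chem.MolFromSmiles("c1ccccc1"), 0)
kept = remove_atom_dearomatized(Chem.MolFromSmiles("Cc1ccccc1"), 0)
print(Chem.MolToSmiles(opened))
print(Chem.MolToSmiles(kept))
```

The second call removes the methyl group of toluene. It never touches the aromatic system, so the ring is left aromatic.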
From: Diogo M. <dio...@gm...> - 2024-08-22 19:16:02
Hello,

We are recruiting a programmer, primarily Python, to improve autodock (molecular docking) and integrate it with other software, such as RDKit and OpenMM. The location is Scripps Research in La Jolla, California.

Goals are:
- to support development in general,
- to improve the user-friendliness of the command-line and graphical interfaces,
- to make autodock components more usable from Python.

For more details and to apply, see: https://recruiting2.ultipro.com/SCR1003TSRI/JobBoard/98759e7d-7ede-4c0b-ac7b-2c6293c7b522/OpportunityDetail?opportunityId=b92548d1-155c-4c8e-be0e-f59c5b2452e0

Best regards,
Diogo
From: Andrew D. <da...@da...> - 2024-08-05 08:35:38
Hi RDKit-ers,

I have released chemfp 4.2.

The new "simarray" functionality computes the full comparison matrix as a NumPy array, e.g., for use in some clustering algorithms. It has built-in support for Tanimoto, Dice, cosine, and Hamming comparisons, plus an option to get the individual "a", "b", "c", and "d" components should you need a specialized metric. It processes 100M comparisons per second on my laptop, which means if you had 30 TB of free disk space you could generate the NxN comparisons for ChEMBL in about a day. (I'm curious if someone will do this!)

I've also updated chemfp's RDKit-Fingerprint, RDKit-Morgan, RDKit-AtomPair, and RDKit-Torsion fingerprint types to use RDKit's fingerprint generator API, instead of the older function-based API. This includes support for count emulation. Some of the parameter names have changed to follow RDKit's newer convention, and the RDKit-Morgan fingerprints now default to r=3 (to match the RDKit default) rather than r=2. Chemfp still supports the older function-based API, which is used if you specify the older version number explicitly.

For a full description of what's new in this release, see https://chemfp.com/docs/whats_new_in_42.html .

Chemfp may be the package you’ve been looking for if you work with binary cheminformatics fingerprints in Python. Chemfp is perhaps best known for its high-performance fingerprint similarity search. Its Taylor/Butina clustering, MaxMin diversity selection, and sphere exclusion (including directed sphere exclusion) are equally world-class. Or, if you simply need a 100K by 100K distance array to pass into scikit-learn, chemfp’s simarray can generate that in less than a minute.

The chemfp homepage is https://chemfp.com/ .

To install a pre-compiled chemfp for Linux-based OSes:

    python -m pip install chemfp -i https://chemfp.com/packages/

The default installation limits or disables a few chemfp features as described in the base license agreement at https://chemfp.com/BaseLicense.txt . To request a license key, which is free for academic use, see https://chemfp.com/license/ .

Best regards,

Andrew Dalke
da...@da...
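The "a", "b", "c", "d" components mentioned above are the usual 2×2 contingency counts for a pair of bit vectors. The NumPy sketch below is not chemfp's simarray API. It only shows how the four built-in metrics can be derived from those counts for a whole query-by-target matrix. It assumes the common convention: a = bits set in both fingerprints, b = set only in the query, c = set only in the target, d = set in neither. chemfp's own labelling may differ. Dense 0/1 arrays and defining 0/0 as 0.0 are further simplifying assumptions.

```python
import numpy as np


def contingency_counts(queries, targets):
    """Return (a, b, c, d) count matrices of shape (n_queries, n_targets)."""
    q = np.asarray(queries, dtype=np.int64)  # (n, nbits) array of 0/1
    t = np.asarray(targets, dtype=np.int64)  # (m, nbits) array of 0/1
    nbits = q.shape[1]
    a = q @ t.T                              # bits on in both
    b = q.sum(axis=1)[:, None] - a           # on only in the query
    c = t.sum(axis=1)[None, :] - a           # on only in the target
    d = nbits - a - b - c                    # off in both
    return a, b, c, d


def _safe_ratio(num, den):
    """Element-wise num / den, with 0.0 wherever den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.zeros(np.broadcast(num, den).shape)
    np.divide(num, den, out=out, where=den != 0)
    return out


def similarity_matrices(queries, targets):
    """Tanimoto, Dice, cosine and Hamming matrices from the a/b/c counts."""
    a, b, c, _ = contingency_counts(queries, targets)
    return {
        "tanimoto": _safe_ratio(a, a + b + c),
        "dice": _safe_ratio(2 * a, 2 * a + b + c),
        "cosine": _safe_ratio(a, np.sqrt((a + b) * (a + c))),
        "hamming": b + c,
    }


sims = similarity_matrices([[1, 1, 0, 0]],
                           [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]])
print(sims["tanimoto"])
```

The design choice here is that the metrics only need a, b and c. d matters for metrics that also count shared "off" bits, which is presumably why a specialized-metric option would expose all four.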
From: Ingvar L. <in...@ne...> - 2024-07-30 14:59:03
Ah - found the rdkit-dev conda package. Missed a wildcard when I searched.
From: Ingvar L. <in...@ne...> - 2024-07-30 14:21:32
Hello,

In the RDKit 2024.03.5 conda packages (rdkit, including librdkit), there are no header files, e.g., RDKitBase.h. They were in the 2024.03.4 version. Have the headers moved to another conda package, or have they accidentally been left out? I have the linux-64 and osx-arm64 versions.

Kind Regards,
Ingvar

Ingvar Lagerstedt
Senior software engineer
NextMove Software Limited
Innovation Centre, 320 Cambridge Science Park, Cambridge, CB4 0WG
From: Greg L. <gre...@gm...> - 2024-07-26 15:07:08
Hi Joao,

On Thu, Jul 25, 2024 at 8:03 PM J Sousa <jso...@gm...> wrote:
>
> In fingerprints calculated by RDKit with includeChirality=True, is the CIP
> label (R/S) the atom property directly used to generate the integer
> identifier of an atom circular neighborhood?
>
> Is the CIP label used for Morgan, RDKit, Atom pairs and Torsions
> fingerprints?
>

Yes, that is currently how the fingerprinting code handles chirality. This is not a great way to do it, but it's what we have right now.[1]

-greg
[1] I haven't put a lot of time into this, but I haven't come up with anything better yet.
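Here is a quick way to see the effect Greg describes, sketched with RDKit's fingerprint generator API and assuming a recent RDKit with default settings. With includeChirality=True the Morgan fingerprints of two enantiomers differ, and with includeChirality=False they are identical. The molecule (1-aminoethanol) is only an illustrative choice. This shows that chirality changes the fingerprint. It does not by itself prove which atom property the code uses internally.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

enantiomers = [Chem.MolFromSmiles(s) for s in ("C[C@H](N)O", "C[C@@H](N)O")]

# CIP label of the stereocentre in each enantiomer, as (atom index, label)
for mol in enantiomers:
    print(Chem.FindMolChiralCenters(mol))

chiral_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, includeChirality=True)
achiral_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, includeChirality=False)


def sparse_counts(gen, mol):
    """Unfolded Morgan counts as {feature_id: count}; avoids bit collisions."""
    return gen.GetSparseCountFingerprint(mol).GetNonzeroElements()


with_chirality = [sparse_counts(chiral_gen, m) for m in enantiomers]
without_chirality = [sparse_counts(achiral_gen, m) for m in enantiomers]
print("identical with includeChirality=True:", with_chirality[0] == with_chirality[1])
print("identical with includeChirality=False:", without_chirality[0] == without_chirality[1])
```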
From: J S. <jso...@gm...> - 2024-07-25 18:00:50
Hi,

In fingerprints calculated by RDKit with includeChirality=True, is the CIP label (R/S) the atom property directly used to generate the integer identifier of an atom's circular neighborhood?

Is the CIP label used for the Morgan, RDKit, atom-pair, and torsion fingerprints?

Thanks,
Joao Sousa
From: Ernst-Georg S. <pg...@tu...> - 2024-07-02 15:56:50
On 27.06.2024 at 11:03, Wim Dehaen wrote:
> I would expect the problem here is kekulization. The SMARTS is pattern
> matching using the kekule structure (i.e. double and single bonds, non
> aromatic atoms) and is not sanitized whereas the SMILES after parsing
> and sanitization has aromatic bonds and aromatic atoms. Try what happens
> when you do a SMARTS match with the SMILES with aromatic atoms:
> `[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`

That was it indeed.

Thank you,

Ernst-Georg
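Wim's explanation can be reproduced outside the cartridge with a simpler molecule. The sketch below uses toluene as a stand-in for the isotopically labelled structure in the thread. A Kekulé SMILES read as SMARTS asks for aliphatic C atoms and explicit double bonds, so it cannot match the sanitized, aromatic molecule. The aromatic form does match. So does the SMILES parsed as a molecule and used directly as the query, which is roughly the difference between the ::qmol and ::mol casts in the original question.

```python
from rdkit import Chem

target = Chem.MolFromSmiles("CC1=CC=CC=C1")  # sanitized: the ring becomes aromatic

# Kekulé SMARTS: aliphatic 'C' atoms and explicit '=' bonds
kekule_query = Chem.MolFromSmarts("CC1=CC=CC=C1")
# Aromatic SMARTS: 'c' atoms; an unspecified SMARTS bond means single-or-aromatic
aromatic_query = Chem.MolFromSmarts("Cc1ccccc1")
# SMILES parsed and sanitized exactly like the target
molecule_query = Chem.MolFromSmiles("CC1=CC=CC=C1")

for name, query in [("kekule SMARTS", kekule_query),
                    ("aromatic SMARTS", aromatic_query),
                    ("SMILES as query", molecule_query)]:
    print(name, target.HasSubstructMatch(query))
```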
From: Noel O'B. <bao...@gm...> - 2024-06-27 09:28:07
"Every valid SMILES is also a valid SMARTS": I think this is one of John May's lines, which I was never keen on as it makes people think that if you treat a SMILES as a SMARTS that it will match the original SMILES. It mostly will, but I think you have found the difference between the SMILES and SMARTS treatment of "[2H]" - one means deuterium, the other means an isotope of mass 2 with a single implicit hydrogen attached. It doesn't match because the deuterium doesn't have another hydrogen attached. [I think??]

Regards,
Noel

On Thu, 27 Jun 2024 at 10:05, Wim Dehaen <wim...@gm...> wrote:
> I would expect the problem here is kekulization. The SMARTS is pattern
> matching using the kekule structure (i.e. double and single bonds, non
> aromatic atoms) and is not sanitized whereas the SMILES after parsing and
> sanitization has aromatic bonds and aromatic atoms. Try what happens when
> you do a SMARTS match with the SMILES with aromatic atoms:
> `[2H]c1cc([3H])cc(C2=N[C@](C)([37Cl])CC2)c1`
>
> best wishes
> wim
>
> On Thu, Jun 27, 2024 at 10:56 AM pgchem pgchem <pg...@tu...>
> wrote:
>
>> Hello all,
>>
>> if every valid SMILES is also a valid SMARTS, why does:
>>
>> select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol,
>> '[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol)
>>
>> yield "True", but:
>>
>> select substruct('[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::mol,
>> '[2H]C1=CC([3H])=CC(=C1)C1=N[C@](C)([37Cl])CC1'::qmol)
>>
>> is "False"? The same is observed when using the @> operator.
>>
>> RDKit 2024.03.3 built from source + PostgreSQL 16.3.
>>
>> best regards
>>
>> Ernst-Georg