rdkit-discuss Mailing List for RDKit (Page 2)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: He, A. <he...@bu...> - 2024-05-13 21:25:50
|
Hi Pavel,

Do you work with small rings (5, 6, 7) or large cyclic structures (like cyclic peptides)?

To distinguish different conformations of small rings, I feel that torsional angles or apex heights (geometric values that are alignment-free and depend only on the internal coordinates of the molecule) might be more useful than RMSD. You can run conformer generation and then put the conformers into categories, if you don't have too many rings and the rings aren't that big. To get started with a simple example in RDKit, I previously found this tutorial very helpful: https://sunhwan.github.io/blog/2021/02/24/RDKit-ETKDG-Piperazine.html

Docking in AutoDock Vina (https://autodock-vina.readthedocs.io/en/latest/) or AutoDock-GPU (https://github.com/ccsb-scripps/AutoDock-GPU) supports sampling of ring conformations on the fly. By default, attempts will be made during docking to sample alternate conformers of 7-membered and larger rings. Optionally, you could also turn on the sampling for 6-membered and smaller rings. Take a peek at this recent paper to learn about the method: https://www.cambridge.org/core/journals/qrb-discovery/article/performance-evaluation-of-flexible-macrocycle-docking-in-autodock/D8417BC284AEE198EC6AF25C7E677249

The Meeko project (https://github.com/forlilab/Meeko?tab=readme-ov-file#python-tutorial) provides a seamless workflow in Python to export your RDKit molecules into AutoDock-ready formats (and the docking outcomes can be retrieved back to RDKit, too!).

The multiple docking outcomes with AutoDock Vina can give you at least some idea of what conformations might fit. You could refine the poses with more advanced methods.

Hope this helps!

Best regards,
Amy H.

From: Pavel Polishchuk <pav...@uk...>
Date: Monday, May 13, 2024 at 4:43 AM
To: rdk...@li... <rdk...@li...>
Subject: [Rdkit-discuss] sampling of ring conformation for docking

[snip]

_______________________________________________
Rdkit-discuss mailing list
Rdk...@li...
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
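A minimal sketch of the torsion-based categorisation suggested above, assuming RDKit's ETKDGv3 embedding. The helper names (`ring_torsions`, `sign_pattern`), the 10-degree tolerance, the example molecule and the sign-string labelling are illustrative assumptions, not something proposed in the thread. Each conformer gets one sign string per ring, so molecules with several saturated rings are handled without averaging per-ring RMSD values.

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom, rdMolTransforms


def ring_torsions(mol, conf_id):
    """Return, for each ring, the list of intra-ring torsion angles in degrees."""
    conf = mol.GetConformer(conf_id)
    result = []
    for ring in mol.GetRingInfo().AtomRings():
        n = len(ring)
        angles = [
            rdMolTransforms.GetDihedralDeg(
                conf, ring[i], ring[(i + 1) % n], ring[(i + 2) % n], ring[(i + 3) % n]
            )
            for i in range(n)
        ]
        result.append(angles)
    return result


def sign_pattern(angles, tol=10.0):
    """Coarse alignment-free label: '+', '-' or '0' per torsion (hypothetical scheme)."""
    return "".join("0" if abs(a) < tol else ("+" if a > 0 else "-") for a in angles)


# Example molecule with two saturated rings (cyclohexyl-morpholine, chosen for illustration)
mol = Chem.AddHs(Chem.MolFromSmiles("C1CCCCC1N2CCOCC2"))
ps = rdDistGeom.ETKDGv3()
ps.randomSeed = 42
cids = list(rdDistGeom.EmbedMultipleConfs(mol, 20, ps))

# Group conformers by the combined ring signature; keep one representative per group
groups = {}
for cid in cids:
    key = tuple(sign_pattern(t) for t in ring_torsions(mol, cid))
    groups.setdefault(key, []).append(cid)

representatives = [members[0] for members in groups.values()]
print(len(cids), "conformers,", len(groups), "ring-signature groups")
```

The representatives could then be passed on as docking starting points. A finer classification (for example puckering coordinates) may be needed for larger rings, where a sign string is likely too coarse.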
From: Pavel P. <pav...@uk...> - 2024-05-13 08:43:01
|
Hello,

I use RDKit to embed initial conformations for docking. The issue is with saturated rings. I can use a single random conformer, but its geometry may be unsuitable and the whole molecule will fail to dock. I can use several starting conformers for docking, and to avoid docking very similar conformers I can select a few diverse conformers based on the RMSD between rings only. However, a problem arises if a molecule has several such saturated rings. The current workaround is to compute the RMSD between corresponding rings individually, then average the RMSD values and select a diverse set of conformers. It may work to some extent.

However, I'm curious whether a better solution is possible. Can we sample rings individually and embed a molecule using pre-generated conformers of some parts (rings)? I know about the restricted conformer enumeration function, but it will only work if we supply a single connected part as fixed. It should not work if we have two disconnected parts (rings) with 3D coordinates, because we do not know their relative position to generate 3D coordinates for the rest of the atoms in the molecule.

Maybe someone has some ideas or suggestions?

Kind regards,
Pavel |
From: Ariadna L. P. <ari...@gm...> - 2024-05-02 08:18:31
|
Hello everyone,

Thank you for all your helpful suggestions. I've taken careful note of them, and they have been extremely helpful in guiding my work. 3D-QSAR is also new for me and your insights and expertise have been incredibly valuable.

Thank you once again for your generous assistance.

Best Regards,
Ariadna Llop

On Tue, 30 Apr 2024 at 22:45, Andrew Dalke <da...@da...> wrote:
> [snip] |
From: Andrew D. <da...@da...> - 2024-04-30 21:10:38
|
Hi Ariadna,

In general the MACCS keys are not that good for comparing similarity. They still exist for historical reasons. Back in the 1970s the company Molecular Design Limited developed a program called "Molecular Access System" (MACCS) for structure registration, substructure search, and the like.

Substructure search is slow, so MACCS includes a set of keys which would act as fast filters - if the query contained a key but the database entry did not, then the query could not match that entry.

In the 1980s when fingerprint similarity search first became popular - this is before the term "fingerprint" was even coined - people used the MACCS keys because they were already computed and sitting there, on the computer system they were already using.

Over time people developed other types of fingerprints, and different ways to compare them, and a more complete understanding of how they are coupled to the types of system being studied.

For example, in "Comparing structural fingerprints using a literature-based similarity benchmark" by Sayle and O'Boyle, "Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested."

They found the MACCS fingerprints to be one of the worst performers, which you might expect now that you know the happenstance which made them popular.

Since you are doing 3D QSAR, you should familiarize yourself with the fingerprints used in that area. I have no experience with 3D QSAR and cannot provide advice on what is appropriate.

The first paper I found using Google Scholar to search for "3d qsar fingerprints" is "Docking, Interaction Fingerprint, and Three-Dimensional Quantitative Structure–Activity Relationship (3D-QSAR) of Sigma1 Receptor Ligands, Analogs of the Neuroprotective Agent RC-33" at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6637851/ which uses Interaction fingerprints.

The second is "Novel TOPP descriptors in 3D-QSAR analysis of apoptosis inducing 4-aryl-4H-chromenes: Comparison versus other 2D- and 3D-descriptors" at https://www.sciencedirect.com/science/article/pii/S0968089607005834 which I mention because it summarizes 7 different descriptor-based approaches, and places the MACCS keys in last place, far below the second worst ("TOPP > GRIND > BCI 4096 = ECFP > FCFP > GRID-GOLPE ≫ DRAGON ⋙ MDL 166").

No doubt there are many others for you to read through and try out.

> # Generate fingerprint descriptor database
> fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

What I can suggest is you try my chemfp package, specifically the 4.2b1 I just released (bear in mind that it is beta!)

You can install it with:

    python -m pip install chemfp==4.2b1 -i https://chemfp.com/packages/

To generate Morgan fingerprints of radius 2, I suggest you compute them once and store them in a file, like this command-line example:

    rdkit2fps --morgan2 dataset.smi -o dataset.fps

(use "--maccs" to generate MACCS keys, "--pair" for atom pairs; and use "--help" to see what other options are available.)

To "Calculate pairwise Tanimoto similarity between fingerprints" as a distance, you can use another command-line tool to generate the matrix as a NumPy "npy" file, like this:

    chemfp simarray dataset.fps --as-distance -o dataset.npy

To load this in Python:

    import numpy as np
    dists = np.load("dataset.npy")

If you also need the identifiers:

    with open("dataset.npy", "rb") as f:
        dists = np.load(f)
        metadata = np.load(f)
        ids = np.load(f)

This should make it easier to iterate over the different clustering methods available, since you only generate the fingerprints and distance matrix once.

If you decide to use interaction fingerprints, or some other fingerprint type that is not in the RDKit, you can still generate the fingerprints in FPS format (a simple text format) and use chemfp to generate your matrix for you, either on the command-line or through its Python API.

> However, I'm not satisfied with the results and would like to experiment with MACCS Keys to see if they yield better clustering outcomes. Does anyone know how to cluster compounds using MACCS fingerprints? Any insights on the best approach to calculate similarities and cluster using these fingerprints would be highly appreciated.

In case I was not clear enough before, MACCS keys make poor fingerprints. There is no reason to expect they will yield better clustering outcomes, and multiple papers suggest they will yield worse ones.

Best regards,

Andrew
da...@da... |
From: YUKTI D. <yuk...@st...> - 2024-04-24 19:14:59
|
Can anybody help me do bioactivity prediction for a batch of SMILES with RDKit? |
From: Greg L. <gre...@gm...> - 2024-04-23 14:20:24
|
Hi,

Please do not duplicate questions/posts between the mailing list and github discussions. That's spamming the community.

-greg

On Tue, Apr 23, 2024 at 4:10 PM Ariadna Llop Peiró <ari...@gm...> wrote:
> [snip] |
From: Ariadna L. P. <ari...@gm...> - 2024-04-23 14:07:38
|
Hello everyone,

I'm currently working with a dataset of chemical compounds, aiming to cluster them into different series to create a 3D-QSAR model. Up to this point, I've been using Morgan Fingerprints to generate the descriptors and cluster the compounds based on their Tanimoto Similarity:

```
from rdkit import DataStructs
from rdkit.Chem import AllChem

# Generate fingerprint descriptor database (mols is a list of RDKit molecules)
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

# Calculate pairwise Tanimoto similarity between fingerprints
similarity_matrix = []
for i in range(len(fps)):
    similarities = []
    for j in range(len(fps)):
        similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))
    similarity_matrix.append(similarities)
```

With the similarity matrix, I applied hierarchical clustering based on a Tanimoto Similarity threshold to group similar compounds:

```
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Cluster based on Tanimoto similarity
dists = 1 - np.array(similarity_matrix)
hc = hierarchy.linkage(squareform(dists), method='single')

# Specify a distance threshold or number of clusters
threshold = 0.6  # Adjust this value based on your dendrogram and similarity values
clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
```

However, I'm not satisfied with the results and would like to experiment with MACCS Keys to see if they yield better clustering outcomes. Does anyone know how to cluster compounds using MACCS fingerprints? Any insights on the best approach to calculate similarities and cluster using these fingerprints would be highly appreciated.

Thank you in advance for your suggestions!

Ariadna Llop |
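As a rough sketch of how the fingerprint step could be swapped while the clustering step stays fixed, the snippet below runs the same pipeline with Morgan (radius 2) and MACCS fingerprints. Several things here are assumptions rather than part of the original post: the six toy SMILES, the helper name `condensed_distances`, and the use of RDKit's Butina clustering in place of the scipy hierarchical clustering above (a deliberate substitution to keep the example dependency-free, not an equivalent method). The replies in this thread argue that MACCS keys tend to perform poorly, so this only shows the mechanics.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator
from rdkit.ML.Cluster import Butina

# Toy input, assumed for illustration only
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "Cc1ccccc1", "CCc1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]


def condensed_distances(fps):
    """Tanimoto distances for all pairs (i, j) with j < i, in Butina's expected order."""
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    return dists


morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fingerprinters = {
    "morgan2": morgan.GetFingerprint,
    "maccs": MACCSkeys.GenMACCSKeys,
}

clusters = {}
for name, make_fp in fingerprinters.items():
    fps = [make_fp(m) for m in mols]
    # 0.6 is the distance threshold used in the post above
    clusters[name] = Butina.ClusterData(
        condensed_distances(fps), len(fps), 0.6, isDistData=True
    )
    print(name, clusters[name])
```

Note that the pair ordering used here differs from the condensed form produced by scipy's `squareform`, so the list should not be passed to `hierarchy.linkage` without reordering.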
From: מיכל ר. <mic...@gm...> - 2024-03-27 10:28:22
|
Hi,

I'm trying to define the following reaction:

    '([A:1]\[A:2]=[A:3]\[A:4]=[A:5]/[A:6]=[A:7]\[A:8]=[A:9]\[A:10]) >> ([A:1]/[A:2]=[A:9]\[A:10].[A:4]1=[A:5][A:6]=[A:7][A:8]=[A:3]1)'

I want the reaction to take place only for the cis case, as written, and not for the all-trans reactant. Using rdchiral I managed to eliminate the all-trans reactant, but the product comes out in its all-trans form rather than the cis form the reaction demands (between atoms 1, 2, 9 and 10):

    reactant: 'CCCC/[NH+]=C/C=C(C)\\C=C/C=C(C)/C=C/C1=C(C)CCCC1(C)C'
    product: 'CCCC/[NH+]=C(C)/C=C/C1=C(C)CCCC1(C)C'

How can I resolve this issue? |
From: Greg L. <gre...@gm...> - 2024-03-20 16:36:08
|
For what it's worth, this one works too:

    m.GetSubstructMatches(Chem.MolFromSmarts('P1->[Zr+3]<-C1'))

It looks like a problem in the way ring closure bonds are being handled in the SMARTS parser.

Jan: would you mind creating an issue for this in github?

-greg

On Wed, Mar 20, 2024 at 3:30 PM Jan Halborg Jensen <jhj...@ch...> wrote:
> [snip] |
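The query variants discussed in this thread can be compared side by side with a short script like the one below. This is only a reproduction sketch: the labels are made up for readability, and no expected output is shown because the first query's result depends on whether the installed RDKit release still has the ring-closure behaviour reported here.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C1P->[Zr+3]<-1")

# Labels are illustrative; the SMARTS strings are the ones from the thread
queries = {
    "ring closure on the dative bond": "C1P->[Zr+3]<-1",
    "ring closure on the C-P bond": "P1->[Zr+3]<-C1",
    "dative bond only": "[*]->[Zr+3]",
}

results = {}
for label, smarts in queries.items():
    patt = Chem.MolFromSmarts(smarts)
    # MolFromSmarts returns None if the pattern cannot be parsed
    results[label] = mol.GetSubstructMatches(patt) if patt is not None else None
    print(f"{label:35s} {smarts:20s} {results[label]}")
```

Moving the ring-closure digit off the dative bond, as in the second query, is the rewrite reported to work in the reply above.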
From: Jan H. J. <jhj...@ch...> - 2024-03-20 14:28:16
|
The following finds no matches:

    m = Chem.MolFromSmiles('C1P->[Zr+3]<-1')
    m.GetSubstructMatches(Chem.MolFromSmarts('C1P->[Zr+3]<-1'))

But all these work:

    m.GetSubstructMatches(Chem.MolFromSmiles('C1P->[Zr+3]<-1'))

    m.GetSubstructMatches(Chem.MolFromSmarts('[*]->[Zr+3]'))

    m = Chem.MolFromSmiles('C1P-[Zr+3]-1')
    m.GetSubstructMatches(Chem.MolFromSmarts('C1P-[Zr+3]-1'))

Is this a bug, or is there something I'm missing with regard to the first case?

Best regards,
Jan |
From: Paolo T. <pao...@gm...> - 2024-03-19 11:16:35
|
Dear Jan,

Definitely it is a bug. I'll try and fix it for the next release, which is due in ~2 weeks.

Thanks for reporting, cheers
Paolo

> On 19 Mar 2024, at 11:20, Jan Halborg Jensen <jhj...@ch...> wrote:
> [snip] |
From: Jan H. J. <jhj...@ch...> - 2024-03-19 10:18:30
|
Why does ResonanceMolSupplier only give me one resonance structure for O[NH+]=[C-]NC when O[NH+]=[CH]NC gives me two structures? Is that a bug? Best regards, Jan |
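For anyone wanting to check this against their own RDKit version, a small reproduction sketch follows. The dictionary name `counts` is just for illustration, and no expected numbers are given since they depend on whether the release in use includes a fix.

```python
from rdkit import Chem

counts = {}
for smiles in ("O[NH+]=[C-]NC", "O[NH+]=[CH]NC"):
    mol = Chem.MolFromSmiles(smiles)
    suppl = Chem.ResonanceMolSupplier(mol)
    # Enumerate the resonance structures and write each one back out as SMILES
    structures = [Chem.MolToSmiles(m) for m in suppl if m is not None]
    counts[smiles] = len(structures)
    print(smiles, len(structures), structures)
```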
From: 王昊 <hwa...@16...> - 2024-03-13 13:25:15
|
Hi,

I have two molecules, shown below. It seems to me that they should either have no common substructure or a smaller one, but I get the match shown in the result. I have added parameters to no avail. How can I solve this problem?

code:

    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    smi1 = 'CC(=O)OCCc1ccccc1'
    smi2 = 'CCCCCC'
    mol1 = Chem.MolFromSmiles(smi1)
    mol2 = Chem.MolFromSmiles(smi2)
    params = rdFMCS.MCSParameters()
    params.BondCompare = rdFMCS.BondCompare.CompareOrderExact
    params.AtomCompare = rdFMCS.AtomCompare.CompareAny
    params.MatchValences = True
    params.MatchChiralTag = True
    mcs = rdFMCS.FindMCS([mol1, mol2], params)
    mcs_smarts = mcs.smartsString
    mcs_smiles = Chem.MolToSmiles(Chem.MolFromSmarts(mcs_smarts))
    print(mcs_smarts)
    print(mcs_smiles)

result:

    [#6]-[#6]-[#6]-,:[#6]-,:[#6]-,:[#6]
    CCCCCC |
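One possible explanation, offered as an assumption rather than a confirmed diagnosis from the thread: in current RDKit releases `MCSParameters` exposes the comparison functions as `AtomTyper` and `BondTyper`, and the match flags live on `AtomCompareParameters`. Attributes named `AtomCompare` or `BondCompare` would then be ordinary Python attributes that `FindMCS` never reads, leaving the default bond comparison (`CompareOrder`, which lets single and aromatic bonds match) in effect. A sketch under that assumption, with element-based atom comparison chosen here for illustration:

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

mol1 = Chem.MolFromSmiles("CC(=O)OCCc1ccccc1")
mol2 = Chem.MolFromSmiles("CCCCCC")

params = rdFMCS.MCSParameters()
# Atoms match only when they are the same element
params.AtomTyper = rdFMCS.AtomCompare.CompareElements
# Bonds match only when the bond type is identical (single != aromatic)
params.BondTyper = rdFMCS.BondCompare.CompareOrderExact
# Valence and chirality flags would be set on the nested object, e.g.
# params.AtomCompareParameters.MatchValences = True

mcs = rdFMCS.FindMCS([mol1, mol2], params)
print(mcs.numAtoms, mcs.numBonds, mcs.smartsString)
```

With exact bond matching, the common substructure can no longer run into the aromatic ring, so it should shrink to the longest chain of singly bonded carbons shared by both molecules. Keeping `CompareAny` for atoms would still allow the ester oxygen to stand in for a carbon, which may or may not be what is wanted.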
From: Greg L. <gre...@gm...> - 2024-03-13 05:23:20
|
Dear all, The (free) registration for the 2024 RDKit UGM, being held from 11-13 September at the ETH in Zurich, Switzerland, is now open: https://www.eventbrite.com/e/860637719587 You can submit proposals to do talks, tutorials, lightning talks, and posters here: https://forms.gle/5GK5ej7hCdPguwKz8 As in the past couple of years, we will stream the talks for people who cannot attend in person. Best regards, -greg |
From: Ádám B. <bar...@gm...> - 2024-02-23 09:50:33
|
Hello all, Is it possible to add a legend to individual reaction components if I use DrawReaction, like with MolsToGridImage or DrawMolecule? I'm trying to display a reaction that has identifiers (ex: A, B, P1) below each component. I'm currently drawing the reaction with DrawReaction. The reaction is generated from an RxnBlock. Thank you, -- ~Baróthi Ádám |
From: Chris S. <sw...@ma...> - 2024-02-22 11:42:07
|
Hi Both, Many thanks for your rapid response, much appreciated. Cheers Chris |
From: Taka S. <ser...@gm...> - 2024-02-22 10:48:22
|
Hi Chris,

I think you can do it with SaveXlsxFromFrame.
http://rdkit.org/docs/source/rdkit.Chem.PandasTools.html#SaveXlsxFromFrame

    rdkit.Chem.PandasTools.SaveXlsxFromFrame(frame, outFile, molCol='ROMol', size=(300, 300), formats=None)

Saves pandas DataFrame as a xlsx file with embedded images. molCol can be either a single column label or a list of column labels. It maps numpy data types to excel cell types:

    int, float -> number
    datetime -> datetime
    object -> string (limited to 32k characters, an xlsx limitation)

The formats parameter can be optionally set to a dict of XlsxWriter formats (https://xlsxwriter.readthedocs.io/format.html#format), e.g.:

    { 'write_string': {'text_wrap': True} }

Currently supported keys for the formats dict are: 'write_string', 'write_number', 'write_datetime'.

Cells with compound images are a bit larger than images due to excel. Column width weirdness explained (from xlsxwriter docs): The width corresponds to the column width value that is specified in Excel. It is approximately equal to the length of a string in the default font of Calibri 11. Unfortunately, there is no way to specify "AutoFit" for a column in the Excel file format. This feature is only available at runtime from within Excel.

Thanks,
Taka

On Thu, 22 Feb 2024 at 19:19, Chris Swain via Rdkit-discuss <rdk...@li...> wrote:
> [snip] |
From: Chris S. <sw...@ma...> - 2024-02-22 10:17:58
|
Hi, Is it possible to export from a Pandas data frame to Excel, inserting the structures as images in the excel sheet? Cheers Chris |
From: Eduardo M. <edu...@gm...> - 2024-02-18 12:11:06
|
Hello, Are you passionate about computational chemistry, eager to explore chemical space, thrilled by deep learning, and fascinated by aromatic molecules? If so, we have the perfect opportunity for you! The Poranne Research Group, at the Technion - Israel Institute of Technology, is currently recruiting graduate students to join our dynamic team. As part of our team, you'll delve into cutting-edge research, collaborate with experts in the field, and contribute to groundbreaking discoveries in the interfaces of physical organic chemistry, organic electronics and machine learning. Don't miss out on this opportunity to be part of our vibrant research group. Visit our group website <https://poranne-group.github.io/> to learn more about our research and how you can apply. Best, Eduardo |
From: Santiago F. <san...@me...> - 2024-02-14 14:11:35
|
Good morning,

I am trying to convert some reactions in SMILES format to RXN and MRV format, but I am not able to obtain a good representation. The reaction components are really messed up in the final output.

For example, using this reaction SMILES, these are the steps I am following:

    string reaction = "C.CO>cno.NN>coc.[H][H]";
    RDKit::ChemicalReaction* result = RDKit::RxnSmartsToChemicalReaction(reaction, nullptr, true, true);
    RDDepict::compute2DCoordsForReaction(*result);
    string mrvReaction = RDKit::ChemicalReactionToMrvBlock(*result, false);

or

    string rxnReaction = RDKit::ChemicalReactionToRxnBlock(*result, true, true);

I am attaching both results.

Best regards
Santiago

SANTIAGO FRAGA
Software Developer
san...@me...
MESTRELAB RESEARCH S.L.
PHONE +34881976775 FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706 Santiago de Compostela (SPAIN) |
From: Greg L. <gre...@gm...> - 2024-02-07 05:43:06
|
Hi Amy,

On Tue, Feb 6, 2024 at 8:20 PM He, Amy <he...@bu...> wrote:
> Emre, great to hear from you. I also just wanted to say that not all
> entries in ZINC can be transformed into 3D structures. We encountered a
> couple of instances where the annotated stereo is nonphysical, especially
> at closely-spaced stereo centers in small cycles. I had thought to design a
> check to capture these instances, but eventually I just gave up and
> discarded the entries because no other available methods (including ETKDG)
> or software can build 3D structures for 3D calculations. It would still be
> helpful to consider a check even for 2D calculations.

Yeah, this is a difficult one. It would be nice to have a check to catch at least the "simple" cases like these cage structures where there's only one relative stereo possible, but I haven't managed to find it yet.

-greg |
From: He, A. <he...@bu...> - 2024-02-06 20:05:16
|
Hi Greg, Thanks so much for your kind suggestions. That also helped with our projects! Emre, great to hear from you. I also just wanted to say that not all entries in ZINC can be transformed into 3D structures. We encountered a couple of instances where the annotated stereo is nonphysical, especially at closely-spaced stereo centers in small cycles. I had thought to design a check to capture these instances, but eventually I just gave up and discarded the entries because no other available methods (including ETKDG) or software can build 3D structures for 3D calculations. It would still be helpful to consider a check even for 2D calculations. Best regards, Amy From: Greg Landrum <gre...@gm...> Date: Tuesday, February 6, 2024 at 6:44 AM To: Emre Apaydın <emr...@gm...> Cc: He, Amy <he...@bu...>, rdk...@li... <rdk...@li...> Subject: Re: [Rdkit-discuss] Ligand conversion problem from 2D to 3D Hi Emre, Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i. e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) Hi Emre, Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i.e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) but this is already a big enough problem. 
The easiest thing to do with these compounds (or things which have this kind of problem) is to just disable the stereochemistry entirely by removing all instances of the @ symbol from the input strings:

In [26]: ps = rdDistGeom.ETKDGv3()

In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCOCC2)cc1OC'.replace('@','')))

In [28]: rdDistGeom.EmbedMolecule(m,ps)
Out[28]: 0

In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))

In [30]: rdDistGeom.EmbedMolecule(m,ps)
Out[30]: 0

-greg

On Tue, Feb 6, 2024 at 9:42 AM Emre Apaydın <emr...@gm...> wrote:

Thank you so much for your help. I managed to convert most of the molecules from 2D to 3D, but no matter which ETKDG version, which embedding parameter I try, I cannot convert these two molecules; ZINC000101210593, ZINC000196058327. Is there an alternative feature or method I can try? I would be grateful if you could help me. Thank you!

On Thu, 11 Jan 2024 at 01:58, He, Amy <he...@bu...> wrote:

Hi Emre!
You can get more detailed info on failed conformer generations through rdDistGeom.EmbedFailureCauses, see:

https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html

Bests,

--
Amy He
Hadad Lab @ OSU
He...@os...

From: Emre Apaydın <emr...@gm...>
Date: Wednesday, January 10, 2024 at 8:47 AM
To: rdk...@li... <rdk...@li...>
Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D

Hello,

I want to convert the 2D ligands I downloaded as sdf format from the ZINC library to 3D, but almost half of them are not converted to 3D. Some of them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands are not converted to 3D in this way. When I run the script, I do not get any warning or error in IDE. When I look at the output of my Try, Except commands, I see

"ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"

It outputs like this for ligands that are not translated to 3D. When I try different methods, the ligands are converted to 3D. I wonder if there is something missing or wrong with my script. I would be grateful if you can help me.

Thank you!
```
from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdForceFieldHelpers
from rdkit.Chem import rdPartialCharges
import os

ligands_dir = "ligands"
output_dir = "new_ligands"
status_file = "process_status.txt"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]

with open(status_file, 'w') as status:
    for sdf_file in sdf_files:
        input_path = os.path.join(ligands_dir, sdf_file)
        output_path = os.path.join(output_dir, sdf_file)
        mol = Chem.MolFromMolFile(input_path)

        # Add hydrogens
        try:
            mol = Chem.AddHs(mol, addCoords=True)
        except:
            status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
            continue

        # 3D embedding
        etkdgv3 = rdDistGeom.ETKDGv3()
        embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
        if embed_status == -1:
            status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")

        # Compute Gasteiger charges
        try:
            rdPartialCharges.ComputeGasteigerCharges(mol)
        except:
            status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")

        # UFF energy minimization
        try:
            rdForceFieldHelpers.UFFOptimizeMolecule(mol)
        except:
            status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")

        Chem.MolToMolFile(mol, output_path)
```

_______________________________________________
Rdkit-discuss mailing list
Rdk...@li...
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
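The workaround Greg describes (quoted above) could be wrapped in a small helper along the following lines. This is only a sketch and not code from the thread: the names `strip_smiles_stereo` and `embed_with_fallback` are made up for illustration, and it assumes the structures are available as SMILES strings, whereas the script above reads SDF files. It first tries the embedding with the stereo as annotated and only drops the `@` markers if that fails. Like the `.replace('@','')` call in Greg's session, it leaves double-bond stereo (`/` and `\`) untouched.

```python
def strip_smiles_stereo(smiles: str) -> str:
    """Drop tetrahedral stereo markers ('@' and '@@') from a SMILES string."""
    return smiles.replace("@", "")


def embed_with_fallback(smiles: str):
    """Embed with the annotated stereo first; retry without it on failure.

    Returns (mol, stereo_kept). mol is None if neither attempt produced a
    conformer. Hypothetical helper for illustration, not an RDKit function.
    """
    # Imported here so that strip_smiles_stereo can be used without RDKit.
    from rdkit import Chem
    from rdkit.Chem import rdDistGeom

    ps = rdDistGeom.ETKDGv3()
    attempts = ((smiles, True), (strip_smiles_stereo(smiles), False))
    for candidate, stereo_kept in attempts:
        mol = Chem.MolFromSmiles(candidate)
        if mol is None:  # SMILES could not be parsed
            continue
        mol = Chem.AddHs(mol)
        # EmbedMolecule returns the new conformer id, or -1 on failure.
        if rdDistGeom.EmbedMolecule(mol, ps) != -1:
            return mol, stereo_kept
    return None, False
```

When `stereo_kept` comes back as False, the 3D structure no longer corresponds to the annotated stereoisomer, so such entries probably deserve their own line in the status file.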
From: Greg L. <gre...@gm...> - 2024-02-06 11:44:40
|
Hi Emre,

Both of those compounds look like they have conflicting stereochemistry information in the ring systems, i.e. the stereo which is specified cannot actually exist. There's something else going on as well (that looks like a bug) but this is already a big enough problem.

The easiest thing to do with these compounds (or things which have this kind of problem) is to just disable the stereochemistry entirely by removing all instances of the @ symbol from the input strings:

In [26]: ps = rdDistGeom.ETKDGv3()

In [27]: m = Chem.AddHs(Chem.MolFromSmiles('COc1cc(Br)c(C[N+]2(CCOCC[C@H]3CC[C@H]4C[C@H]3C4(C)C)CCOCC2)cc1OC'.replace('@','')))

In [28]: rdDistGeom.EmbedMolecule(m,ps)
Out[28]: 0

In [29]: m = Chem.AddHs(Chem.MolFromSmiles('CC(C)(C)OC(=O)N[C@H](C[Si](C)(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)B1O[C@@H]2C[C@H]3C[C@@H](C3(C)C)[C@]2(C)O1'.replace('@','')))

In [30]: rdDistGeom.EmbedMolecule(m,ps)
Out[30]: 0

-greg

On Tue, Feb 6, 2024 at 9:42 AM Emre Apaydın <emr...@gm...> wrote:

> Thank you so much for your help. I managed to convert most of the
> molecules from 2D to 3D, but no matter which ETKDG version, which embedding
> parameter I try, I cannot convert these two molecules; ZINC000101210593,
> ZINC000196058327. Is there an alternative feature or method I can try? I
> would be grateful if you could help me. Thank you!
>
> On Thu, 11 Jan 2024 at 01:58, He, Amy <he...@bu...> wrote:
>
>> Hi Emre!
>>
>> You can get more detailed info on failed conformer generations through
>> rdDistGeom.EmbedFailureCauses, see:
>>
>> https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html
>>
>> Bests,
>>
>> --
>> Amy He
>> Hadad Lab @ OSU
>> He...@os...
>>
>> From: Emre Apaydın <emr...@gm...>
>> Date: Wednesday, January 10, 2024 at 8:47 AM
>> To: rdk...@li... <rdk...@li...>
>> Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
>>
>> Hello,
>>
>> I want to convert the 2D ligands I downloaded as sdf format from the ZINC
>> library to 3D, but almost half of them are not converted to 3D. Some of
>> them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands
>> are not converted to 3D in this way. When I run the script, I do not get
>> any warning or error in IDE. When I look at the output of my Try, Except
>> commands, I see
>>
>> "ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
>> ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"
>>
>> It outputs like this for ligands that are not translated to 3D.
>> When I try different methods, the ligands are converted to 3D. I wonder if
>> there is something missing or wrong with my script. I would be grateful if
>> you can help me.
>>
>> Thank you!
>>
>> ```
>> from rdkit import Chem
>> from rdkit.Chem import rdDistGeom
>> from rdkit.Chem import rdForceFieldHelpers
>> from rdkit.Chem import rdPartialCharges
>> import os
>>
>> ligands_dir = "ligands"
>> output_dir = "new_ligands"
>> status_file = "process_status.txt"
>>
>> if not os.path.exists(output_dir):
>>     os.makedirs(output_dir)
>>
>> sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]
>>
>> with open(status_file, 'w') as status:
>>     for sdf_file in sdf_files:
>>         input_path = os.path.join(ligands_dir, sdf_file)
>>         output_path = os.path.join(output_dir, sdf_file)
>>         mol = Chem.MolFromMolFile(input_path)
>>
>>         # Add hydrogens
>>         try:
>>             mol = Chem.AddHs(mol, addCoords=True)
>>         except:
>>             status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
>>             continue
>>
>>         # 3D embedding
>>         etkdgv3 = rdDistGeom.ETKDGv3()
>>         embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
>>         if embed_status == -1:
>>             status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")
>>
>>         # Compute Gasteiger charges
>>         try:
>>             rdPartialCharges.ComputeGasteigerCharges(mol)
>>         except:
>>             status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")
>>
>>         # UFF energy minimization
>>         try:
>>             rdForceFieldHelpers.UFFOptimizeMolecule(mol)
>>         except:
>>             status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")
>>
>>         Chem.MolToMolFile(mol, output_path)
>> ```
>>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
From: Emre A. <emr...@gm...> - 2024-02-06 08:40:36
|
Thank you so much for your help. I managed to convert most of the molecules from 2D to 3D, but no matter which ETKDG version, which embedding parameter I try, I cannot convert these two molecules; ZINC000101210593, ZINC000196058327. Is there an alternative feature or method I can try? I would be grateful if you could help me. Thank you!

On Thu, 11 Jan 2024 at 01:58, He, Amy <he...@bu...> wrote:

> Hi Emre!
>
> You can get more detailed info on failed conformer generations through
> rdDistGeom.EmbedFailureCauses, see:
>
> https://greglandrum.github.io/rdkit-blog/posts/2023-05-17-understanding-confgen-errors.html
>
> Bests,
>
> --
> Amy He
> Hadad Lab @ OSU
> He...@os...
>
> From: Emre Apaydın <emr...@gm...>
> Date: Wednesday, January 10, 2024 at 8:47 AM
> To: rdk...@li... <rdk...@li...>
> Subject: [Rdkit-discuss] Ligand conversion problem from 2D to 3D
>
> Hello,
>
> I want to convert the 2D ligands I downloaded as sdf format from the ZINC
> library to 3D, but almost half of them are not converted to 3D. Some of
> them are; ZINC000008214373, ZINC000001530666, ZINC000085545180. 208 ligands
> are not converted to 3D in this way. When I run the script, I do not get
> any warning or error in IDE. When I look at the output of my Try, Except
> commands, I see
>
> "ZINC000008214373.sdf : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed
> ZINC000008214373.sdf : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed"
>
> It outputs like this for ligands that are not translated to 3D.
> When I try different methods, the ligands are converted to 3D. I wonder if
> there is something missing or wrong with my script. I would be grateful if
> you can help me.
>
> Thank you!
>
> ```
> from rdkit import Chem
> from rdkit.Chem import rdDistGeom
> from rdkit.Chem import rdForceFieldHelpers
> from rdkit.Chem import rdPartialCharges
> import os
>
> ligands_dir = "ligands"
> output_dir = "new_ligands"
> status_file = "process_status.txt"
>
> if not os.path.exists(output_dir):
>     os.makedirs(output_dir)
>
> sdf_files = [f for f in os.listdir(ligands_dir) if f.endswith(".sdf")]
>
> with open(status_file, 'w') as status:
>     for sdf_file in sdf_files:
>         input_path = os.path.join(ligands_dir, sdf_file)
>         output_path = os.path.join(output_dir, sdf_file)
>         mol = Chem.MolFromMolFile(input_path)
>
>         # Add hydrogens
>         try:
>             mol = Chem.AddHs(mol, addCoords=True)
>         except:
>             status.write(f"{sdf_file} : Chem.AddHs(mol) = Failed\n")
>             continue
>
>         # 3D embedding
>         etkdgv3 = rdDistGeom.ETKDGv3()
>         embed_status = rdDistGeom.EmbedMolecule(mol, etkdgv3)
>         if embed_status == -1:
>             status.write(f"{sdf_file} : rdDistGeom.EmbedMolecule(mol, etkdgv3) = Failed\n")
>
>         # Compute Gasteiger charges
>         try:
>             rdPartialCharges.ComputeGasteigerCharges(mol)
>         except:
>             status.write(f"{sdf_file} : rdPartialCharges.ComputeGasteigerCharges(mol) = Failed\n")
>
>         # UFF energy minimization
>         try:
>             rdForceFieldHelpers.UFFOptimizeMolecule(mol)
>         except:
>             status.write(f"{sdf_file} : rdForceFieldHelpers.UFFOptimizeMolecule(mol) = Failed\n")
>
>         Chem.MolToMolFile(mol, output_path)
> ``` |
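For molecules that still refuse to embed, the failure tracking Amy points to can show which stage of the embedding keeps rejecting them. The following is a minimal sketch, assuming an RDKit release recent enough to support `trackFailures` and `GetFailureCounts()` on the embedding parameters, as described in the linked blog post. The helper names `nonzero_failures` and `embed_and_report` are hypothetical and only serve the illustration.

```python
def nonzero_failures(counts, name_to_index):
    """Map failure-cause name -> count, keeping only causes that occurred."""
    return {
        name: counts[idx]
        for name, idx in name_to_index.items()
        # Skip any enum entry that has no matching slot in the counts.
        if idx < len(counts) and counts[idx] > 0
    }


def embed_and_report(mol):
    """Run one ETKDGv3 embedding and report which checks rejected attempts.

    Returns (conf_id, failures); conf_id is -1 if no conformer was produced.
    Hypothetical helper, not part of RDKit.
    """
    # Imported here so that nonzero_failures can be used without RDKit.
    from rdkit.Chem import rdDistGeom

    ps = rdDistGeom.ETKDGv3()
    ps.trackFailures = True  # record why individual attempts were discarded
    conf_id = rdDistGeom.EmbedMolecule(mol, ps)
    name_to_index = {
        name: int(value)
        for name, value in rdDistGeom.EmbedFailureCauses.names.items()
    }
    return conf_id, nonzero_failures(ps.GetFailureCounts(), name_to_index)
```

If the reported counts are dominated by the chirality-related checks, that would be consistent with the conflicting stereo annotations discussed in this thread, although other causes cannot be ruled out from the counts alone.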
From: Lewis M. <lew...@gm...> - 2024-02-05 22:20:04
|
Good catch, thank you Diogo!

Recognising the difficulties of tautomer enumeration: For my own purposes, the ideal behaviour would be to get the set of all three plausible tautomers of 'mol1' no matter what the input SMILES. Looks like there's already a GitHub issue up (https://github.com/rdkit/rdkit/issues/5937) but I can add this if it has a different cause.

thanks all
Lewis

On Tue, Feb 6, 2024 at 7:23 AM Diogo Martins <dio...@gm...> wrote:

> Hello,
>
> I think it's a bug because the tautomers depend on how the input SMILES is
> written. Both represent mol1:
>
> Sc1ncc2c(c1)cccc2
> Sc1cc2ccccc2cn1
>
> However the resulting tautomers differ depending on which is used as input.
>
> Best regards,
> Diogo
>
> On Mon, 5 Feb 2024 at 11:38, Lewis Martin <lew...@gm...> wrote:
>
>> Thank you very much for the detective work, Wim! This is helpful.
>>
>> It looks like the _reverse_ transition is possible, though. If I start by
>> generating tautomers of "mol2", then "mol1" is recovered, which indicates
>> this is an allowed transform. Is it possible that one direction is allowed
>> but not the reverse?
>>
>> Failing a solution there, does anyone know if it is possible to add
>> SMIRKS to the allowed tautomers through the python interface?
>> Thanks,
>> Lewis
>>
>> On Mon, Feb 5, 2024 at 9:52 PM Wim Dehaen <wim...@gm...> wrote:
>>
>>> hi lewis,
>>> if i am not mistaken this is because the tautomer transform "1,3 aromatic
>>> heteroatom H shift" does not account for other chalcogens than oxygen, so
>>> no selenium, tellurium or sulfur.
>>> you can find the list of transforms here:
>>> https://github.com/rdkit/rdkit/blob/8dae48b7a17fd984c69d04549e6d9b53690f5c52/Code/GraphMol/MolStandardize/TautomerCatalog/tautomerTransforms.in#L46
>>> (pointing to the line with the relevant transform).
>>> best wishes
>>> wim
>>>
>>> On Mon, Feb 5, 2024 at 3:26 AM Lewis Martin <lew...@gm...> wrote:
>>>
>>>> Hi all,
>>>> I'm looking at scoring tautomers, and using the 'tautobase' dataset
>>>> used by Weider et al* at:
>>>>
>>>> https://github.com/choderalab/neutromeratio/blob/master/data/b3lyp_tautobase_subset.txt
>>>>
>>>> This dataset has pairs of tautomers with experimental logK values to
>>>> determine the preferred tautomer.
>>>>
>>>> In at least one case, depending on which tautomer you use as the
>>>> 'entry' point, the enumerated tautomers by RDKit either do or don't include
>>>> both of the pair of input molecules. I'm hoping there's a way to
>>>> uniquely recover the full set of possible tautomers from using any input
>>>> tautomer.
>>>>
>>>> Here's a code example:
>>>>
>>>>> from rdkit import Chem
>>>>> from rdkit.Chem import Draw
>>>>> from rdkit.Chem.Draw import IPythonConsole
>>>>> IPythonConsole.drawOptions.addStereoAnnotation = True
>>>>> from rdkit.Chem.MolStandardize import rdMolStandardize
>>>>>
>>>>> #same result if you don't do any of these params.
>>>>> tautomer_params = rdMolStandardize.CleanupParameters()
>>>>> tautomer_params.tautomerRemoveSp3Stereo = False
>>>>> tautomer_params.tautomerRemoveBondStereo = False
>>>>> tautomer_params.tautomerRemoveIsotopicHs = False
>>>>> tautomer_params.tautomerReassignStereo = False
>>>>> tautomer_params.doCanonical = True
>>>>>
>>>>> enumerator = rdMolStandardize.TautomerEnumerator(tautomer_params)
>>>>>
>>>>> smi1 = 'Sc1cc2ccccc2cn1'
>>>>> smi2 = 'S=c1cc2ccccc2c[nH]1'
>>>>> mol1 = Chem.MolFromSmiles(smi1)
>>>>> mol2 = Chem.MolFromSmiles(smi2)
>>>>>
>>>>> #choose mol1 or mol2 to be source of tautomers:
>>>>> #choose mol1, and look at the tautomers. Note that mol2 isn't present!
>>>>> tauts = [Chem.MolFromSmiles(Chem.MolToSmiles(m)) for m in
>>>>>          enumerator.Enumerate(mol1)]
>>>>>
>>>>> Draw.MolsToGridImage([mol1, mol2]+tauts,
>>>>>                      legends=['mol1', 'mol2 (not present in tauts!)'] + [f'taut{i}' for i in range(len(tauts))],
>>>>>                      molsPerRow=4)
>>>>
>>>> And a picture of this in a notebook for an at-a-glance view:
>>>> https://gist.github.com/ljmartin/4a9d9eb684df3e11e59fc6502a4b7b03
>>>>
>>>> Does anyone know a way to recover "mol2" within tautomers of "mol1"?
>>>>
>>>> Thank you!
>>>> Lewis
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> Rdk...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss |
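One thing that might be worth trying while the underlying issue is open is to re-enumerate from every tautomer found, as canonical SMILES, until no new structures appear. The sketch below is an untested idea rather than a confirmed fix: the helper names are hypothetical, it uses a default `TautomerEnumerator`, and whether it actually recovers mol2 from mol1 is not known. It can only help when re-reading some enumerated tautomer opens up a transform that the original input missed, which Diogo's observation about SMILES ordering suggests may happen.

```python
def closure(seeds, expand):
    """Apply expand() to every item until no new items appear (fixed point)."""
    seen = set(seeds)
    frontier = list(seen)
    while frontier:
        item = frontier.pop()
        for new in expand(item):
            if new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


def enumerate_tautomer_smiles(smiles):
    """Canonical SMILES of the tautomers RDKit enumerates from one input."""
    # Imported here so that closure() can be used without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    enumerator = rdMolStandardize.TautomerEnumerator()
    return {Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)}


def all_reachable_tautomers(smiles):
    """Tautomers reachable by repeated enumeration, as canonical SMILES."""
    first_pass = enumerate_tautomer_smiles(smiles)
    return closure(first_pass, enumerate_tautomer_smiles)
```

Comparing `all_reachable_tautomers('Sc1cc2ccccc2cn1')` with `all_reachable_tautomers('S=c1cc2ccccc2c[nH]1')` would show whether the two entry points converge on the same set.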