rdkit-discuss Mailing List for RDKit (Page 4)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Greg L. <gre...@gm...> - 2023-12-19 03:38:22

Hi Marawan,

We don't currently support CSRML. It is certainly an interesting and flexible format, so it would be cool to have, but it would be a fair amount of work to implement.

-greg

On Tue, Dec 19, 2023 at 4:19 AM Marawan Hussien via Rdkit-discuss <rdk...@li...> wrote:
> Hello,
>
> I am wondering if standard rdkit supports CSRML, I would like to encode
> the toxprint chemotypes as binary fingerprints for a bunch of molecules to
> train on,
>
> Thanks,
> Marawan

From: Marawan H. <mar...@ya...> - 2023-12-19 03:16:26

Hello,

I am wondering whether standard RDKit supports CSRML. I would like to encode the ToxPrint chemotypes as binary fingerprints for a set of molecules to train on.

Thanks,
Marawan

From: Pavel P. <pav...@uk...> - 2023-12-15 08:15:26

Dear colleagues,

we are happy to invite you to the 7th Advanced In Silico Drug Design workshop, which will take place 29 January - 02 February 2024 in Olomouc.

This year we cover topics on:
- PDBe database services
- virtual screening
- machine learning and AI
- structure- and ligand-based drug design tools
- pharmacophore modeling
- molecular docking and dynamics
- de novo design
- frequent hitters
- natural compounds
and many more.

Lectures and tutorials will be provided by 12 experts in the field from Austria, Great Britain, Italy and Czech Republic. On the final day participants may win prizes in a competition where they can apply the knowledge they have gained and demonstrate their skills.

The web-site of the workshop: https://www.kfc.upol.cz/7add

Welcome to Olomouc!

Kind regards,
Pavel

From: Mandar K. <man...@gm...> - 2023-12-14 03:41:26

Hello everyone,

Thanks a lot for your detailed replies and suggestions. Thanks, Andrew, for the code block; that also helped me understand what was happening.

I just realized that my replies were not sent to the rdkit-discuss list, so I am summarizing the origin of the warning and the solution below for future readers:

The output SDFs from the docking program have no 2D or 3D dimension code on the second line of each record, so RDKit's SDMolSupplier raised a warning but still treated the structure as 3D. I processed these SDFs with obabel to add hydrogens; the second line is then present and carries the 3D code, as Jan Jensen indicated in the previous thread.

Best,
Mandar Kulkarni

On Wed, Dec 13, 2023 at 4:28 PM Andrew Dalke <da...@da...> wrote:
> > On Dec 13, 2023, at 06:46, Mandar Kulkarni <man...@gm...> wrote:
> > Thanks a lot for the detailed answer. In my case, this line is blank,
> > probably leading to warnings.
>
> Here's what RDKit does, from Code/GraphMol/FileParsers/MolFileParser.cpp
>
> If the dimension code is not 2D and not 3D it assumes 2D:
>
>   if (tempStr.length() >= 22) {
>     std::string dimLabel = tempStr.substr(20, 2);
>     // Unless labelled as 3D we assume 2D
>     if (dimLabel == "3d" || dimLabel == "3D") {
>       res->setProp(common_properties::_3DConf, 1);
>     }
>   }
>
> Then, if there are non-zero Z coordinates and the record is not marked
> as 3D, it issues the warning:
>
>   bool nonzeroZ = hasNonZeroZCoords(conf);
>
>   if (!nonzeroZ && marked3d == 1) {
>     .. I'm skipping the code for this case ..
>   } else if (marked3d == 0 && nonzeroZ) {
>     BOOST_LOG(rdWarningLog)
>         << "Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. "
>            "Marking the mol as 3D."
>         << std::endl;
>     return true;
>   }
>
> Andrew
> da...@da...

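The fix summarized above (getting a 3D code onto the second line of each record) can also be sketched without obabel. The following is a minimal illustration in plain Python, not what the thread's author did: the helper name `tag_molblock_as_3d` and the one-atom sample record are made up for this example, and it assumes a V2000 mol block whose header line follows the column layout discussed in this thread (dimension code in columns 21-22).

```python
def tag_molblock_as_3d(molblock: str) -> str:
    """Return a copy of a mol block whose header line carries the 3D code."""
    lines = molblock.split("\n")
    if len(lines) < 4:
        raise ValueError("not a complete mol block")
    # Line 2 (index 1) is the program/timestamp line.
    # Pad it so that columns 21-22 (indices 20-21) exist, then overwrite them.
    header = lines[1].rstrip("\r").ljust(22)
    lines[1] = header[:20] + "3D" + header[22:]
    return "\n".join(lines)


# Hypothetical docking output: blank second line, non-zero Z coordinate.
block = "\n".join([
    "ligand",
    "",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    1.0000    2.0000    3.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

patched = tag_molblock_as_3d(block)
print(repr(patched.split("\n")[1]))
```

The patched text could then be handed to `Chem.MolFromMolBlock`. Going by the parser code quoted above, a record tagged 3D should no longer trigger the warning, but that is an expectation rather than something verified here.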
From: Jan H. J. <ja...@bi...> - 2023-12-13 07:22:04

You can also cross-check with standard InChI to see if this is an RDKit issue or a more general InChI issue.

To convert InChI strings (and optionally AuxInfo) to SDF format with the standard inchi-1 executable, put the InChI string and AuxInfo into a text file and convert it like this.

    P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt
    InChI=1/Ca.2H
    AuxInfo=1/0/N:1;2;3/rA:3Ca0H0H0/rB:;;/rC:;;;

    P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>inchi-1.exe /InChI2Struct /OutputSDF test.txt
    InChI version 1, Software v. 1.06 (inchi-1 executable)
    Windows 32-bit Build (MS VS 2015) of Dec 18 2020 20:45:14
    Opened log file 'test.txt.log'
    Opened input file 'test.txt'
    Opened output file 'test.txt.txt'
    Opened problem file 'test.txt.prb'
    The command line used:
    "inchi-1.exe /InChI2Struct /OutputSDF test.txt"
    Converting InChI(s) to structure(s) in MOL format
    Output SDfile only without stereochemical information and atom coordinates
    Input format: InChI (plain identifier)
    Output format: SDfile only (without stereochemical info and atom coordinates)
    Timeout per structure: 60000 msec
    Up to 1024 atoms per structure
    Finished processing 1 structure: 0 errors, processing time 0:00:00.00
    Elapsed walltime: 15 msec.

    P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>type test.txt.txt
    Structure #1.
      InChIV10

      3  0  0  0  0  0  0  0  0  0  1 V2000
        0.0000    0.0000    0.0000 Ca  0  0  0  0 15  0  0  0  0
        0.0000    0.0000    0.0000 H   0  0  0  0 15  0  0  0  0
        0.0000    0.0000    0.0000 H   0  0  0  0 15  0  0  0  0
    M  END
    $$$$

    P:\Projects\RInChI\INCHI-1-BIN__V1.06\windows\32bit>

Cheers
-- Jan

On 2023-12-12 07:59, S Joshua Swamidass wrote:
> Perhaps provide some examples were this failure happens.
>
> Sent from Gmail Mobile
>
> On Tue, Nov 28, 2023 at 7:35 PM 李大舟 <lid...@sy...> wrote:
> > Dear RDKit Developers and Maintainers,
> >
> > [...]

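On the RDKit side, one quick way to collect the failing examples asked for in this thread is to check the return value of `MolFromInchi`, which is `None` when the InChI string cannot be converted. A minimal sketch, assuming an RDKit build with InChI support; the helper name `split_by_parse_result` and the two sample strings are illustrative and not taken from the original report.

```python
from rdkit import RDLogger
from rdkit.Chem import inchi


def split_by_parse_result(inchi_strings):
    """Sort InChI strings into those RDKit converts and those it rejects."""
    converted, rejected = [], []
    for text in inchi_strings:
        mol = inchi.MolFromInchi(text)  # returns None if conversion fails
        if mol is None:
            rejected.append(text)
        else:
            converted.append(text)
    return converted, rejected


# Silence RDKit's log output so only our own summary is printed.
RDLogger.DisableLog("rdApp.*")

samples = ["InChI=1S/CH4/h1H4", "not-an-inchi"]  # made-up test inputs
converted, rejected = split_by_parse_result(samples)
print("converted:", converted)
print("rejected:", rejected)
```

The rejected strings are the ones worth posting to the list, and the same strings can be run through the inchi-1 executable as described above to see whether the InChI library itself also refuses them.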
From: Jan H. J. <ja...@bi...> - 2023-12-13 07:21:25

> I could not figure out how Rdkit is guessing it as 2D structure, as there is no such information in SDF.

Ah, but there is, just a little hidden :-).

The source-and-timestamp line of each molfile in the SDF contains that information. The line is the second line of the molfile and you can find the 2D/3D tag as the last two characters of that line. An example:

:MOLFILE_BEGINS:

  Mol2Comp06180618072D

  1  0  0  0  0  0  0  0  0  0999 V2000
   10.2967   -1.5283    0.0000 Ar  0  0  0  0  0  0  0  0  0  0  0  0
M  END
:MOLFILE_ENDS:

Cheers
-- Jan

On 2023-12-13 03:39, Mandar Kulkarni wrote:
> Hello,
>
> I am using RDKit 2023.9.2's SDMolSupplier to read docked SDF files
> (V2000 formats, suppl = Chem.SDMolSupplier(sdf_file);) and getting a
> warning as:
>
> Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
>
> I could not figure out how Rdkit is guessing it as 2D structure, as
> there is no such information in SDF.
> Is there any more information need to provide SDMolSupplier to make it
> understand it's a 3D molecule?
> I kindly look forward to hearing to suggestions.
> Thanks in advance,
> Mandar Kulkarni

From: Andrew D. <da...@da...> - 2023-12-13 05:55:29

Hi Mandar,

> On Dec 13, 2023, at 03:39, Mandar Kulkarni <man...@gm...> wrote:
> I could not figure out how Rdkit is guessing it as 2D structure, as there is no such information in SDF.

Line 2 of the SDF record looks something like:

     RDKit          2D

This line has the format (quoting from the documentation):

IIPPPPPPPPMMDDYYHHmmddSSssssssssssEEEEEEEEEEEERRRRRR
A2<--A8--><---A10-->A2I2<--F10.5-><---F12.5--><-I6->

where "User's first and last initials (I), program name (P), date/time (M/D/Y,H:m), dimensional codes (d) such as 2D or 3D, scaling factors (S, s), energy (E) if modeling program input, internal registry number (R)"

In the example I gave, most of these fields are blank, except for the program name ("RDKit") and the dimension code "2D" in columns 21-22 (if I counted right).

The dimension code indicates the structure is expected to be in 2D.

> Is there any more information need to provide SDMolSupplier to make it understand it's a 3D molecule?

It is only a warning. RDKit interprets the molecule as 3D despite the warning.

The file format documentation also says:

"The “dimensional code” is maintained explicitly. Thus “3D” really means 3D, although “2D” will be interpreted as 3D if any non-zero Z-coordinates are found."

Best regards,
Andrew
da...@da...

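The fixed-width layout quoted above can be checked by slicing the header line at the documented column boundaries. This is a small sketch in plain Python, covering only the first four fields; the function name `parse_header_line` is invented for the example, and the two sample lines are the RDKit and Mol2Comp headers that appear in this thread.

```python
def parse_header_line(line: str) -> dict:
    """Slice line 2 of a molfile into its leading fixed-width fields."""
    line = line.ljust(22)  # pad short or blank lines so every slice is valid
    return {
        "initials": line[0:2].strip(),     # II          (A2)
        "program": line[2:10].strip(),     # PPPPPPPP    (A8)
        "timestamp": line[10:20].strip(),  # MMDDYYHHmm  (A10)
        "dimension": line[20:22].strip(),  # dd          (A2), e.g. 2D or 3D
    }


rdkit_line = " " * 5 + "RDKit" + " " * 10 + "2D"
mol2comp_line = "  Mol2Comp06180618072D"

print(parse_header_line(rdkit_line))
print(parse_header_line(mol2comp_line))
print(parse_header_line(""))  # a blank line yields an empty dimension code
```

An empty `dimension` value corresponds to the blank-header case from the docking program, which the parser treats as "not marked 3D".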
From: Mandar K. <man...@gm...> - 2023-12-13 02:40:31

Hello,

I am using RDKit 2023.9.2's SDMolSupplier to read docked SDF files (V2000 format, suppl = Chem.SDMolSupplier(sdf_file)) and I am getting this warning:

Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.

I could not figure out how RDKit decides that it is a 2D structure, as there is no such information in the SDF. Is there any more information I need to provide to SDMolSupplier to make it understand that it is a 3D molecule?

I look forward to hearing your suggestions.

Thanks in advance,
Mandar Kulkarni

From: S J. S. <swa...@gm...> - 2023-12-12 06:59:28

Perhaps provide some examples where this failure happens.

Sent from Gmail Mobile

On Tue, Nov 28, 2023 at 7:35 PM 李大舟 <lid...@sy...> wrote:
> Dear RDKit Developers and Maintainers,
>
> [...]

From: Greg L. <gre...@gm...> - 2023-12-10 06:37:42

Dear all,

The 2024 RDKit UGM will take place from 11-13 September in Zurich, Switzerland.

We'll post more information and open registration in Q1 of next year.

Best regards,
-greg

From: <lid...@sy...> - 2023-11-29 01:33:09

Dear RDKit Developers and Maintainers,

I hope this email finds you well. My name is Dr. Dazhou Li, and I am a researcher working on the development of a tool for extracting chemical compound structures recognized by OCR (Optical Character Recognition) technology. I have been using the RDKit library for a crucial step in this process, specifically the rdkit.Chem.inchi.MolFromInchi() function, to convert InChI-format strings into Mol format representations.

Firstly, I would like to express my gratitude for the excellent work you have done in developing and maintaining the RDKit library, which has been an invaluable resource in my research. The library has consistently delivered high-quality results in various aspects of chemical informatics, and I appreciate your dedication to its development.

However, I have encountered a specific issue with the rdkit.Chem.inchi.MolFromInchi() function that I hope you can help me understand and resolve. When attempting to convert InChI-format strings generated by my tool, some of them fail with an error message reporting "NaN." Since the rdkit.Chem.inchi.MolFromInchi() function calls C++ code, I am unable to directly inspect its execution or source code to diagnose the issue.

My primary request is for assistance in understanding the internal workings of the rdkit.Chem.inchi.MolFromInchi() function, specifically the checking process or generation step that leads to the "NaN" error when certain InChI-format strings are processed. It is crucial for my research to determine at which point in the execution of this function my generated InChI-formatted strings are considered unreasonable, as this information will help me refine my tool's output to be compatible with RDKit.

I understand that the RDKit library is a complex and comprehensive toolkit, and I appreciate the complexity involved in diagnosing such issues. However, any insights or guidance you can provide regarding the problematic cases and the internal processes of the rdkit.Chem.inchi.MolFromInchi() function would be immensely valuable to me and would help me ensure the compatibility of my tool with RDKit.

If possible, I would be grateful for access to relevant documentation or insights into the specific error conditions that may lead to the "NaN" result. Additionally, any suggestions or best practices for generating InChI-format strings that are more likely to be successfully processed by RDKit would be greatly appreciated.

Thank you for your time and consideration. I look forward to your response and hope that we can collaborate to resolve this issue and enhance the compatibility of my tool with the RDKit library.

Please feel free to reach out to me if you require any additional information or if there are specific details about my tool or the InChI-format strings that would aid in diagnosing the issue.

Best regards,

Dr. Dazhou Li
Shenyang University of Chemical Technology

From: Ling C. <lin...@gm...> - 2023-11-20 05:23:22

Thank you Christian! This is good to know.

The meaning of the number after "OH" is not defined in the web page. I'll take a look at the publication. But yes, I get the idea.

Ling

On Sat, Nov 18, 2023 at 12:14 PM, Christian Meyenburg <chr...@un...> wrote:
> Hi Ling,
>
> On 2023-11-17 18:40, Ling Chan wrote:
> > When I run MolToSmiles on a molecule with a 6-valenced sulfur, it
> > produced a problematic smiles. Seems it's a bug?
> >
> > [...]
> >
> > Running "Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf"))" on the
> > following sdf, I got
> > 'C[S@OH16](F)(F)(F)(F)F'
>
> The SMILES looks correct to me. Have a look at the SMILES documentation
> [0] regarding the Chiral Specification of octahedral structures (3.3.4).
>
> The OH16 does in fact *not* represent a Hydroxy group or something of
> the sort, but is a closer description of the octahedral geometry.
>
> Best,
> Chris
>
> [0] https://daylight.com/dayhtml/doc/theory/theory.smiles.html

From: Christian M. <chr...@un...> - 2023-11-18 20:11:46

Hi Ling,

On 2023-11-17 18:40, Ling Chan wrote:
> When I run MolToSmiles on a molecule with a 6-valenced sulfur, it
> produced a problematic smiles. Seems it's a bug?
>
> [...]
>
> Running "Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf"))" on the
> following sdf, I got
> 'C[S@OH16](F)(F)(F)(F)F'

The SMILES looks correct to me. Have a look at the SMILES documentation [0] regarding the Chiral Specification of octahedral structures (3.3.4).

The OH16 does in fact *not* represent a Hydroxy group or something of the sort, but is a closer description of the octahedral geometry.

Best,
Chris

[0] https://daylight.com/dayhtml/doc/theory/theory.smiles.html

--
Christian Meyenburg

ZBH - Zentrum für Bioinformatik Hamburg
Universität Hamburg
Bundesstrasse 43
D-20146 Hamburg
Germany

Tel.: +49 40 42838 7353
Fax.: +49 40 23951-2291
e-Mail: chr...@un...

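To make the distinction concrete, here is a small sketch in plain Python (no RDKit needed) that pulls the element, chirality tag and hydrogen count out of each bracket atom in a SMILES string. The regular expression is a deliberately simplified assumption and not a complete SMILES grammar, and the function name `bracket_atoms` is made up for this illustration.

```python
import re

# Simplified bracket-atom pattern: [isotope? element chirality? hcount? ...
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<element>[A-Z][a-z]?|[a-z]{1,2}|\*)"
    r"(?P<chirality>@(?:@|TH[12]|AL[12]|SP[123]|TB\d{1,2}|OH\d{1,2})?)?"
    r"(?P<hcount>H\d?)?"
)


def bracket_atoms(smiles):
    """Return (element, chirality tag, hydrogen spec) for each bracket atom."""
    return [
        (m["element"], m["chirality"] or "", m["hcount"] or "")
        for m in BRACKET_ATOM.finditer(smiles)
    ]


# In the SMILES from this thread, "OH16" is part of the chirality tag on S.
print(bracket_atoms("C[S@OH16](F)(F)(F)(F)F"))
# In hydroxide, "O" is the element and "H" is a hydrogen count.
print(bracket_atoms("[OH-]"))
```

In the first string the `OH16` follows an `@`, so it is read as an octahedral chirality class on sulfur; in the second there is no `@`, so the `H` is an ordinary hydrogen count on oxygen.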
From: Ling C. <lin...@gm...> - 2023-11-17 17:40:31

Dear colleagues,

When I run MolToSmiles on a molecule with a hexavalent sulfur, it produces a problematic SMILES. Is this a bug?

Thanks.
Ling

Running Chem.MolToSmiles(Chem.MolFromMolFile("mol.sdf")) on the following SDF, I got 'C[S@OH16](F)(F)(F)(F)F'

-------------------------------------------------------------------------------------------------------------

     RDKit          3D

  7  6  0  0  1  0  0  0  0  0999 V2000
   -2.0677    2.3607    1.4012 F   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3050    2.6304    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    0.0161    3.0975    0.8090 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5423    2.9001   -1.4012 F   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8443    4.1559   -0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6261    2.1634   -0.8090 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7117    0.9522    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  2  4  1  0
  2  5  1  0
  2  6  1  0
  2  7  1  0
M  END
$$$$

From: Noel O'B. <bao...@gm...> - 2023-11-10 10:06:27
|
Hi all, I'll keep it short: we're currently recruiting for a Cheminformatician at Sosei Heptares. We have no specific grade in mind - you could be just finishing your PhD, or be much more experienced. Here's the link for more details: https://cezanneondemand.intervieweb.it/heptares/jobs/cheminformatician-computational-chemistry-team-37704/en/ Happy to answer questions about the position - just email me off-list. Regards, Noel |
From: Chris S. <sw...@ma...> - 2023-11-05 10:49:29

Thanks

Chris

> On 5 Nov 2023, at 09:23, Wim Dehaen <wim...@gm...> wrote:
>
> how about:
> len(list(mol.GetAromaticAtoms()))
>
> best wishes
> wim
>
> On Sun, 5 Nov 2023, 08:41 Chris Swain via Rdkit-discuss, <rdk...@li...> wrote:
>> Hi,
>>
>> Perhaps I’m missing something obvious, but is there a way to calculate the number of aromatic atoms in a molecule?
>>
>> Cheers
>>
>> Chris

From: Wim D. <wim...@gm...> - 2023-11-05 09:23:58

how about:

len(list(mol.GetAromaticAtoms()))

best wishes
wim

On Sun, 5 Nov 2023, 08:41 Chris Swain via Rdkit-discuss, <rdk...@li...> wrote:
> Hi,
>
> Perhaps I’m missing something obvious, but is there a way to calculate the
> number of aromatic atoms in a molecule?
>
> Cheers
>
> Chris

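For completeness, a short runnable version of the suggestion above, next to an equivalent count that uses `Atom.GetIsAromatic()`. The helper name `count_aromatic_atoms` and the example molecules are just illustrations; the counts depend on RDKit's own aromaticity model as applied during sanitization.

```python
from rdkit import Chem


def count_aromatic_atoms(mol):
    """Count atoms that RDKit has flagged as aromatic."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetIsAromatic())


toluene = Chem.MolFromSmiles("Cc1ccccc1")
cyclohexane = Chem.MolFromSmiles("C1CCCCC1")

print(count_aromatic_atoms(toluene))           # 6 ring carbons
print(len(list(toluene.GetAromaticAtoms())))   # same count, as suggested above
print(count_aromatic_atoms(cyclohexane))       # 0
```

Both approaches walk the same atom flags, so they should agree for any molecule; the explicit loop is handy when additional per-atom conditions are needed.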
From: Chris S. <sw...@ma...> - 2023-11-05 07:39:14
|
Hi, Perhaps I’m missing something obvious, but is there a way to calculate the number of aromatic atoms in a molecule? Cheers Chris |
From: Nickolas J. <Nic...@eu...> - 2023-10-30 15:07:38

Good afternoon,

I am attempting to compile and build Shape-It using Visual Studio and Anaconda. I believe my variables and environment are configured correctly, although I could of course be wrong. The build files are written to the project folder, but when I build the project I get errors about missing C++ header files that it is looking for within RDKit, I believe. RDKit is installed in my conda environment, and some of the missing header or source files belong to GraphMol, for example. I have looked everywhere in my conda environment (and its prefix), and outside it on the Windows machine, and GraphMol is not found anywhere.

I am not sure whether I am missing something, but any insight is greatly appreciated. My colleague was able to install Shape-It successfully on Linux with no issues, and apparently without tweaking any environment variables. Is it even possible to build Shape-It on Windows?

Thank you in advance.

Best regards,
Nick Jones
IT Support Engineer

Eurofins Panlabs Inc.
6 Research Park Dr
St. Charles, MO 63304
United States of America
Cell Phone: 636-445-4759 or 636-328-8303
Email: Nic...@eu...
Website: www.eurofinsdiscoveryservices.com and www.discoverx.com

From: He, A. <he...@bu...> - 2023-10-27 16:51:36

Hi Rocco,

That is exactly what I was looking for. Thanks so much for your kind suggestion!

Massive Thanks,
Amy

From: Rocco Moretti <rmo...@gm...>
Date: Friday, October 27, 2023 at 12:30 PM
To: He, Amy <he...@bu...>
Cc: rdk...@li... <rdk...@li...>
Subject: Re: [Rdkit-discuss] Is there a Smiles library for common amino acids and ligands that can be used for AssignBondOrdersFromTemplate

> I'll note that the official definitions for all the chemical entities in the PDB can be found in the wwPDB's Chemical Component Dictionary: https://www.wwpdb.org/data/ccd
>
> That's in mmCIF format, but there are various SMILES and InChI definitions for the residues included in the file. (Your mileage may vary for the quality of those representations, though, especially for the rarer ones, but it should be no worse than the SDFs.)
>
> [...]

From: Rocco M. <rmo...@gm...> - 2023-10-27 16:30:38
I'll note that the official definitions for all the chemical entities in the PDB can be found in the wwPDB's Chemical Component Dictionary: https://www.wwpdb.org/data/ccd

That's in mmCIF format, but there are various SMILES and InChI definitions for the residues included in the file. (Your mileage may vary for the quality of those representations, though, especially for the rarer ones, but it should be no worse than the SDFs.)

You should be able to use an mmCIF parser to extract them, e.g.

    from mmcif.core.mmciflib import ParseCifSimple  # py-mmcif from the RCSB: `pip install mmcif`

    ccd = ParseCifSimple("components.cif", True, 0, 255, "?", "logfile.txt")  # logfile.txt is an arbitrary name
    ALA = ccd.GetBlock("ALA")
    desc = ALA.GetTable("pdbx_chem_comp_descriptor")
    print(desc.GetColumnNames())
    for ii in range(desc.GetNumRows()):
        print(desc.GetRow(ii))

    ['comp_id', 'type', 'program', 'program_version', 'descriptor']
    ['ALA', 'SMILES', 'ACDLabs', '10.04', 'O=C(O)C(N)C']
    ['ALA', 'SMILES_CANONICAL', 'CACTVS', '3.341', 'C[C@H](N)C(O)=O']
    ['ALA', 'SMILES', 'CACTVS', '3.341', 'C[CH](N)C(O)=O']
    ['ALA', 'SMILES_CANONICAL', 'OpenEye OEToolkits', '1.5.0', 'C[C@@H](C(=O)O)N']
    ['ALA', 'SMILES', 'OpenEye OEToolkits', '1.5.0', 'CC(C(=O)O)N']
    ['ALA', 'InChI', 'InChI', '1.03', 'InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1']
    ['ALA', 'InChIKey', 'InChI', '1.03', 'QNAYBMKLOCPYGJ-REOHCLBHSA-N']

The components file is rather large, so parsing time might be a little long at times.

On Fri, Oct 27, 2023 at 10:55 AM He, Amy <he...@bu...> wrote:

> Dear RDKit experts,
>
> I need your advice on finding a source Smiles library for reference, to
> build the template molecule from Smiles for AssignBondOrdersFromTemplate
> <https://www.rdkit.org/docs/source/rdkit.Chem.AllChem.html>.
>
> I am using AssignBondOrdersFromTemplate to perceive bonds in a
> residue-wise manner from an input PDB, using a reference Smiles library
> like this:
>
>     ref_smi = {
>         "ALA": "NC(C)C(=O)",
>         "GLY": "NCC(=O)",
>         "ILE": "NC(C(C)CC)C(=O)",
>     }
>
> I wonder if there has been an open reference library for common amino
> acids and ligands that present in PDB files. A previous post on
> rdkit-discuss (
> https://rdkit-discuss.narkive.com/JM2IGLQz/pdb-reader-and-bond-perception)
> points me to this website:
>
> ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz
>
> and useful links from
>
> http://www.ebi.ac.uk/pdbe-srv/pdbechem/
>
> But I am no longer able to access the contents.
>
> I guess we could always generate Smiles from the standardized SDF files..
> Still I am wondering if there is an existing Smiles library (like a
> reference datafile), where we can retrieve the Smiles string using the
> residue names of common amino acids and maybe also ligands.
>
> Any comments or suggestions would be greatly appreciated. Thank you for
> your time and kind support in advance!
>
> Bests,
>
> --
> Amy He
> Chemistry Graduate Teaching Assistant
> Hadad Lab
> Ohio State University
> he...@os...
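For readers who want a residue-name-to-SMILES dictionary like the `ref_smi` in the question, the descriptor rows shown in this reply can be collected into one. The following is only a minimal sketch using the Python standard library rather than py-mmcif: the embedded sample is reconstructed from the ALA rows printed above (it is not a copy of the real `components.cif`), the helper names `descriptor_rows` and `reference_smiles` are made up for illustration, and the parser assumes the descriptors appear in `loop_` form with one row per line. Values containing backslashes or multi-line text fields would need a real mmCIF parser.

```python
import shlex

# Sample reconstructed from the ALA rows printed in the reply above.
# Assumption: a real components.cif holds one such loop per data_ block.
SAMPLE_CCD = '''\
data_ALA
loop_
_pdbx_chem_comp_descriptor.comp_id
_pdbx_chem_comp_descriptor.type
_pdbx_chem_comp_descriptor.program
_pdbx_chem_comp_descriptor.program_version
_pdbx_chem_comp_descriptor.descriptor
ALA SMILES ACDLabs 10.04 "O=C(O)C(N)C"
ALA SMILES_CANONICAL CACTVS 3.341 "C[C@H](N)C(O)=O"
ALA SMILES_CANONICAL "OpenEye OEToolkits" 1.5.0 "C[C@@H](C(=O)O)N"
ALA InChIKey InChI 1.03 QNAYBMKLOCPYGJ-REOHCLBHSA-N
#
'''


def descriptor_rows(cif_text):
    """Yield one dict per row of each pdbx_chem_comp_descriptor loop."""
    prefix = "_pdbx_chem_comp_descriptor."
    columns = []
    in_loop = False
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            # Column declaration: remember the name after the category prefix.
            columns.append(line[len(prefix):])
            in_loop = True
        elif in_loop:
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                # Anything that is not a data row ends the current loop.
                columns, in_loop = [], False
                continue
            fields = shlex.split(line)  # honours the double-quoted values
            if len(fields) == len(columns):
                yield dict(zip(columns, fields))


def reference_smiles(cif_text, program="OpenEye OEToolkits",
                     kind="SMILES_CANONICAL"):
    """Map comp_id -> descriptor for one chosen program and descriptor type."""
    return {
        row["comp_id"]: row["descriptor"]
        for row in descriptor_rows(cif_text)
        if row["type"] == kind and row["program"] == program
    }


ref_smi = reference_smiles(SAMPLE_CCD)
print(ref_smi)
```

The CCD descriptors describe the free component (the ALA entries above include the carboxylic acid OH), so they may need trimming before they match in-chain residue templates such as `"NC(C)C(=O)"`.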
From: He, A. <he...@bu...> - 2023-10-27 15:53:53
Dear RDKit experts,

I need your advice on finding a source SMILES library for reference, to build the template molecule from SMILES for AssignBondOrdersFromTemplate <https://www.rdkit.org/docs/source/rdkit.Chem.AllChem.html>.

I am using AssignBondOrdersFromTemplate to perceive bonds in a residue-wise manner from an input PDB, using a reference SMILES library like this:

    ref_smi = {
        "ALA": "NC(C)C(=O)",
        "GLY": "NCC(=O)",
        "ILE": "NC(C(C)CC)C(=O)",
    }

I wonder whether there is an open reference library for the common amino acids and ligands that are present in PDB files. A previous post on rdkit-discuss (https://rdkit-discuss.narkive.com/JM2IGLQz/pdb-reader-and-bond-perception) points me to this website:

ftp://ftp.ebi.ac.uk/pub/databases/msd/pdbechem/files/pdb.tar.gz

and useful links from

http://www.ebi.ac.uk/pdbe-srv/pdbechem/

But I am no longer able to access the contents.

I guess we could always generate SMILES from the standardized SDF files. Still, I am wondering if there is an existing SMILES library (like a reference datafile) from which we can retrieve the SMILES string using the residue names of common amino acids and maybe also ligands.

Any comments or suggestions would be greatly appreciated. Thank you for your time and kind support in advance!

Bests,

--
Amy He
Chemistry Graduate Teaching Assistant
Hadad Lab
Ohio State University
he...@os...
From: Wang S. <shu...@gm...> - 2023-10-26 16:08:49
Hi all,

If I create a molecule from SMILES:

```
from rdkit import Chem

mol = Chem.MolFromSmiles("C")
for atm in mol.GetAtoms():
    print(atm.GetPDBResidueInfo())
```

then the PDB residue info for each atom is None.

However, if I create a molecule from a PDB file, is it possible to delete the PDB residue info associated with each atom, so that None is still returned? For illustrative purposes:

```
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("C")
mol = AllChem.AssignBondOrdersFromTemplate(
    mol, Chem.MolFromPDBBlock(Chem.MolToPDBBlock(mol))
)

## !! This would instead raise a "ValueError: MonomerInfo is not a PDB Residue"
for atm in mol.GetAtoms():
    atm.SetMonomerInfo(Chem.AtomMonomerInfo())
    print(atm.GetPDBResidueInfo())
```

Is it possible to set PDBResidueInfo to None? Or will I have to work around it by, say, writing out an intermediate SDF file and reading it back in?

Thank you
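The workaround raised at the end of the question can be done in memory, without an intermediate file on disk. This is a sketch under stated assumptions, not a confirmed RDKit idiom for clearing residue info: the helper name `drop_residue_info` is made up for illustration, and it relies on the mol block written by `Chem.MolToMolBlock` not carrying PDB residue records, so the atoms rebuilt by `Chem.MolFromMolBlock` come back without them. Properties set on the original molecule or its atoms are not carried over by this round trip.

```python
from rdkit import Chem


def drop_residue_info(mol):
    """Rebuild mol from a mol block so its atoms carry no PDB residue info."""
    # removeHs=False keeps any explicit hydrogens that came from the PDB input.
    return Chem.MolFromMolBlock(Chem.MolToMolBlock(mol), removeHs=False)


# Same construction as in the question: a molecule that went through a PDB block.
pdb_mol = Chem.MolFromPDBBlock(Chem.MolToPDBBlock(Chem.MolFromSmiles("C")))
clean_mol = drop_residue_info(pdb_mol)

for atm in clean_mol.GetAtoms():
    print(atm.GetSymbol(), atm.GetPDBResidueInfo())
```

The round trip returns a new molecule rather than modifying the original, so any references to the old atoms need to be refreshed afterwards.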
From: Greg L. <gre...@gm...> - 2023-10-25 04:52:24
I'm not sure exactly what you're looking for, but all of the code for reading and writing SMILES is here:
https://github.com/rdkit/rdkit/tree/master/Code/GraphMol/SmilesParse

-greg

On Tue, Oct 24, 2023 at 11:51 AM Eduardo Mayo <edu...@gm...> wrote:

> Hello all,
>
> I hope you all are doing well.
>
> I am struggling trying to find the code where all the smile to mol and mol
> to smile translation happens. Can someone point me in the right direction?
>
> kind regards,
> eduardo