rdkit-discuss Mailing List for RDKit (Page 9)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
|
From: Jarod Y. <jar...@ho...> - 2023-05-15 18:55:21
|
Can one calculate average molecular weight on an ROMol or RWMol in the C++ implementation of RDKit? If yes, how? |
|
From: עמית ה. <ami...@gm...> - 2023-05-14 09:07:37
|
Hi,
I tried to find the maximum common substructure (MCS) between pairs of
molecules and put the results in a grid image. In the grid output, a bond
is missing in the aromatic rings, but when I draw just one substructure on
its own it is drawn correctly.
from rdkit import Chem
from rdkit.Chem import Draw, rdFMCS

def default():
    l = 5
    ligs = []
    # make a list of the ligand files that we feed to the code
    for i in range(1, int(l) + 1):
        ligand = "ligand_" + str(i) + ".pdb"
        ligs.append(ligand)
    # replace each file name with the sanitized molecule read from it
    for i in range(len(ligs)):
        mol_object = Chem.MolFromPDBFile(ligs[i])
        Chem.SanitizeMol(mol_object)
        ligs[i] = mol_object
    return ligs

ligs = default()
ls = []
# MCS of every ordered pair of ligands, converted to a query molecule
for i in ligs:
    for j in ligs:
        l = rdFMCS.FindMCS([i, j],
                           bondCompare=rdFMCS.BondCompare.CompareAny).smartsString
        l = Chem.MolFromSmarts(l)
        # Chem.SanitizeMol(l)
        ls.append(l)
Draw.MolsToGridImage(ls, molsPerRow=5, subImgSize=(250, 250))
[image: screenshot 2023-05-11 132315]
<https://user-images.githubusercontent.com/133215803/237793539-4211b7ec-4020-41a6-8769-23c21505461d.png>
Here is the code that draws just one substructure:
from rdkit.Chem.Draw import IPythonConsole

l = rdFMCS.FindMCS([Chem.MolFromPDBFile("ligand_5.pdb"),
                    Chem.MolFromPDBFile("ligand_4.pdb")],
                   bondCompare=rdFMCS.BondCompare.CompareAny).smartsString
l = Chem.MolFromSmarts(l)
IPythonConsole.drawOptions.addBondIndices = True
l
[image: screenshot 2023-05-11 132559]
<https://user-images.githubusercontent.com/133215803/237793698-e417ce3d-d755-4f65-a549-fdef05056309.png>
Why is this happening, and how can I solve the problem?
Thank you,
Amit
|
|
From: Geoffrey H. <geo...@gm...> - 2023-05-12 08:29:00
|
> > I am a little bit annoyed by the fact that the UFF energy
> > is not even negative and that the two FFs disagree by so much.
>
> I would not consider this to be a big problem, IMHO.
>
> > Do the two FFs compute something different?

Yes, very much so. UFF, for example, does not include electrostatic non-bonded terms, and MMFF94 does. Additionally, MMFF94 includes a specific term for hydrogen bonding interactions.

I consider force field energies to only be relevant in relative terms. That is, taking the energy difference between conformers within one force field is more useful than the absolute value. (This applies particularly when comparing numbers from UFF vs. MMFF94 vs. GAFF, etc.)

Even then, while force fields are useful for "quick cleanup" as my group showed (https://doi.org/10.1002/qua.26381), the relative energy rankings of UFF and MMFF94 are still crude relative to ML or quantum methods.

In other words, the question is "does optimizing a conformer with UFF or MMFF94 help?" Yes. In general, conformers generated using RDKit (or Open Babel, or most methods I've seen) have lower RMSD when optimized by a force field.

Hope that helps,
-Geoff

---
Prof. Geoffrey Hutchison
Department of Chemistry
University of Pittsburgh
tel: (412) 648-0492
email: ge...@pi...
twitter: @ghutchis
web: https://hutchisonlab.org/ |
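The point about relative energies can be illustrated with a short sketch. This is a minimal example and not code from the thread: the molecule (1-butanol), the number of conformers, the random seed, and the helper name `relative_energies` are all assumptions chosen for illustration. It optimizes the same starting conformers with MMFF94 and with UFF and reports each energy relative to the lowest one within the same force field, which is the only comparison the two sets of numbers support.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def relative_energies(mol, optimize):
    """Optimize all conformers of a copy of `mol`; return energies relative to the lowest."""
    work = Chem.Mol(mol)  # copy, so each force field starts from the same geometries
    results = optimize(work)  # list of (not_converged, energy) tuples, one per conformer
    energies = [energy for _, energy in results]
    lowest = min(energies)
    return [energy - lowest for energy in energies]


# Hypothetical test molecule; any small molecule covered by both force fields would do.
mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))
AllChem.EmbedMultipleConfs(mol, numConfs=5, randomSeed=42)

mmff_rel = relative_energies(mol, AllChem.MMFFOptimizeMoleculeConfs)
uff_rel = relative_energies(mol, AllChem.UFFOptimizeMoleculeConfs)

for conf_id, (mmff, uff) in enumerate(zip(mmff_rel, uff_rel)):
    print(f"conformer {conf_id}: MMFF94 +{mmff:.2f}  UFF +{uff:.2f} (kcal/mol)")
```

The absolute values are deliberately discarded here; only the offsets from each force field's own minimum are printed.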
|
From: Francois B. <ml...@li...> - 2023-05-12 02:07:13
|
Dear list,
I am a little bit worried.
If I start from the conformer given at the end of this message,
MMFF94 gives me a minimized conformer energy of -122.528
while UFF gives me 69.098 (I assume, both are in kcal/mol).
I am a little bit annoyed by the fact that the UFF energy
is not even negative and that the two FFs disagree by so much.
Are the two FFs using different units?
Do the two FFs compute something different?
I used the following python code for conformer minimization:
---
import sys
from rdkit.Chem import AllChem

def minimize_conformer(ff, mol):
    # ligand is supposed to be already properly protonated
    # for given pH and in 3D
    assert mol.GetNumConformers() == 1
    conv_ene = []
    if ff == "uff":
        conv_ene = AllChem.UFFOptimizeMoleculeConfs(mol)
    elif ff == "mmff":
        conv_ene = AllChem.MMFFOptimizeMoleculeConfs(mol)
    else:
        print("minimize_conformer: unsupported FF: %s" % ff,
              file=sys.stderr)
        assert False
    # each entry is (not_converged, energy); 0 means the optimization converged
    not_converged, ene = conv_ene[0]
    assert not_converged == 0
    return (mol, ene)
---
--- mol2
@<TRIPOS>MOLECULE
caffeine
24 25 0 0 0
SMALL
USER_CHARGES
@<TRIPOS>ATOM
 1 C1   5.1112  0.7768 -0.9264 C.2   1 <0>  0.0365
 2 C2   3.6441  2.3853 -1.2367 C.2   1 <0> -0.2366
 3 C3   3.0449  1.2362 -0.7981 C.2   1 <0>  0.2902
 4 C4   2.9523  3.5995 -1.5265 C.2   1 <0>  0.7150
 5 C5   0.8629  2.2904 -0.8430 C.2   1 <0>  0.6900
 6 C6   6.0477  2.9539 -1.7310 C.3   1 <0>  0.2556
 7 C7   1.0831 -0.0921 -0.1245 C.3   1 <0>  0.3001
 8 C8   0.7590  4.6603 -1.5747 C.3   1 <0>  0.3001
 9 N1   3.9557  0.2327 -0.6044 N.2   1 <0> -0.5653
10 N2   4.9734  2.0838 -1.3171 N.pl3 1 <0>  0.0476
11 N3   1.6743  1.1555 -0.5931 N.am  1 <0> -0.4231
12 N4   1.5557  3.4679 -1.3036 N.am  1 <0> -0.4201
13 O1   3.5018  4.6256 -1.9190 O.2   1 <0> -0.5700
14 O2  -0.3616  2.2696 -0.6756 O.2   1 <0> -0.5700
15 H1   6.0715  0.2803 -0.8973 H     1 <0>  0.1500
16 H2   6.1690  2.8492 -2.8124 H     1 <0>  0.0000
17 H3   5.7997  3.9871 -1.4724 H     1 <0>  0.0000
18 H4   6.9664  2.6656 -1.2124 H     1 <0>  0.0000
19 H5   1.0630 -0.0986  0.9753 H     1 <0>  0.0000
20 H6   0.0567 -0.1790 -0.5104 H     1 <0>  0.0000
21 H7   1.6835 -0.9404 -0.4850 H     1 <0>  0.0000
22 H8   0.4310  4.6518 -2.6246 H     1 <0>  0.0000
23 H9  -0.1216  4.6705 -0.9156 H     1 <0>  0.0000
24 H10  1.3672  5.5577 -1.3880 H     1 <0>  0.0000
@<TRIPOS>BOND
1 1 9 2
2 1 10 1
3 2 3 2
4 2 4 1
5 2 10 1
6 3 9 1
7 3 11 1
8 4 12 am
9 4 13 2
10 5 11 am
11 5 12 am
12 5 14 2
13 6 10 1
14 7 11 1
15 8 12 1
16 1 15 1
17 6 16 1
18 6 17 1
19 6 18 1
20 7 19 1
21 7 20 1
22 7 21 1
23 8 22 1
24 8 23 1
25 8 24 1
---
Regards,
F.
|
|
From: pgchem p. <pg...@tu...> - 2023-05-11 19:24:16
|
> Joos Kiener <joo...@gm...> wrote on 11.05.2023 16:11 CEST:
>
> Hi Ernst-Georg,
>
> maybe you are running into the same issue as I was:
>
> https://github.com/rdkit/rdkit/discussions/6148#discussioncomment-5450102
>
> You have to explicitly tell python where the dlls are:
>
> os.add_dll_directory(r"C:\path\to\rdkit\lib")
>
> before importing rdkit.
>
> From python documentation:
>
> DLL dependencies for extension modules and DLLs loaded with ctypes https://docs.python.org/3/library/ctypes.html#module-ctypes on Windows are now resolved more securely. Only the system paths, the directory containing the DLL or PYD file, and directories added with add_dll_directory() https://docs.python.org/3/library/os.html#os.add_dll_directory are searched for load-time dependencies. Specifically, PATH and the current working directory are no longer used, and modifications to these will no longer have any effect on normal DLL resolution.
>
> Best Regards,
>
> Joos

Hello Joos,

that was the problem indeed. Thank you very much for your help.

best regards

Ernst-Georg |
|
From: Joos K. <joo...@gm...> - 2023-05-11 14:12:10
|
Hi Ernst-Georg,

maybe you are running into the same issue as I was:

https://github.com/rdkit/rdkit/discussions/6148#discussioncomment-5450102

You have to explicitly tell python where the dlls are:

os.add_dll_directory(r"C:\path\to\rdkit\lib")

before importing rdkit.

From python documentation:

DLL dependencies for extension modules and DLLs loaded with ctypes <https://docs.python.org/3/library/ctypes.html#module-ctypes> on Windows are now resolved more securely. Only the system paths, the directory containing the DLL or PYD file, and directories added with add_dll_directory() <https://docs.python.org/3/library/os.html#os.add_dll_directory> are searched for load-time dependencies. *Specifically, PATH and the current working directory are no longer used, and modifications to these will no longer have any effect on normal DLL resolution*.

Best Regards,
Joos |
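The advice in this message can be sketched as a small helper that registers DLL directories before the first `import rdkit`. This is a minimal sketch, not code from the thread: the helper name `register_dll_dirs` is hypothetical, and the directory is the same placeholder path used in the message and must be replaced with the real location of the RDKit (and dependency) DLLs. `os.add_dll_directory` exists only on Windows with Python 3.8 or later, so the helper does nothing on other platforms.

```python
import os
import sys


def register_dll_dirs(paths):
    """Add each existing directory to the Windows DLL search path; no-op elsewhere."""
    handles = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return handles
    for path in paths:
        if os.path.isdir(path):
            # Keep the returned handle alive; closing it removes the directory again.
            handles.append(os.add_dll_directory(path))
    return handles


# Placeholder path (an assumption): point this at the directory holding the RDKit DLLs.
dll_handles = register_dll_dirs([r"C:\path\to\rdkit\lib"])

# Only import rdkit after the directories have been registered, e.g.:
# from rdkit import Chem
```

Directories for third-party DLLs that the build links against dynamically can be passed in the same list.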
|
From: Ernst-Georg S. <pg...@tu...> - 2023-05-11 12:35:40
|
On 10.05.2023 at 16:32, pgchem pgchem wrote:
> I'm currently trying to build RDKit 2023_03_1 on Windows 11 with Visual
> Studio 2022 Community. The building itself works:
> - No errors during the build
> - I get a working extension against PostgreSQL 15.2
> - I get working static libraries for Visual Studio (RDGeneral.lib etc.)
> - I get a rdkit package for Python 3.10
> BUT this does not work:
> from rdkit import Chem
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\RDKit\lib\site-packages\rdkit\__init__.py", line 6, in <module>
> from . import rdBase
> ImportError: DLL load failed while importing rdBase: The specified
> module could not be found.

Hello again,

I have now managed to build the shared objects as well. I had to carve this out of the Azure pipelines; apparently it only works correctly if you build the way they do, i.e. using cmake instead of MSBuild directly.

E.g.

C:\CMake\bin\cmake --build . --config=Release --target install

instead of

& "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" /p:Configuration=Release INSTALL.vcxproj (as the RDKit Book says)

But the rdkit Python package still throws the above-mentioned error. I did the same build on Ubuntu, and everything worked out of the box. I have also checked various "solutions" from the WWW, but to no avail. I cannot see any unresolved dependencies for rdBase.pyd. Any helpful suggestions or pointers are appreciated.

best regards

Ernst-Georg |
|
From: pgchem p. <pg...@tu...> - 2023-05-10 14:44:44
|
Hello all,

I'm currently trying to build RDKit 2023_03_1 on Windows 11 with Visual Studio 2022 Community. The building itself works:

- No errors during the build
- I get a working extension against PostgreSQL 15.2
- I get working static libraries for Visual Studio (RDGeneral.lib etc.)
- I get a rdkit package for Python 3.10

BUT this does not work:

from rdkit import Chem

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\RDKit\lib\site-packages\rdkit\__init__.py", line 6, in <module>
    from . import rdBase
ImportError: DLL load failed while importing rdBase: The specified module could not be found.

I am pretty sure that the environment is ok; I have checked with the dependency walker that rdBase.pyd sees everything it needs.

However, when I take a look at the rdkit package from PyPI, there are a lot of *.dlls, while my build only produces static libraries (*.lib).

When I compare the sizes of rdBase.pyd from PyPI and mine, mine is much larger, so I assume that it also is statically linked - so it should work?, but it doesn't.

Any pointers to what I am doing wrong here?

This is my cmake command:

c:/cmake/bin/cmake -DRDK_BUILD_PYTHON_WRAPPERS=ON
-DBOOST_ROOT=C:/Devel/RDBuild/boost -DRDK_BUILD_CAIRO_SUPPORT=ON
-DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON
-DRDK_BUILD_PGSQL=ON -DPostgreSQL_ROOT="C:\PostgreSQL\15"
-DRDK_INSTALL_INTREE=OFF -DCMAKE_INSTALL_PREFIX=c:/RDKit
-DEIGEN3_INCLUDE_DIR=C:/Devel/RDBuild/eigen3
-DFREETYPE_INCLUDE_DIRS=c:/Devel/RDbuild/freetype/include
-DFREETYPE_LIBRARY="c:/Devel/RDBuild/freetype/release dll/win64/freetype.lib"
-DZLIB_INCLUDE_DIR=c:/Devel/RDBuild/zlib/include
-DZLIB_LIBRARY=c:/Devel/RDBuild/zlib/libz.lib
-DCAIRO_INCLUDE_DIRS=c:/Devel/RDBuild/cairo/include
-DCAIRO_LIBRARIES=c:/Devel/RDBuild/cairo/lib/x64/cairo.lib
-DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BUILD_COMPRESSED_SUPPLIERS=ON
-G"Visual Studio 17 2022" -A x64 ..

and the MSBuild command:

& "C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" /p:Configuration=Release INSTALL.vcxproj

best regards

Ernst-Georg |
|
From: Francois B. <ml...@li...> - 2023-05-10 01:42:27
|
Hello,
Maybe you can use this:
Chem.MolToSmiles(mol, allHsExplicit=True)
This will place each heavy atom between '[' and ']' and give you the
number of hydrogens for each.
It gets easier to work with SMILES strings after this (you no longer
need a full-blown SMILES parser).
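As a minimal sketch of this suggestion (using the benzamide SMILES from the original question; the variable names are only for illustration), the following writes the SMILES with all hydrogens explicit and checks that every atom ends up as a bracket atom:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc(C(N)=O)1")

# allHsExplicit=True writes every atom as a bracket atom with its H count.
smi = Chem.MolToSmiles(mol, allHsExplicit=True)
print(smi)

# One '[' per atom, so the bracket terms can be split out with simple string tools.
bracket_count = smi.count("[")
print(bracket_count, mol.GetNumAtoms())
```

Note that the atom order in the output SMILES is the canonical output order, which need not match the input atom order.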
Regards,
F.
On 09/05/2023 14:55, Haijun Feng wrote:
> [1]
>
> Hi All,
>
> I am trying to add atom numbers in smiles as belows,
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>     atom.SetProp('molAtomMapNumber',str(i))
> smi=Chem.MolToSmiles(mol)
> print(smi)
>
> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]
>
> then I want to split the smiles into atoms, I did it like this:
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>     atom.SetProp('molAtomMapNumber',str(i))
>     print(i,atom.GetSymbol())
>
> the output is:
>
> 0 C
> 1 C
> 2 C
> 3 C
> 4 C
> 5 C
> 6 C
> 7 N
> 8 O
>
> But what I do want is something like this (with fragments instead of
> atoms):
>
> 0 cH
> 1 CH
> ...
> 7 NH2
> 8 O
>
> Can anyone help me figure out how to get each atom with H from the
> smiles as above. Thanks so much!
>
> best,
>
> Hal
>
> Links:
> ------
> [1] https://stackoverflow.com/posts/76197437/timeline
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
|
|
From: Andrew D. <da...@da...> - 2023-05-09 12:55:26
|
On May 9, 2023, at 07:55, Haijun Feng <hai...@gm...> wrote:
> Can anyone help me figure out how to get each atom with H from the smiles as above. Thanks so much!
Try using Chem.MolFragmentToSmiles to get the SMILES for each atom, with all hydrogens explicit, then strip off the leading and trailing []s.
from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True, atomsToUse=[atom.GetIdx()])
    print(i, atom_smi.strip("[]"))
This prints
0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O
Your code showed you using
atom.SetProp('molAtomMapNumber',str(i))
In the following, I'll set that property *after* getting the atom SMILES, so the map is not included as part of the output:
from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom_smi = Chem.MolFragmentToSmiles(mol, allHsExplicit=True, atomsToUse=[atom.GetIdx()])
    print(i, atom_smi.strip("[]"))
    atom.SetIntProp("molAtomMapNumber", i)
print(Chem.MolToSmiles(mol))
which gives the output
0 cH
1 cH
2 cH
3 cH
4 cH
5 c
6 C
7 NH2
8 O
[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]
> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]
For what it's worth, I get the slightly different:
[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]
You should be aware that the input order and the output SMILES order might be different.
Because of the simpler structure of your preferred output SMILES format, you can alternatively extract the atom terms from the output string by looking for the substrings inside of the []s, as in the following:
>>> import re
>>> re.compile(r'\[[^]]+\]').findall("[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]")
['[cH:0]', '[cH:1]', '[cH:2]', '[cH:3]', '[cH:4]', '[c:5]', '[C:6]', '[NH2:7]', '[O:8]']
This list will exactly match the output SMILES atom order.
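Building on the regular expression above, here is a small sketch (an illustration, not part of the original message; the pattern name `ATOM_TERM` and the helper `atoms_by_map_number` are hypothetical) that also captures the atom map number, so each term can be looked up by its original atom index even when the output order differs from the input order. It assumes every atom of interest is written as a bracket atom with a `:n` map number, as in the output shown above.

```python
import re

# Bracket atom: everything up to an optional ":<map number>" before the closing "]".
ATOM_TERM = re.compile(r"\[([^\]:]+)(?::(\d+))?\]")


def atoms_by_map_number(smiles):
    """Return {map number: atom term without brackets} for all mapped bracket atoms."""
    result = {}
    for match in ATOM_TERM.finditer(smiles):
        body, map_number = match.group(1), match.group(2)
        if map_number is not None:
            result[int(map_number)] = body
    return result


mapped = atoms_by_map_number(
    "[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]"
)
for index in sorted(mapped):
    print(index, mapped[index])
```

Atoms written without brackets or without a map number are skipped by this helper.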
Cheers,
Andrew
da...@da...
|
|
From: Wim D. <wim...@gm...> - 2023-05-09 09:43:01
|
Hi,
I think that if you simply need "H" and the H count appended, by far the
easiest way is to append them to the symbol string. See the codeblock below:
from rdkit import Chem

def get_symbol_with_Hs(a):
    symbol=a.GetSymbol()
    charge=a.GetFormalCharge()
    hcount=a.GetTotalNumHs()
    if hcount > 0:
        symbol+="H"
    if hcount > 1:
        symbol+=str(hcount)
    if charge==1:
        symbol+="+"
    if charge==-1:
        symbol+="-"
    if charge > 1:
        symbol+=f"(+{charge})"
    if charge < -1:
        # charge is already negative, so its own sign supplies the "-"
        symbol+=f"({charge})"
    return symbol

mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom.SetProp('molAtomMapNumber',str(i))
    print(i,get_symbol_with_Hs(atom))
-----
another way I would recommend is using smiles and explicit hydrogens (i.e.
bracketed) instead. For your use case I would imagine this as follows:
from rdkit import Chem
mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
mol=Chem.AddHs(mol)
rwmol=Chem.RWMol(mol)
for b in list(rwmol.GetBonds()):
    ba=b.GetBeginAtom()
    ea=b.GetEndAtom()
    if ba.GetAtomicNum()!=1 and ea.GetAtomicNum()!=1:
        rwmol.RemoveBond(ba.GetIdx(),ea.GetIdx())
frags=Chem.GetMolFrags(rwmol, asMols=True,sanitizeFrags=False)
for i,f in enumerate(frags):
    print(i,Chem.MolToSmiles(f))
this would output
0 [H]c
1 [H]c
2 [H]c
3 [H]c
4 [H]c
5 c
6 C
7 [H]N[H]
8 O
i hope that helps.
best wishes
wim
On Tue, May 9, 2023 at 7:58 AM Haijun Feng <hai...@gm...> wrote:
>
> <https://stackoverflow.com/posts/76197437/timeline>
>
> Hi All,
>
> I am trying to add atom numbers in smiles as belows,
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>     atom.SetProp('molAtomMapNumber',str(i))
> smi=Chem.MolToSmiles(mol)
> print(smi)
>
> the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]
>
> then I want to split the smiles into atoms, I did it like this:
>
> from rdkit import Chem
> mol=Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
> for i, atom in enumerate(mol.GetAtoms()):
>     atom.SetProp('molAtomMapNumber',str(i))
>     print(i,atom.GetSymbol())
>
> the output is:
>
> 0 C
> 1 C
> 2 C
> 3 C
> 4 C
> 5 C
> 6 C
> 7 N
> 8 O
>
> But what I do want is something like this (with fragments instead of
> atoms):
>
> 0 cH
> 1 CH
> ...
> 7 NH2
> 8 O
>
> Can anyone help me figure out how to get each atom with H from the smiles
> as above. Thanks so much!
>
>
> best,
>
>
> Hal
|
|
From: Santiago F. <san...@me...> - 2023-05-09 07:03:31
|
Thank you for your answer.
I will try the new version.
Regards
Santiago
________________________________
From: David Cosgrove <dav...@gm...>
Sent: Thursday, 4 May 2023 11:29
To: Santiago Fraga <san...@me...>
Cc: Wim Dehaen <wim...@gm...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles
As part of the work on improving the way RDKit handles organometallics that is in the latest release, there is MolOps::cleanUpOrganometallics, which attempts to do the bond transformations in a similar way to that gist. The intention was that this would be part of the default sanitization, but late in the day it was discovered that it didn't work well with compounds with 2 metal atoms and bridging chlorine atoms, such as 'F[Pd]1(Cl)Cl->[Pd](Cl)(Cl)<-Cl1'. It's my intention to fix that at some point in the near future, but in the meantime if you're working in C++ it is available for use with caveats. Worth a try in this case, perhaps.
Dave
On Thu, May 4, 2023 at 9:51 AM Santiago Fraga <san...@me...> wrote:
Good morning Wim
Yes, I know that the original smiles has problems with the dative bonds.
I am trying to load the molecule and then fix those bonds using this solution:
https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4
And then generate a new molfile. I will try to apply your code to see if I can improve the molecule
depiction.
Regards
Santiago
SANTIAGO FRAGA
Software Developer
san...@me...
MESTRELAB RESEARCH S.L.
PHONE +34881976775
FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706
Santiago de Compostela (SPAIN)
________________________________
From: Wim Dehaen <wim...@gm...>
Sent: Tuesday, 2 May 2023 21:37
To: Santiago Fraga <san...@me...>
Cc: Ling Chan <lin...@gm...>; RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles
Hi all,
unfortunately I can't offer a "fix" but I can offer these minor comments:
- it seems like the SMILES has some parsing error. You can make use of RDKit's extension for dative bonds in SMILES ("->") and replace the SMILES with the one below, which will parse and give (what I assume is) the intended structure:
"C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1"
- more fundamentally, I think the reason this molecule is hard to render is that, as a six-coordinate iridium complex, it is inherently three-dimensional and therefore tougher to sketch. You can see here on Wikipedia that Ir(ppy)3 looks a bit funny even when sketched manually:
https://upload.wikimedia.org/wikipedia/commons/c/c8/Ir%28ppy%29_Schematic.png
- in general, organometallic species have various limitations when it comes to their handling by cheminformatics packages. For this reason, some care is needed when dealing with species like this to make sure you won't have issues down the line. For an overview of some RDKit-related ones, see this presentation by Prof. Jan Jensen: https://raw.githubusercontent.com/rdkit/UGM_2020/master/Presentations/JanJensen.pdf
Finally, if I embed the molecule and then display its 2D projection, it actually looks pretty good (despite a warning that UFF doesn't recognize iridium). See below:
[image.png]
This was generated using the following codeblock (in Python, not C++, sorry for that):
from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromSmiles("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1",sanitize=True)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol,randomSeed=0xf00d)
mol = Chem.RemoveHs(mol)
display(mol)  # Jupyter/IPython display of the embedded molecule
best wishes
wim
On Tue, May 2, 2023 at 5:06 PM Santiago Fraga <san...@me...> wrote:
Thanks for your answer, Ling Chan.
But I am already using that option with the C++ API.
Regards
Santiago
________________________________
From: Ling Chan <lin...@gm...>
Sent: Tuesday, 2 May 2023 4:15
To: Santiago Fraga <san...@me...>
Cc: RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles
Hello Santiago,
In case you are still looking for an answer, somewhere in my notes I wrote the following.
to get a better depiction of complicated topology, do this before rendering.
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
Sometimes it helps. Good luck.
Ling
On Friday, 21 April 2023 at 2:17 AM, Santiago Fraga <san...@me...> wrote:
Good morning
I am trying to generate a molfile from smiles, using the RDKit C++ implementation.
But in some cases the result molfile is like the one in the attached image.
My code is something like this:
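#include <GraphMol/GraphMol.h>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/FileParsers/FileParsers.h>
#include <GraphMol/Depictor/RDDepictor.h>

std::string molecule = "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);
mol->updatePropertyCache(false);
RDDepict::preferCoordGen = true;
RDDepict::compute2DCoords(*mol);
std::string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true);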
How could I fix the molfile?
Regards
Santiago
--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
|
|
From: Haijun F. <hai...@gm...> - 2023-05-09 05:55:55
|
Hi All,

I am trying to add atom numbers to a SMILES, as below:

from rdkit import Chem
mol = Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom.SetProp('molAtomMapNumber', str(i))
smi = Chem.MolToSmiles(mol)
print(smi)

The output is:

[cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1[C:6]([NH2:7])=[O:8]

Then I want to split the SMILES into atoms. I did it like this:

from rdkit import Chem
mol = Chem.MolFromSmiles('c1ccccc(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
    atom.SetProp('molAtomMapNumber', str(i))
    print(i, atom.GetSymbol())

The output is:

0 C
1 C
2 C
3 C
4 C
5 C
6 C
7 N
8 O

But what I do want is something like this (with fragments instead of atoms):

0 cH
1 CH
...
7 NH2
8 O

Can anyone help me figure out how to get each atom with its hydrogens from the SMILES, as above?

Thanks so much!

best,
Hal
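One possible approach to the question above, as a hedged sketch: build each label from the atom symbol (lowercased for aromatic atoms) and the total hydrogen count reported by `GetTotalNumHs()`. The helper name `atom_labels` is made up for this example.

```python
from rdkit import Chem


def atom_labels(smiles):
    """Return one label per atom: symbol (lowercase if aromatic) plus H count."""
    mol = Chem.MolFromSmiles(smiles)
    labels = []
    for atom in mol.GetAtoms():
        symbol = atom.GetSymbol()
        if atom.GetIsAromatic():
            symbol = symbol.lower()
        n_h = atom.GetTotalNumHs()  # implicit + explicit hydrogens
        if n_h == 1:
            symbol += "H"
        elif n_h > 1:
            symbol += f"H{n_h}"
        labels.append(symbol)
    return labels


for idx, label in enumerate(atom_labels("c1ccccc(C(N)=O)1")):
    print(idx, label)
```

The labels follow the atom order of the parsed molecule, so the index printed here matches the atom map numbers assigned in the question.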
|
From: David C. <dav...@gm...> - 2023-05-04 09:29:51
|
As part of the work on improving the way RDKit handles organometallics that is in the latest release, there is MolOps::cleanUpOrganometallics, which attempts to do the bond transformations in a similar way to that gist. The intention was that this would be part of the default sanitization, but late in the day it was discovered that it didn't work well with compounds with 2 metal atoms and bridging chlorine atoms, such as 'F[Pd]1(Cl)Cl->[Pd](Cl)(Cl)<-Cl1'. It's my intention to fix that at some point in the near future, but in the meantime, if you're working in C++, it is available for use with caveats. Worth a try in this case, perhaps.
Dave
On Thu, May 4, 2023 at 9:51 AM Santiago Fraga <san...@me...> wrote:
> Good morning Wim
> Yes, I know that the original smiles has problems with the dative bonds.
> I am trying to load the molecule and then fix those bonds using this solution:
> https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4
>
> And then generate a new molfile. I will try to apply your code to see if I can improve the molecule depiction.
>
> Regards
> Santiago
>
> <http://www.mestrelab.com>
>
> SANTIAGO FRAGA
> *Software Developer*
> san...@me...
--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
|
From: Santiago F. <san...@me...> - 2023-05-04 08:48:44
|
Good morning Wim
Yes, I know that the original smiles has problems with the dative bonds.
I am trying to load the molecule and then fix those bonds using this solution:
https://gist.github.com/greglandrum/6cd7aadcdedb1ebcafa9537e8a47e3a4
And then generate a new molfile. I will try to apply your code to see if I can improve the molecule
depiction.
Regards
Santiago
SANTIAGO FRAGA
Software Developer
san...@me...
MESTRELAB RESEARCH S.L.
PHONE +34881976775
FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706
Santiago de Compostela (SPAIN)
|
|
From: Gustavo S. <gus...@gm...> - 2023-05-03 19:26:23
|
Hi Guys,
I'm sorry it took me this long to try it... But I could finally get to it,
and it works well now. Thanks for your help!
--
Gustavo Seabra.
On Tue, Apr 11, 2023 at 3:19 AM Jan Halborg Jensen <jhj...@ch...>
wrote:
> Hi Gustavo
>
> from rdkit import Chem
> from rdkit.Chem import rdDetermineBonds
>
> raw_mol = Chem.MolFromXYZFile('acetate.xyz')
> mol = Chem.Mol(raw_mol)
> rdDetermineBonds.DetermineBonds(mol,charge=-1)
>
> Best regards, Jan
>
> On 7 Apr 2023, at 22.57, Gustavo Seabra <gus...@gm...> wrote:
>
> Hi everyone,
>
> I'm having difficulties using RDKit to read molecules from an XYZ file,
> and I would really appreciate some help.
>
> The problem is that whenever I read a molecule from an XYZ file, I get
> just a disconnected clump of atoms, not a molecule. For example, with the
> following code:
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import AllChem, Draw, rdmolfiles
> mol = Chem.MolFromSmiles('COC1=C(O)C[C@@](O)(CO)CC1=O')
> mol = Chem.AddHs(mol)
> mol
>
> <image.png>
>
> Chem.AllChem.EmbedMolecule(mol)
> Chem.MolToXYZFile(mol, "rdkit_mol.xyz")
> mol2 = Chem.MolFromXYZFile('rdkit_mol.xyz')
> mol2
> <image.png>
> Is there a bug in the XYZ code, or am I missing something?
>
> Thanks!
> --
> Gustavo Seabra.
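The round trip in this thread loses the bonds because an XYZ file stores only element symbols and coordinates, so connectivity has to be perceived afterwards. A minimal sketch of the whole loop, assuming a recent RDKit that ships `rdDetermineBonds`; ethanol and the neutral charge are stand-in choices for illustration:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdDetermineBonds

# Build a 3D structure to write out (ethanol as a stand-in molecule).
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(mol, randomSeed=42)

# XYZ keeps atoms and coordinates only, so the parsed molecule has no bonds.
xyz_block = Chem.MolToXYZBlock(mol)
raw_mol = Chem.MolFromXYZBlock(xyz_block)
n_bonds_raw = raw_mol.GetNumBonds()

# Perceive connectivity and bond orders from the geometry.
rebuilt = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(rebuilt, charge=0)
n_bonds_rebuilt = rebuilt.GetNumBonds()

print(n_bonds_raw, n_bonds_rebuilt)
```

The `charge` argument has to match the total charge of the system, which is why the acetate example in the reply passes `charge=-1`.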
|
|
From: Wim D. <wim...@gm...> - 2023-05-02 19:38:08
|
Hi all,
unfortunately I can't offer a "fix", but I can offer these minor comments:
- It seems the SMILES has a parsing error. You can make use of RDKit's
extension for dative bonds in SMILES ("->") and replace the SMILES with the
one below, which will parse and give (what I assume is) the intended
structure:
"C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1"
- More fundamentally, I think the reason this molecule is hard to render is
that, as a six-coordinate iridium complex, it is inherently 3-dimensional
and therefore tougher to sketch. You can see here on Wikipedia that Ir(ppy)3
looks a bit funny even when sketched manually:
https://upload.wikimedia.org/wikipedia/commons/c/c8/Ir%28ppy%29_Schematic.png
- In general, organometallic species have various limitations when it comes
to their handling by cheminformatics packages. For this reason, some care
is needed when dealing with species like this to make sure you won't have
issues down the line. For an overview of some RDKit-related ones, see this
presentation by Prof. Jan Jensen:
https://raw.githubusercontent.com/rdkit/UGM_2020/master/Presentations/JanJensen.pdf
Finally, if I embed the molecule and then display its 2D projection, it
actually looks pretty good (despite a warning that UFF doesn't recognize
iridium). See below:
[image: image.png]
This was generated using the following codeblock (in Python, not C++, sorry
for that):
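from rdkit import Chem
from rdkit.Chem import AllChem
mol = Chem.MolFromSmiles("C1=CC2=[N](C=C1)->[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]->1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]->4C=C1", sanitize=True)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=0xf00d)  # 3D embedding with a fixed seed
mol = Chem.RemoveHs(mol)
display(mol)  # renders the molecule in a Jupyter notebook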
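To check that the "->" notation really produced dative bonds after parsing, one can count bonds of type `DATIVE`. This is a small sketch on a cisplatin-like stand-in SMILES (my own example, not from the thread); the helper name `count_dative_bonds` is hypothetical:

```python
from rdkit import Chem


def count_dative_bonds(smiles):
    """Parse a SMILES and count the bonds RDKit stores as dative."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return sum(
        1 for bond in mol.GetBonds()
        if bond.GetBondType() == Chem.BondType.DATIVE
    )


# Two ammine ligands donating to platinum, written with "->" and "<-".
n_dative = count_dative_bonds("[NH3]->[Pt](<-[NH3])(Cl)Cl")
print(n_dative)
```

A dative bond does not add to the valence of the donor atom, which is why the nitrogen atoms can keep three hydrogens here without a valence error.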
best wishes
wim
On Tue, May 2, 2023 at 5:06 PM Santiago Fraga <san...@me...>
wrote:
> Thanks for your answer, Ling Chan.
> But I am already using that option with the C++ API.
>
> Regards
> Santiago
|
|
From: Santiago F. <san...@me...> - 2023-05-02 14:59:44
|
Thanks for your answer, Ling Chan.
But I am already using that option with the C++ API.
Regards
Santiago
SANTIAGO FRAGA
Software Developer
san...@me...
MESTRELAB RESEARCH S.L.
PHONE +34881976775
FAX +34981941079
Feliciano Barrera, 9B-Bajo 15706
Santiago de Compostela (SPAIN)
________________________________
From: Ling Chan <lin...@gm...>
Sent: Tuesday, 2 May 2023 4:15
To: Santiago Fraga <san...@me...>
Cc: RDKit Discuss <rdk...@li...>
Subject: Re: [Rdkit-discuss] Molfile from smiles
Hello Santiago,
In case you are still looking for an answer, somewhere in my notes I wrote the following.
to get a better depiction of complicated topology, do this before rendering.
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
Sometimes it helps. Good luck.
Ling
Santiago Fraga <san...@me...> wrote on Fri, 21 Apr 2023 at 2:17 AM:
Good morning
I am trying to generate a molfile from smiles, using the RDKit C++ implementation.
But in some cases the result molfile is like the one in the attached image.
My code is something like this:
string molecule = "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);
mol->updatePropertyCache(false);
RDDepict::preferCoordGen = true;
RDDepict::compute2DCoords(*mol);
string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true);
How could I fix the molfile?
Regards
Santiago
|
From: Ling C. <lin...@gm...> - 2023-05-02 02:16:03
|
Hello Santiago,
In case you are still looking for an answer, somewhere in my notes I wrote the following.
To get a better depiction of complicated topology, do this before rendering:
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
Sometimes it helps. Good luck.
Ling
Santiago Fraga <san...@me...> wrote on Fri, 21 Apr 2023 at 2:17 AM:
> Good morning
>
> I am trying to generate a molfile from smiles, using the RDKit
> C++ implementation.
> But in some cases the result molfile is like the one in the
> attached image.
>
> My code is something like this:
>
> string molecule =
> "C1=CC2=[N](C=C1)[Ir]134(C5=CC=CC=C25)C2=CC=CC=C2C2=[N]1C=CC=C2.C1=CC(C2=CC=CC=C32)=[N]4C=C1";
> RDKit::ROMol* mol = RDKit::SmilesToMol(molecule, 0, false, nullptr);
> mol->updatePropertyCache(false);
> RDDepict::preferCoordGen = true;
> RDDepict::compute2DCoords(*mol);
> string molfile = RDKit::MolToMolBlock(*mol, true, -1, false, true);
>
> How could I fix the molfile?
>
> Regards
> Santiago
|
From: Thomas <odi...@gm...> - 2023-05-01 10:51:46
|
Thank you, Wim, for your clarification: so the library is not inferring the
valence, it's "choosing" one.
I think I've found the solution to my issue: I should probably use
Chem.MolFromSmarts():
from rdkit import Chem
mol = Chem.MolFromSmarts('CCS=O')
Chem.MolToSmiles(mol)
'CCS=O'
Sometimes just explaining your problem to others helps you find the
solution.
Thomas
|
|
From: Wim D. <wim...@gm...> - 2023-04-29 18:45:28
|
The reason for this is to prevent ambiguities due to nonstandard, higher
valences. When an atom exceeds its standard valence, the implicit hydrogen
count cannot be inferred unambiguously, so it must be specified explicitly.
For S and P the standard valence would be 2 and 3 respectively, just like
for O and N. But S has nonstandard valences available: 4 and 6, as in
sulfoxides and sulfones. P can commonly have a valence of 5, as in
phosphoranes.
Your provided SMILES has an S with a valence of at least 3, exceeding the
standard valence of 2. This creates an ambiguity, where the SMILES parser
has to decide whether the S has a valence of 4 or 6. Likewise, a round trip
through RDKit will convert the SMILES "FP(F)(F)F" into "F[PH](F)(F)F"; this
means the notation is consistent with F[PH2](F)F and distinguishable from
FP(F)F. In general, when higher valence states are not possible, RDKit will
throw a valence error, but there are some more examples available. For
example, "CIC" will become C[IH]C.
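The rule described above can be sketched in plain Python. This is a simplified model for illustration only, not RDKit's actual implementation; the valence table is an assumption that lists just the elements mentioned in this thread:

```python
# Allowed valences per element (simplified, illustrative table).
ALLOWED_VALENCES = {
    "O": [2],
    "N": [3],
    "S": [2, 4, 6],
    "P": [3, 5, 7],
    "I": [1, 3, 5],
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Pick the smallest allowed valence that fits and fill the rest with H."""
    for valence in ALLOWED_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    raise ValueError(f"valence {bond_order_sum} is too high for {symbol}")


# S in CCS=O has bond orders 1 + 2 = 3, so valence 4 is chosen: one H.
print(implicit_hydrogens("S", 3))
# P in FP(F)(F)F has four single bonds, so valence 5 is chosen: one H.
print(implicit_hydrogens("P", 4))
# I in CIC has two single bonds, so valence 3 is chosen: one H.
print(implicit_hydrogens("I", 2))
```

Under this model the hydrogen has to be written in brackets whenever the chosen valence is not the lowest one, because a reader applying the default valence would otherwise reconstruct a different hydrogen count.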
best wishes
wim
|
|
From: Thomas <odi...@gm...> - 2023-04-29 10:17:47
|
I am not a chemist, so this may be a silly question, but I am interested in
the logic behind it, also because other libraries (like OpenBabel) behave
differently.
Why does RDKit sometimes write hydrogens explicitly?
from rdkit import Chem
mol = Chem.MolFromSmiles('CCS=O', sanitize=False)
Chem.MolToSmiles(mol)
'CC[SH]=O'
The input SMILES is intended as a pattern, not a molecule. I make a mol out
of it only to get the canonical SMILES, which will then be used as SMARTS.
Logically, I don't understand how the number of H attached to the S can be
"guessed" by the library, yet cannot be left implicit.
Furthermore, I have seen this behaviour only with S and P. I was wondering
whether it's a confined issue, or whether it can happen with any element.
Thank you
|
|
From: Greg L. <gre...@gm...> - 2023-04-28 15:21:48
|
Hi Susan,
The RDKit does not currently support SCSR.
-greg
On Fri, 28 Apr 2023 at 15:07, Susan Leung <sus...@gm...> wrote:
> Hi all,
>
> I am trying to read in some Self-Contained Sequence Representation (SCSR)
> structures
> https://doi.org/10.1021/ci2001988
>
> But I am encountering some issues. I just wanted to clarify, does RDKit
> support this representation?
>
> Many thanks!
>
> Susan
|
From: Susan L. <sus...@gm...> - 2023-04-28 13:04:52
|
Hi all,
I am trying to read in some Self-Contained Sequence Representation (SCSR) structures
https://doi.org/10.1021/ci2001988
But I am encountering some issues. I just wanted to clarify, does RDKit support this representation?
Many thanks!
Susan
|
From: Francois B. <ml...@li...> - 2023-04-26 08:48:42
|
Dear rdkiters,
Is it possible to list all the UFF torsion angle parameters around single bonds outside of rings (rotatable bonds) for a given molecule?
From what I found in the rdkit doc, it is (only?) possible to extract the Vjk value for four consecutive atoms indexed i j k l.
But Vjk is just one parameter (the torsion barrier in kcal/mol): for each torsion angle, UFF also defines the multiplicity of the barrier (n_jk, an integer) and phi0 (the angle in degrees at which the barrier is 0), if I understand correctly.
I am reading the DREIDING and UFF papers carefully, but I am not (yet?) sure I will be able to get that right.
So, since rdkit has a UFF implementation, I wonder if it would not be safer to just have rdkit list all those torsion parameters for the molecule at hand.
If rdkit cannot do that, I might later post a tentative solution so that another pair of eyes can tell me whether I got this right.
Regards,
F.
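For reference, the way the three parameters enter the UFF torsion term, as I read the UFF paper, is E(phi) = 1/2 * V * [1 - cos(n * phi0) * cos(n * phi)]. A minimal sketch that only evaluates this formula; the numeric values (V = 2.0 kcal/mol, n = 3, phi0 = 60 degrees, the generic sp3-sp3 case) are illustrative assumptions, not parameters extracted from RDKit:

```python
import math


def uff_torsion_energy(phi_deg, v_barrier, n, phi0_deg):
    """Evaluate E = 1/2 * V * (1 - cos(n*phi0) * cos(n*phi)) with angles in degrees."""
    phi = math.radians(phi_deg)
    phi0 = math.radians(phi0_deg)
    return 0.5 * v_barrier * (1.0 - math.cos(n * phi0) * math.cos(n * phi))


# Illustrative sp3-sp3 style torsion: threefold barrier, minimum at 60 degrees.
V, N, PHI0 = 2.0, 3, 60.0
for angle in (0.0, 60.0, 120.0, 180.0):
    print(angle, round(uff_torsion_energy(angle, V, N, PHI0), 6))
```

With these values the energy equals the full barrier V at 0 degrees (eclipsed) and vanishes at 60 and 180 degrees (staggered), so phi0 marks an energy minimum and n sets how many minima occur per full rotation.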