rdkit-discuss Mailing List for RDKit (Page 5)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
From: Eduardo M. <edu...@gm...> - 2023-10-24 09:49:30
|
Hello all, I hope you are all doing well. I am struggling to find the code where the SMILES-to-mol and mol-to-SMILES conversion happens. Can someone point me in the right direction? Kind regards, Eduardo |
From: Jarod Y. <jar...@ho...> - 2023-10-20 11:48:19
|
Apologies for not posting code. I'm seeing this error in a large class object, and it occurs only a few times per thousand objects. When I call RDKit::MolToSmiles() on an RWMol object (which is a member of the larger class object), it sometimes returns an empty string. However, I am able to perform other operations with it (e.g. in a chemical reaction). I am soliciting suggestions on how to approach debugging. Thanks, J |
From: Jeremy M. <je...@gm...> - 2023-10-09 23:56:31
|
Thanks, Diogo and Fio. That solved that problem. Jeremy |
From: Diogo M. <dio...@gm...> - 2023-10-08 05:21:42
|
Hi Jeremy,

Chem.AddHs returns a new molecule, so you could reassign the variable:

    mol = Chem.AddHs(mol)

Best regards,
Diogo |
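A minimal sketch of the fix described in this reply, assuming the goal is to set an isotope on every hydrogen. The variable name `mol_h` and the choice of isotope 2 (deuterium) are illustrative assumptions, not something stated in the thread:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol, hydrogens still implicit

# AddHs does not modify `mol`; it returns a new molecule with explicit Hs.
mol_h = Chem.AddHs(mol)

# The hydrogens are now ordinary atoms, so they show up in GetAtoms().
for atom in mol_h.GetAtoms():
    if atom.GetAtomicNum() == 1:
        atom.SetIsotope(2)  # assumption: label every H as deuterium

symbols = [atom.GetSymbol() for atom in mol_h.GetAtoms()]
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())
print(symbols)
```

Keeping the original `mol` under a separate name makes it easy to compare the heavy-atom-only molecule with the one that carries explicit hydrogens.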
From: Fiorella R. <rug...@gm...> - 2023-10-08 05:17:01
|
Hi Jeremy, iirc you have to write mol = Chem.AddHs(mol). In your code you are not keeping the object with the added Hs, so there are no explicit Hs to find when you iterate. Cheers, Fio |
From: Jeremy M. <je...@gm...> - 2023-10-08 04:34:48
|
In Python, I'd like to iterate through all the atoms in a molecule, including hydrogens, so I can assign an isotope to each atom. I haven't been able to include hydrogens in the iterable of atoms:

    from rdkit import Chem

    mol = Chem.MolFromSmiles("CCO")  # Example molecule: Ethanol (C2H5OH)

    # Add explicit hydrogens
    Chem.AddHs(mol)

    for atom in mol.GetAtoms():
        print(f"Atom Symbol: {atom.GetSymbol()}")

Output:

    Atom Symbol: C
    Atom Symbol: C
    Atom Symbol: O

Similarly, mol.GetAtomWithIdx() works only for the first 3 atoms (indices 0 to 2), giving C, C, and O atoms but no hydrogens.

Thanks,
Jeremy

-- ~ -- ~ --
Jeremy Monat, PhD
LinkedIn: http://www.linkedin.com/in/jemonat
Portfolio: https://bertiewooster.github.io
GitHub: https://github.com/bertiewooster |
From: Ling C. <lin...@gm...> - 2023-09-30 05:06:25
|
Thank you Andrew for the information. It is good to know that this is part of the standard, so I don't need to worry now. And I like the safety-checking part of your code.

Dan, I wrote my email because I did not see any mention of this in the SD file definition documents that I could find. I may have overlooked it. But if it really is not part of the definition, it is always possible to encounter I/O problems. We have encountered several similar situations with non-conforming files and non-conforming parsers, and I had to check the format definition to determine which customer support (writer side or reader side) to write to. This is why I am careful now. Updating the software you use would not solve it: it's not a bug as far as the parsing software is concerned.

Ling |
From: Dan N. <dan...@sc...> - 2023-09-29 18:12:44
|
I'd also be curious how the index is causing you problems. All SD reading code that I know about ignores those suffixes. If you're not using RDKit to read the SD file, maybe it would be best to update whatever it is you *are* using to parse the file.

dan nealschneider | senior staff developer
*he/him/his*
Schrödinger, Inc. <https://schrodinger.com/> |
From: Andrew D. <da...@da...> - 2023-09-29 08:05:31
|
On Sep 26, 2023, at 01:17, Ling Chan <lin...@gm...> wrote:
> > <pKa> (1)
> 4.0999999
..
> Just wonder what was the rationale behind this extra "(1)" on the property field lines (pKa and logP in the above example)?
>
> And is there a way to get rid of these? I am not sure if this extra "(1)" is part of the standard sd format.

RDKit uses the increasing value as a sort of per-file registry number.

This follows the part of the standard which says "External registry numbers must be enclosed in parentheses."

The relevant code is in Code/GraphMol/FileParsers/SDWriter.cpp :

    if (d_molid >= 0) {
      (*dp_ostream) << "(" << d_molid + 1 << ") ";
    }

There is no way to suppress this output. Not only is there no direct way to change the d_molid, but d_molid cannot be negative, as Code/GraphMol/FileParsers/MolWriters.h declares it as:

    unsigned int d_molid;  // the number of the molecules we wrote so far

Wim suggested a post-processing approach. Another is to write the SD data items yourself, that is, use MolToMolBlock() to generate the connection table/molfile as a string, then iterate through the properties and generate the data items.

    import sys
    from rdkit import Chem

    def MolToSDFRecord(
            mol,
            includeStereo: bool = True,
            confId: int = -1,
            kekulize: bool = True,
            forceV3000: bool = False):
        mol_block = Chem.MolToMolBlock(mol, includeStereo, confId, kekulize, forceV3000)

        lines = []
        for prop_name in mol.GetPropNames():
            if "\n" in prop_name or ">" in prop_name or "<" in prop_name:
                sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                                 "name includes an unsupported character.\n")
                continue

            prop_value = mol.GetProp(prop_name)
            if "\n" in prop_value:
                if "\n\n" in prop_value or "\r\n\r\n" in prop_value:
                    sys.stderr.write(f"WARNING: Skipping property {prop_name!r} because the "
                                     "value includes an embedded newline.\n")
                    continue
                if prop_value.endswith("\r\n"):
                    prop_value = prop_value[:-2]
                elif prop_value.endswith("\n"):
                    prop_value = prop_value[:-1]

            lines.append(f"> <{prop_name}>\n{prop_value}\n\n")

        lines.append("$$$$\n")

        return mol_block + "".join(lines)

    mol = Chem.MolFromSmiles("CCO")
    mol.SetProp("pKa","3.3\r\n")
    print(MolToSDFRecord(mol))

Andrew
da...@da... |
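To see the per-file counter described above in action, here is a small sketch that writes two molecules to an in-memory buffer and collects the data header lines. The molecules, the property value and the names `buffer`, `sd_text` and `header_lines` are illustrative assumptions:

```python
from io import StringIO

from rdkit import Chem

buffer = StringIO()
writer = Chem.SDWriter(buffer)

# Write two records; each carries one property.
for smiles in ("CCC", "CCO"):
    mol = Chem.MolFromSmiles(smiles)
    mol.SetProp("pKa", "4.1")
    writer.write(mol)
writer.flush()  # make sure everything has reached the buffer

sd_text = buffer.getvalue()

# Data header lines are the ones that start with ">".
header_lines = [line for line in sd_text.splitlines() if line.startswith(">")]
for line in header_lines:
    print(repr(line))
```

If the writer behaves as the quoted C++ suggests, the header of the first record ends in "(1)" and the header of the second in "(2)", that is, the number counts records in the output rather than properties.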
From: Ling C. <lin...@gm...> - 2023-09-28 23:35:11
|
Thank you Wim. I shall post-process the SDF as you suggested. Ling |
From: Wim D. <wim...@gm...> - 2023-09-26 00:11:48
|
Why there is a counter between parentheses there, I don't know, but in case there's no option to remove it, you might just manually remove it using a regex to remove anything between parentheses on a line that starts with ">", for example:

    from rdkit import Chem
    import re
    from io import StringIO

    m = Chem.MolFromSmiles("CCC")
    m.SetProp("pKa","3.3")
    sio = StringIO()
    with Chem.SDWriter(sio) as o:
        o.write(m)
    sio.seek(0)
    with open("temp3.sdf", "w") as f:
        for line in sio.readlines():
            f.write(re.sub(r'^>(.*?)\((.*?)\)', r'>\1', line))

best wishes
wim |
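One edge case of the regex above is a property name that itself contains parentheses, because the lazy match stops at the first "(" on the line. A possible variant, sketched here in plain Python, only removes a trailing "(digits)" group that follows the closing ">" of the field name. The helper name `strip_record_counter` and the pattern are assumptions for illustration, and the sketch assumes that only data header lines start with ">":

```python
import re

# A data header: ">", anything, "<name>", then an optional "(digits)" counter.
_HEADER_WITH_COUNTER = re.compile(r'^(>.*<[^<>]*>)\s*\(\d+\)\s*$')

def strip_record_counter(line):
    """Drop a trailing "(n)" counter from an SD data header line.

    Lines that are not data headers, or that carry no counter,
    are returned unchanged. The original line ending is preserved.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    match = _HEADER_WITH_COUNTER.match(body)
    if match is None:
        return line
    return match.group(1) + ending

sample = [
    ">  <pKa>  (1) \n",
    "4.0999999\n",
    "\n",
    ">  <logP (calc)>  (1) \n",
    "2\n",
]
cleaned = [strip_record_counter(line) for line in sample]
print(cleaned)
```

The same function can be dropped into the `for line in sio.readlines()` loop in place of the `re.sub` call.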
From: Ling C. <lin...@gm...> - 2023-09-25 23:18:17
|
Dear Colleagues,

I noticed that when writing out molecules using SDWriter(), the property fields are followed by something like "(1)", "(2)". I mean, the sdf looks like:

propane
     RDKit          3D

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4280    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9040    1.3000   -0.3480 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
> <pKa> (1)
4.0999999

> <logP> (1)
2

$$$$

Just wonder what was the rationale behind this extra "(1)" on the property field lines (pKa and logP in the above example)?

And is there a way to get rid of these? I am not sure if this extra "(1)" is part of the standard sd format.

Thank you!

Regards,
Ling

---------------------------------------------------------------------------------------------------

To create an sdf, you can do something like:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles("CCC")
>>> m.SetProp("pKa","3.3")
>>> with Chem.SDWriter("temp3.sdf") as o:
...     o.write(m)

Or use Chem.SDMolSupplier() to get mols from another sdf. |
From: Steven B. <sb...@za...> - 2023-09-22 03:38:33
|
Hi all, I’m working with several chemical structure sources and it seems like there are some differences between implementations of stereochemistry specification of bridged bicyclic compounds which leads to unlikely structures. For example, one such structure is: CC(C)(C)OC(=O)N1[C@]2([H])C[C@@](CC2)([H])[C@H]1C3Nc4c(ccc(c4)B5OC(C)(C)C(C)(C)O5)N=3 When I create a molecule from this string with RDKit, I get a rendering which indicates the structure is incorrectly specified (by opposite wedge directions for the hydrogens on the bridgehead carbons). The molecule additionally fails to produce a 3D conformer by `rdDistGeom.EmbedMolecule` (after AddHs). The fix seems to be to correct the original SMILES string by changing one of the bridgehead stereochemistry configurations, which then leads to same-wedge renderings and successful 3D conformer generation. My question is: where do these SMILES strings with “problematic” stereochemistry specifications originate? Are there software implementations of SMILES generation that are internally consistent but incompatible with RDKit’s internal consistency? Does such a disagreement in details originate from ambiguity in the SMILES specification? Best regards, Steve Brown |
From: Gianmarco G. <ghi...@gm...> - 2023-09-07 12:30:20
|
Hi Greg, Thanks for the info. I'll keep an eye on that then. Giammy -- *Gianmarco* |
From: Greg L. <gre...@gm...> - 2023-09-07 12:09:03
|
Hi Giammy, We currently only have the Python implementation. Doing a C++ version is on my ToDo list, but I'm not sure when we'll get there. best regards, -greg |
From: Gianmarco G. <ghi...@gm...> - 2023-09-07 11:15:03
|
Hello all, I've been testing the Python module from rdkit.Chem import RegistrationHash for some time now and I would like to use it in Java too. I browsed the RDKit repository but I could not find it implemented in C++, and therefore, not available in the Java JARs. Am I missing it from somewhere else or is it just implemented in Python? Thanks, Giammy -- *Gianmarco* |
From: Ken <ke...@po...> - 2023-09-01 19:21:11
|
Hello,

Is there a way to query a SQL database (through the cartridge) using a fingerprinting method other than the ones listed below?

rdkit.morgan_fp_size : the size (in bits) of morgan fingerprints
rdkit.featmorgan_fp_size : the size (in bits) of featmorgan fingerprints
rdkit.layered_fp_size : the size (in bits) of layered fingerprints
rdkit.rdkit_fp_size : the size (in bits) of RDKit fingerprints
rdkit.torsion_fp_size : the size (in bits) of topological torsion bit vector fingerprints
rdkit.atompair_fp_size : the size (in bits) of atom pair bit vector fingerprints
rdkit.avalon_fp_size : the size (in bits) of avalon fingerprints

I am interested in using the "pattern" method right now (rdkit.Chem.rdmolops.PatternFingerprint), but I would be interested in learning about a general method for implementing custom fingerprints in the RDKit cartridge. If such a thing is possible, I would also be interested in using similarity scoring methods other than Tanimoto and Dice at some point.

Thanks in advance for your help.
-Ken |
From: Ken <ke...@po...> - 2023-09-01 19:20:22
|
I just learned how to search the archives, and I found this well-titled resource: rdkit.blogspot.com/2017/04/using-custom-fingerprint-in-postgresql.html, which may have all the answers I need. Apologies for the (hopefully) unnecessary post. -Ken |
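For reference, the Python side of what is asked about here can be sketched as follows: computing pattern fingerprints and scoring them with metrics other than Tanimoto and Dice. This sketch does not touch the cartridge at all. The two molecules, the 2048-bit size and the Tversky weights are illustrative assumptions:

```python
from rdkit import Chem, DataStructs

query = Chem.MolFromSmiles("c1ccccc1O")    # phenol
target = Chem.MolFromSmiles("c1ccccc1OC")  # anisole

# Pattern fingerprints, as used for substructure screening.
fp_query = Chem.PatternFingerprint(query, fpSize=2048)
fp_target = Chem.PatternFingerprint(target, fpSize=2048)

# A few of the bit-vector similarity functions available in DataStructs.
tanimoto = DataStructs.TanimotoSimilarity(fp_query, fp_target)
cosine = DataStructs.CosineSimilarity(fp_query, fp_target)
# Tversky with weights (1, 0) is asymmetric: it only penalises bits
# that are set in the query but missing from the target.
tversky = DataStructs.TverskySimilarity(fp_query, fp_target, 1.0, 0.0)

print(f"Tanimoto {tanimoto:.3f}  cosine {cosine:.3f}  Tversky(1,0) {tversky:.3f}")
```

Whether the same fingerprints and metrics can be used inside PostgreSQL is what the blog post linked above addresses.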
From: Rafael L <raf...@us...> - 2023-09-01 06:04:40
|
Using rdReducedGraphs.GenerateMolExtendedReducedGraph() on RDKit mols obtained from certain SMILES (e.g. "O=C(OC3c1nccnc1C(=O)N3c2ncc(Cl)cc2)N4CCN(C)CC4") throws the following error:

    getNumImplicitHs() called without preceding call to calcImplicitValence()

I tried removing sanitization/keeping hydrogens on conversion to mol, using Kekulized SMILES, mol.UpdatePropertyCache(), Chem.SanitizeMol()... but all I got were different errors, such as:
- Can't kekulize mol
- RingInfo not initialized

I was finally able to solve all the errors by calling Chem.AddHs(mol) for all mols prior to calculation of the reduced graph.

--
*Rafael da Fonseca Lameiro*
PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
São Carlos Institute of Chemistry - University of São Paulo - Brazil
https://orcid.org/0000-0003-4466-2682 |
From: Rafael L <raf...@us...> - 2023-09-01 05:26:39
|
However, now that I have visualized the generated graphs, I believe this was not an optimal solution, since the graphs do not look reduced at all. Also, the Ac/Ar/D/Hf nomenclature presented in the original ErG paper does not seem to be implemented in RDKit. For now, I believe it's best to stick with the fingerprints and forget about the representations.

--
*Rafael da Fonseca Lameiro*
PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
São Carlos Institute of Chemistry - University of São Paulo - Brazil
https://orcid.org/0000-0003-4466-2682 |
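A minimal sketch of the fingerprint route mentioned above, assuming that `rdReducedGraphs.GetErGFingerprint` is the fingerprint meant. The example molecule (paracetamol) is an arbitrary choice, not one from the thread, and the sketch makes no claim about the problematic SMILES quoted earlier:

```python
from rdkit import Chem
from rdkit.Chem import rdReducedGraphs

mol = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")  # paracetamol, sanitized as usual

# Returns a numeric vector (a NumPy array) rather than a molecule object.
erg_fp = rdReducedGraphs.GetErGFingerprint(mol)

print(len(erg_fp))
print(erg_fp[:10])
```

Because the result is a plain numeric vector, it can be passed to similarity or machine-learning code without going through the reduced-graph molecule itself.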
From: Greg L. <gre...@gm...> - 2023-08-26 04:32:31
|
If you're willing to live with the RDKit's definition of bridgehead (see below), then there is built-in functionality you can use:

    from rdkit import Chem
    from rdkit.Chem import rdqueries
    qa = rdqueries.IsBridgeheadQueryAtom()
    mol = Chem.MolFromSmiles('C1CC2CCC1C2')
    mol.GetAtomsMatchingQuery(qa)

That last call returns a sequence with the matching atoms.

The RDKit bridgehead definition:

    // at least three ring bonds, all ring bonds in a ring which shares at
    // least two bonds with another ring involving this atom

is definitely not perfect, primarily because of the use of the ring systems, but it's the best that we were able to come up with while keeping things efficient. There's some discussion here https://github.com/rdkit/rdkit/pull/6061 and in the linked issue.

-greg
|
From: Wim D. <wim...@gm...> - 2023-08-25 21:21:11
|
Greetings all,

I have thought about the problem some more, and in the end came to the conclusion that looping through all rings really is necessary. In the gist below you can see the adjusted code, making use of Pat Walters' method <https://sourceforge.net/p/rdkit/mailman/message/30387811/> for finding all rings. Apologies for the code being messy.

https://gist.github.com/dehaenw/41eb8e4c39c1158e88b36c6dfc2606d8

Fortunately, this one also manages to detect these difficult cases. I did not check how fast it is, but I guess it will be a fair bit slower.

Best wishes,
wim
|
From: Wim D. <wim...@gm...> - 2023-08-25 18:28:41
|
Dear Andreas,

That's a good find. I agree the breaking case can be considered a bridgehead structure, as it's essentially bicyclo-[3.2.1]-octane plus an extra bond. I need to think about this some more, but it might be related to getting the ring info as SSSR instead of exhaustively. The best solution may therefore be to just prune non-ring atoms from the graph, enumerate all rings and check it really exhaustively.

FWIW: rdMolDescriptors.CalcNumBridgeheadAtoms(mol) returns 0 for mol = Chem.MolFromSmiles("C1CC2C3C2C1C3") too, so this may be an RDKit bug on this end.

Best wishes,
wim
|
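The remark about SSSR versus exhaustive ring enumeration can be illustrated without RDKit by running the pairwise ring-intersection test on ring sets written out by hand. This is only a sketch under stated assumptions: the bond and ring lists below were derived manually from the SMILES atom order (they are not RDKit output, and `Chem.GetSymmSSSR` may return a different set), and `bridgeheads_from_rings` and `adjacency` are hypothetical helper names.

```python
from itertools import combinations


def adjacency(bonds):
    """Build {atom index: set of bonded atom indices} from a bond list."""
    nbrs = {}
    for a, b in bonds:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def bridgeheads_from_rings(rings, nbrs):
    """Pairwise ring-intersection test applied to explicit ring sets."""
    ring_sets = [set(r) for r in rings]
    found = set()
    for ring1, ring2 in combinations(ring_sets, 2):
        shared = ring1 & ring2
        if len(shared) > 2:  # the two rings share more than one bond
            for idx in shared:
                # keep shared atoms that have a neighbor outside the shared set
                if nbrs[idx] - shared:
                    found.add(idx)
    return tuple(sorted(found))


# Norbornane, C1CC2CCC1C2 (hand-written bonds and a minimal ring set)
norbornane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 2)]
norbornane_rings = [(0, 1, 2, 6, 5), (2, 3, 4, 5, 6)]
print(bridgeheads_from_rings(norbornane_rings, adjacency(norbornane_bonds)))  # (2, 5)

# C1CC2C3C2C1C3 (hand-written bonds and one minimal ring set: 3-, 4- and 5-ring)
tricyclic_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 2), (4, 5), (5, 0), (5, 6), (6, 3)]
tricyclic_rings = [(2, 3, 4), (3, 4, 5, 6), (0, 1, 2, 4, 5)]
print(bridgeheads_from_rings(tricyclic_rings, adjacency(tricyclic_bonds)))  # ()
```

With this minimal ring set every pair of rings in the second molecule shares exactly two atoms, so the `len(shared) > 2` condition never fires. Supplying additional rings to the same function changes the answer, which is consistent with the suggestion that the choice of ring set matters.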
From: S J. S. <swa...@gm...> - 2023-08-25 15:41:08
|
Perhaps using ring perception instead would work better?
|
From: Andreas L. <and...@gm...> - 2023-08-25 15:20:26
|
Dear Wim,

Thanks for your reply!

Apologies for the delay, finally got time to pick up this project again.

Your suggestion works great, though I have found some cases where it breaks. For instance the molecule:

mol = Chem.MolFromSmiles("C1CC2C3C2C1C3")

It seems, in this case, a bridgehead atom is also a fused-ring atom. Maybe these looped compounds have too complex a topology for this type of analysis.

I don't see a straightforward way to identify just the bridgehead atoms.

Best wishes,
Andreas

On Sat, Dec 3, 2022 at 12:53 PM Wim Dehaen <wim...@gm...> wrote:

> Hi Andreas,
> I don't have a good SMARTS pattern available for this but here is a
> function that should return bridgehead idx and not include non bridgehead
> fused ring atoms:
>
> ```
> from rdkit import Chem
>
> def return_bridgeheads_idx(mol):
>     bh_list=[]
>     intersections=[]
>     sssr_idx = [set(x) for x in list(Chem.GetSymmSSSR(mol))]
>     for i,ring1 in enumerate(sssr_idx):
>         for j,ring2 in enumerate(sssr_idx):
>             if i>j:
>                 intersections+=[ring1.intersection(ring2)]
>     for iidx in intersections:
>         if len(iidx)>2: #condition for bridgehead
>             for idx in iidx:
>                 neighbors = [a.GetIdx() for a in mol.GetAtomWithIdx(idx).GetNeighbors()]
>                 bh_list+=[idx for nidx in neighbors if nidx not in iidx]
>     return tuple(set(bh_list))
> ```
>
> Here are 6 test molecules:
>
> ```
> mol1 = Chem.MolFromSmiles("C1CC2CCC1C2")
> mol2 = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
> mol3 = Chem.MolFromSmiles("N1(CC2)CCC2CC1")
> mol4 = Chem.MolFromSmiles("C1CCC12CCCCC2")
> mol5 = Chem.MolFromSmiles("C1CC2C1CCCCC2")
> mol6 = Chem.MolFromSmiles("C1CCC(C(CCC3)C23)C12")
> for mol in [mol1,mol2,mol3,mol4,mol5,mol6]:
>     print(return_bridgeheads_idx(mol))
> ```
>
> giving the expected answer:
>
> (2, 5)
> (4, 7)
> (0, 5)
> ()
> ()
> ()
>
> hope this is helpful!
>
> best wishes
> wim
>
> On Sat, Dec 3, 2022 at 8:34 AM Andreas Luttens <and...@gm...>
> wrote:
>
>> Dear users,
>>
>> I am trying to identify bridgehead atoms in multi-looped ring systems.
>> The issue I have is that it can be sometimes difficult to distinguish these
>> atoms from ring-fusion atoms. The pattern I used (see below) looks for
>> atoms that are part of three rings but cannot be bonded to an atom that
>> also fits this description, in order to avoid ring-fusion atoms. The code
>> works, except for cases where bridgehead atoms are bonded to a ring-fusion
>> atom.
>>
>> *PASS:*
>> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
>> rdkit_mol = Chem.MolFromSmiles("C1CC2CCC1C2")
>> print(rdkit_mol.GetSubstructMatches(pattern))
>> >>>((2,),(5,))
>>
>> *FAIL:*
>> pattern = Chem.MolFromSmarts("[$([x3]);!$([x3][x3])]")
>> rdkit_mol = Chem.MolFromSmiles("C1CC2C1C1CCC2C1")
>> print(rdkit_mol.GetSubstructMatches(pattern))
>> >>>()
>>
>> Any hint on what alternative pattern I could use to isolate true
>> bridgeheads would be greatly appreciated. Maybe other strategies are more
>> suitable to find these atoms?
>>
>> Thanks in advance!
>>
>> Best regards,
>> Andreas
|
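The PASS and FAIL results quoted above for `[$([x3]);!$([x3][x3])]` can be reproduced on a plain bond list, which makes it easier to see why the second molecule gives no matches. The following is a minimal sketch without RDKit, under stated assumptions: the SMARTS `x` primitive counts ring bonds on an atom, a bond is treated here as a ring bond if its two atoms stay connected once that bond is removed, the bond lists are written by hand from the SMILES atom order, and all function names are hypothetical.

```python
from collections import deque


def adjacency(bonds):
    """Build {atom index: set of bonded atom indices} from a bond list."""
    nbrs = {}
    for a, b in bonds:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def is_ring_bond(a, b, nbrs):
    """True if a and b are still connected after removing the bond a-b."""
    seen = {a}
    queue = deque([a])
    while queue:
        cur = queue.popleft()
        for nxt in nbrs[cur]:
            if cur == a and nxt == b:
                continue  # skip the bond under test
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def isolated_x3_atoms(bonds):
    """Atoms with three ring bonds and no neighbor that also has three."""
    nbrs = adjacency(bonds)
    counts = {atom: 0 for atom in nbrs}
    for a, b in bonds:
        if is_ring_bond(a, b, nbrs):
            counts[a] += 1
            counts[b] += 1
    return tuple(sorted(
        atom for atom, n in counts.items()
        if n == 3 and all(counts[nb] != 3 for nb in nbrs[atom])
    ))


# C1CC2CCC1C2 (the PASS case)
norbornane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 2)]
print(isolated_x3_atoms(norbornane_bonds))  # (2, 5)

# C1CC2C1C1CCC2C1 (the FAIL case)
fused_bridged_bonds = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5),
                       (5, 6), (6, 7), (7, 2), (7, 8), (8, 4)]
print(isolated_x3_atoms(fused_bridged_bonds))  # ()
```

In the second molecule atoms 2, 3, 4 and 7 all carry three ring bonds, and each of them is bonded to at least one of the others, so the "no x3 neighbor" exclusion removes every candidate, including atoms 4 and 7.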