rdkit-discuss Mailing List for RDKit (Page 7)
Open-Source Cheminformatics and Machine Learning
Brought to you by:
glandrum
|
From: Jean-Marc N. <jm....@un...> - 2023-07-26 19:43:18
|
Dear all,
I use the following code to produce PNG drawings. I use RDKit version 2023.03.1.
The SMILES string describes a molecule with a single chiral center of defined configuration.
from rdkit import Chem
from rdkit.Chem import rdCoordGen
from rdkit.Chem.Draw import rdMolDraw2D
from PIL import Image
from io import BytesIO
smi = "C=C[C@@](/C=C/c1ccc(cc1)O)(CCC=C(C)C)C"
filenameOut = "img.png"
mol = Chem.MolFromSmiles(smi)
rdCoordGen.AddCoords(mol)
print(Chem.MolToMolBlock(mol))
d2d = rdMolDraw2D.MolDraw2DCairo(350, 300)
dopts = d2d.drawOptions()
dopts.baseFontSize = 0.6
dopts.prepareMolsBeforeDrawing = False
mol_draw = rdMolDraw2D.PrepareMolForDrawing(mol, addChiralHs=False, wedgeBonds=False)
Chem.ReapplyMolBlockWedging(mol_draw)
d2d.DrawMolecule(mol_draw, legend='', highlightAtoms=[])
d2d.FinishDrawing()
bio = BytesIO(d2d.GetDrawingText())
draw_code = Image.open(bio)
draw_code.save(filenameOut)
The resulting image does not show the chirality wedge.
The script prints the MolBlock that comes from the SMILES and the calculation of the 2D atomic coordinates:
_________________

     RDKit          2D

 19 19  0  0  0  0  0  0  0  0999 V2000
    2.0515    1.2242    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0515    1.2214    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5539    0.3540    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4459    0.3512    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9435   -0.5162    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9435   -0.5190    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4411   -1.3866    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4411   -1.3892    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9435   -0.5246    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4459    0.3428    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4459    0.3456    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.9435   -0.5274    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.4215   -0.1436    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2861    0.3588    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1535   -0.1388    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.0181    0.3636    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.0153    1.3636    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.8855   -0.1338    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5567   -0.6460    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  9  2  0
  9 10  1  0
 10 11  2  0
  9 12  1  0
  3 13  1  0
 13 14  1  0
 14 15  1  0
 15 16  2  0
 16 17  1  0
 16 18  1  0
  3 19  1  1
 11  6  1  0
M  END
_____________________
The wedge bond (3-19) is apparently there but is not drawn as such.
Is there a remedy for that?
Best regards,
Jean-Marc
--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS
Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France
ORCID : 0000-0002-5120-2556
Tel : +33 (0)3 26 91 82 10
http://www.univ-reims.fr/icmr
https://nuzillard.github.io/PyLSD
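One possible remedy, offered as a sketch and not a confirmed answer from the list: a molecule built from SMILES carries no mol-block wedging information, so ReapplyMolBlockWedging probably has nothing to restore and may simply leave the bonds unwedged. The variant below assumes that letting PrepareMolForDrawing derive the wedge from the 2D coordinates (wedgeBonds=True) is acceptable, and it drops the ReapplyMolBlockWedging call. The variable names wedged and png_bytes are introduced here for illustration only.

```python
from rdkit import Chem
from rdkit.Chem import rdCoordGen
from rdkit.Chem.Draw import rdMolDraw2D

smi = "C=C[C@@](/C=C/c1ccc(cc1)O)(CCC=C(C)C)C"
mol = Chem.MolFromSmiles(smi)
rdCoordGen.AddCoords(mol)

d2d = rdMolDraw2D.MolDraw2DCairo(350, 300)
dopts = d2d.drawOptions()
dopts.baseFontSize = 0.6
dopts.prepareMolsBeforeDrawing = False  # the molecule is prepared explicitly below

# Assumption: wedging is computed from the 2D coordinates here instead of
# being reapplied from (non-existent) mol-block wedging information.
mol_draw = rdMolDraw2D.PrepareMolForDrawing(mol, addChiralHs=False, wedgeBonds=True)

# Collect the bonds that now carry a wedge or dash direction.
wedged = [
    b.GetIdx()
    for b in mol_draw.GetBonds()
    if b.GetBondDir() in (Chem.BondDir.BEGINWEDGE, Chem.BondDir.BEGINDASH)
]
print("bonds drawn as wedge/dash:", wedged)

d2d.DrawMolecule(mol_draw)
d2d.FinishDrawing()
png_bytes = d2d.GetDrawingText()  # PNG data; write it to a file or open it with PIL
```

If the wedge has to sit on one specific bond (3-19 in the MolBlock above), this approach gives no control over that choice; it only ensures that some bond at the stereocenter is wedged.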
|
From: Ling C. <lin...@gm...> - 2023-07-26 05:20:56
|
Interesting. Thanks Rafael. So it's a bug in BindingDB. Perhaps you should let them know too.
Ling
|
From: Rafael L <raf...@al...> - 2023-07-25 07:50:50
|
Hi,
I'm just creating this thread to get the problem and the solution indexed by Google.
I downloaded several SDF datasets from BindingDB and got errors like this one when using Chem.SDMolSupplier:
ERROR: Cannot convert 1. to unsigned int
After some digging I found
https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAB3Bi0AYKAYOzMUumk6BscipsujkmG_-uuho3%3Dsf9mNyfoBAwA%40mail.gmail.com/#msg32641808
and it turns out the header (before every mol) should have three lines. The BindingDB files only had two.
In these files, each mol+properties block was separated by four dollar signs ($$$$). My solution was to add a blank line after each $$$$ using Notepad++ Find and Replace: replace $$$$ with $$$$ + (new line).
--
Rafael da Fonseca Lameiro
PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
São Carlos Institute of Chemistry - University of São Paulo - Brazil
https://orcid.org/0000-0003-4466-2682
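For files too large to edit by hand, the same find-and-replace can be scripted. This is a minimal sketch of the workaround described above and nothing more: the function name add_blank_line_after_delimiters is made up for this example, and it assumes the only defect is the missing header line after each $$$$.

```python
def add_blank_line_after_delimiters(sdf_text: str) -> str:
    """Insert an empty line after every '$$$$' record delimiter."""
    fixed_lines = []
    for line in sdf_text.splitlines():
        fixed_lines.append(line)
        if line.strip() == "$$$$":
            # The extra empty line becomes the first header line of the next record.
            fixed_lines.append("")
    return "\n".join(fixed_lines) + "\n"


# Tiny stand-in for an SD file with two-line headers (not real BindingDB data).
example = "header1\nheader2\n$$$$\nheader1\nheader2\n$$$$\n"
fixed = add_blank_line_after_delimiters(example)
print(fixed)
```

To process a real file, read it with open(path).read(), pass the text through the function, and write the result to a new file, so the original download stays untouched. The first record in the file is not changed by this, exactly as with the Notepad++ replacement.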
|
From: Jarod Y. <jar...@ho...> - 2023-07-03 19:36:30
|
I figured it out. I was using std::make_shared on the returned ChemicalReaction pointer, but the shared pointer was not behaving as anticipated in the copy constructors. Eliminating the shared pointer conversion, adding an explicit delete to the destructor, and making deep copies of the pointer in the copy constructors fixed the memory leak.
Sent from my iPhone
|
From: Jarod Y. <jar...@ho...> - 2023-07-03 18:34:15
|
Posted the Valgrind heap summary showing the memory allocation in the shared library … When the pointer goes out of scope, the memory is evidently not being deallocated. Bug?
==10036== HEAP SUMMARY:
==10036== in use at exit: 2,324,704 bytes in 31,944 blocks
==10036== total heap usage: 140,499 allocs, 108,555 frees, 8,952,289 bytes allocated
==10036==
==10036== 2,324,704 (15,232 direct, 2,309,472 indirect) bytes in 136 blocks are definitely lost in loss record 93 of 93
==10036== at 0x4845013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==10036== by 0x4B0D550: RDKit::RxnSmartsToChemicalReaction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >*, bool, bool) (in /opt/rdkit-Release_2022_09_2/lib/libRDKitChemReactions.so.1.2022.09.1)
==10036== by 0x11E943: kMC::Reaction::Reaction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, double, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (Reaction.cpp:18)
Sent from my iPhone
|
From: Jarod Y. <jar...@ho...> - 2023-07-03 17:46:00
|
The function RxnSmartsToChemicalReaction() returns a pointer to a ChemicalReaction object. Is this pointer a std::unique_ptr? How does one deallocate said pointer after it goes out of scope? Valgrind identifies a definite memory loss in libRDKitChemReactions.so from a new(unsigned long).
Thanks,
J
Sent from my iPhone
|
From: Wim D. <wim...@gm...> - 2023-06-30 23:22:23
|
Hi Joey,
I think the most straightforward way to do this is to use GetNeighbors() on
all atoms. See below for an example:
from rdkit import Chem
mol=Chem.MolFromSmiles("O1COc2c1ccc(CC(NC)C)c2")
substruct=Chem.MolFromSmarts("c1ccccc1")
a=mol.GetSubstructMatch(substruct)
print("substructure benzene can be found at atoms with idx",a)
extended_idx=set([])
for idx in a:
    for n in mol.GetAtomWithIdx(idx).GetNeighbors():
        nidx=n.GetIdx()
        if nidx not in a and nidx not in extended_idx:
            extended_idx.add(nidx)
print("neighboring atoms of substructure benzene are at idx",extended_idx)
print("idx of atoms in entire extended substructure are",extended_idx.union(a))
If you'd like to include hydrogens when extending the substructure, you can
add explicit hydrogens first using mol=Chem.AddHs(mol).
best wishes,
wim
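As a possible follow-up for the hydrogen-capped fragment that was asked for, the extended index set can be turned into a standalone molecule. This is only a sketch under stated assumptions: the helper name one_atom_neighborhood is invented here, and it relies on RDKit recomputing implicit hydrogens on the cut atoms during sanitization. It will likely fail to sanitize when a retained neighbor is itself an aromatic atom of another ring, and atoms written with explicit H counts or charges are not re-capped.

```python
from rdkit import Chem


def one_atom_neighborhood(mol, core_idx):
    """Return the matched atoms plus their direct neighbors as a new molecule."""
    keep = set(core_idx)
    for idx in core_idx:
        for nbr in mol.GetAtomWithIdx(idx).GetNeighbors():
            keep.add(nbr.GetIdx())

    rw = Chem.RWMol(mol)
    # Delete from the highest index down so the remaining indices stay valid.
    for idx in sorted((a.GetIdx() for a in mol.GetAtoms()), reverse=True):
        if idx not in keep:
            rw.RemoveAtom(idx)

    frag = rw.GetMol()
    # Sanitization recomputes implicit hydrogens, which caps the open valences.
    Chem.SanitizeMol(frag)
    return frag


mol = Chem.MolFromSmiles("O1COc2c1ccc(CC(NC)C)c2")
match = mol.GetSubstructMatch(Chem.MolFromSmarts("c1ccccc1"))
frag = one_atom_neighborhood(mol, match)
frag_smiles = Chem.MolToSmiles(frag)
print(frag_smiles)
```

For the example molecule this keeps the benzene ring, the two ring oxygens, and the first side-chain carbon, which after capping corresponds to 4-methylcatechol. Calling Chem.AddHs(frag) afterwards gives explicit hydrogens for a QM input.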
|
|
From: Storer, J. (J) <JWS...@do...> - 2023-06-30 21:05:09
|
Dear RDKit experts,
Substructure search is working well these days. RDKit is wonderful.
For subsequent QM calcs., I would like to get the "next atom over" or the "one-atom-neighborhood" surrounding a substructure.
The result would be something bigger than the original substructure with open valence capped by Hydrogen.
Thanks for your thoughts,
Joey Storer
Dow Inc.
General Business
|
From: Jarod Y. <jar...@ho...> - 2023-06-23 18:25:43
|
The function RxnSmartsToChemicalReaction() returns a pointer to a ChemicalReaction object. Is this pointer a std::unique_ptr? How does one deallocate said pointer after it goes out of scope?
Thanks,
J
Sent from my iPhone
|
From: מיכל ר. <mic...@gm...> - 2023-06-22 08:36:14
|
Thank you so much for your help!
This was very helpful
Michal
|
|
From: Ivan Tubert-B. <iva...@sc...> - 2023-06-21 11:24:19
|
Hi Michal,
A key point to consider is that the default bond order in SMARTS is not
single, but "single or aromatic". If you really want to match single bonds
only, you can specify a single bond with "-".
However, it sounds as if you actually expect aromatic bonds to match as
well, since you expect Cc1ccccc1C to become CC=CC=CC=CC; but only bonds
that would show up as "single" on a Kekule structure? You _could_ do that
if you kekulize the molecule first (Chem.Kekulize(mol)), but the problem
there is that there is more than one Kekule structure and you'll only get
one of them, where the bond that you were thinking of breaking in this
example might be double.
Regarding "aromatic" non-ring atoms on the products, I think sanitizing the
products might help. (See Chem.SanitizeMol).
Hope this helps,
Ivan
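The difference between the default SMARTS bond and an explicit "-" can be checked directly. The sketch below is illustrative only: it rewrites the reaction with explicit single bonds and without the parenthesized product grouping, which is an assumption about what is wanted, and it uses hexane as a stand-in for a non-aromatic reactant.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

xylene = Chem.MolFromSmiles("Cc1ccccc1C")
hexane = Chem.MolFromSmiles("CCCCCC")

# Unspecified bonds in SMARTS mean "single or aromatic".
default_query = Chem.MolFromSmarts("[#6][#6][#6][#6]")
# "-" restricts the match to single (non-aromatic) bonds.
single_query = Chem.MolFromSmarts("[#6]-[#6]-[#6]-[#6]")

n_default = len(xylene.GetSubstructMatches(default_query))
n_single = len(xylene.GetSubstructMatches(single_query))
print("o-xylene matches, default bonds:", n_default)
print("o-xylene matches, explicit single bonds:", n_single)

rxn = AllChem.ReactionFromSmarts(
    "[#6:1]-[#6:2]-[#6:3]-[#6:4]>>[#6:1]-[#6:2].[#6:3]-[#6:4]"
)
xylene_products = rxn.RunReactants((xylene,))
hexane_products = rxn.RunReactants((hexane,))

# Sanitize every product before converting it to SMILES.
hexane_smiles = set()
for product_set in hexane_products:
    for product in product_set:
        Chem.SanitizeMol(product)
        hexane_smiles.add(Chem.MolToSmiles(product))
print("hexane fragments:", sorted(hexane_smiles))
```

With explicit single bonds the aromatic ring is left alone, so o-xylene gives no products at all. That avoids the invalid aromatic fragments, but it does not produce the ring-opened 'CC=CC=CC=CC' product; for that, the kekulization route described above would be needed, with the caveat about which Kekule structure is obtained.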
|
|
From: מיכל ר. <mic...@gm...> - 2023-06-21 09:28:01
|
Hello,
I am trying to run reactions using RunReactants.
I want to define a general reaction that breaks a bond in a chain of
single-bonded carbons: '[#6:1][#6:2][#6:3][#6:4] >>
([#6:1][#6:2].[#6:3][#6:4])'.
When the reaction does not involve any aromatic carbons everything works fine, but
when I have a reactant with aromatic carbons I get strange results.
For example, if I use the structure 'Cc1ccccc1C' as the reactant, I
expect to get only one product after the reaction: 'CC=CC=CC=CC'.
But it seems that any chain in the aromatic ring is recognized as single-bonded
carbons, and I get many other fragments (after converting the
products to SMILES format using Chem.MolToSmiles):
['CccccccC',
'ccccc(C)cC',
'C',
'Cc1ccccc-1',
'ccccc(C)CC',
'cccc(C)c(C)C',
'C',
'Cc1ccccc-1',
'ccccc(C)CC',
'cccc(C)c(C)C',
'CccccccC',
'ccccc(C)cC',
'CcccccCC',
'C',
'Cc1c-cccc1',
'ccc(C)c(C)cC',
'Ccccc(C)cC',
'cc(C)c(C)ccC',
'cc(C)c(C)ccC',
'Ccccc(C)cC',
'ccc(C)c(C)cC',
'C',
'Cc1c-cccc1',
'CcccccCC']
Moreover, atoms are marked as aromatic even though they no longer are.
When I try to convert the SMILES back to a mol object (using Chem.MolFromSmiles) I
get an error message:
"non-ring atom 1 marked aromatic"
I would be grateful if anyone has an idea for solving this problem without
specifying the aromatic ring in the reaction (I want to keep the reaction
general so it can happen at any place in the molecule that has a chain of
3 single bonds).
Thank you!
Michal
my code:
from rdkit import Chem
from rdkit.Chem import AllChem, Draw

def flatten(lst):  # take a list of lists and convert it to one flat list
    return sum(([x] if not isinstance(x, list) else flatten(x)
                for x in lst), [])

def flat_children(children):
    return flatten(children)

def mol_to_smile(MolList):  # convert mols to SMILES
    smiles = []
    for x in range(len(MolList)):
        new_smile = Chem.MolToSmiles(MolList[x])
        new_smiles = split_smile(new_smile)
        for n_s in new_smiles:
            smiles.append(n_s)
    return smiles

def split_smile(smile):
    """Split a SMILES containing two fragments into two SMILES."""
    smile = smile.split(".")
    return smile

smile = 'CC1=CC=CC=C1C'
mol = Chem.MolFromSmiles(smile)
rxn = AllChem.ReactionFromSmarts('[#6:1][#6:2][#6:3] >> ([#6:1][#6:2].[#6:3])')
products = rxn.RunReactants((mol,))
mols = []
for product in products:
    mols.append([x for x in product])  # converts the products from tuples (products) to list (mols)
mols = flat_children(mols)  # return [Mol, Mol, ..]
smiles = mol_to_smile(mols)
print(smiles)
for smile in smiles:
    m = Chem.MolFromSmiles(smile)
    img = Draw.MolToImage(m)
    img.show()
output:
['CccccccC',
'ccccc(C)cC',
'C',
'Cc1ccccc-1',
'ccccc(C)CC',
'cccc(C)c(C)C',
'C',
'Cc1ccccc-1',
'ccccc(C)CC',
'cccc(C)c(C)C',
'CccccccC',
'ccccc(C)cC',
'CcccccCC',
'C',
'Cc1c-cccc1',
'ccc(C)c(C)cC',
'Ccccc(C)cC',
'cc(C)c(C)ccC',
'cc(C)c(C)ccC',
'Ccccc(C)cC',
'ccc(C)c(C)cC',
'C',
'Cc1c-cccc1',
'CcccccCC']
[12:06:11] non-ring atom 1 marked aromatic
|
|
From: Hilleke M. <mat...@ph...> - 2023-06-20 12:56:17
|
Dear all,
I am new to working with the Pharm3D module and I have a question regarding pharmacophore matching: Is it possible to match a molecule to a pharmacophore that requires one atom to match to two different features (e.g. H-bond acceptor and donor)? Here is a dummy code example for a case where I am trying to match ethanol back to all of its pharmacophore features (I am using rdkit version 2022.09.4):
import os
from rdkit import Chem, RDConfig
from rdkit.Chem import AllChem, ChemicalFeatures, rdDistGeom
from rdkit.Chem.Pharm3D import Pharmacophore, EmbedLib
feature_definition_file = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
featFactory = ChemicalFeatures.BuildFeatureFactory(feature_definition_file)
m1 = Chem.MolFromSmiles('CCO') # ethanol
m1 = Chem.AddHs(m1)
AllChem.EmbedMolecule(m1)
features = featFactory.GetFeaturesForMol(m1)
# This is to keep track of features
for i in features:
    print(i.GetFamily())
#features = features[1:] #removing donor feature will make it work
print()
for i in features:
    print(i.GetFamily())
pcophore = Pharmacophore.Pharmacophore(features)
m2 = Chem.MolFromSmiles('CCO') # ethanol again
m2 = Chem.AddHs(m2)
AllChem.EmbedMolecule(m2)
bool_match, matches = EmbedLib.MatchPharmacophoreToMol(m2, featFactory, pcophore)
print("bool_match =", bool_match)
boundsMat = rdDistGeom.GetMoleculeBoundsMatrix(m2)
failed, boundsMatMatched, matched, matchDetails = EmbedLib.MatchPharmacophore(matches, boundsMat, pcophore, useDownsampling=True)
print("matched =", matched)
MatchPharmacophore() appears to be working only by removing one of the two features derived from the oxygen. Am I doing something wrong or is there a neat way to fix this?
Kind regards,
Mattis
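The overlap itself can be made visible by grouping feature families per atom. This is a diagnostic sketch only, not a fix for MatchPharmacophore(); the names families_by_atom and shared are introduced here for illustration.

```python
import os
from collections import defaultdict

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

fdef = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
factory = ChemicalFeatures.BuildFeatureFactory(fdef)

mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))  # ethanol, as in the question

# Map each atom index to the set of feature families it takes part in.
families_by_atom = defaultdict(set)
for feat in factory.GetFeaturesForMol(mol):
    for atom_idx in feat.GetAtomIds():
        families_by_atom[atom_idx].add(feat.GetFamily())

# Atoms that carry more than one feature family.
shared = {idx: sorted(fams) for idx, fams in families_by_atom.items() if len(fams) > 1}
print(shared)
```

For ethanol the hydroxyl oxygen (atom index 2) should appear with both the Donor and Acceptor families, which is the situation described in the question.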
|
|
From: Ling C. <lin...@gm...> - 2023-06-20 05:09:37
|
Great to know, many thanks Greg!
Ling
|
|
From: Greg L. <gre...@gm...> - 2023-06-20 04:53:51
|
Hi Ling,
On Mon, Jun 19, 2023 at 3:03 AM Ling Chan <lin...@gm...> wrote:
> I got some questions about atom indexing. Just wonder if you could help me?
>
> 1. In m3=Chem.CombineMols(m1,m2) , is it guaranteed that the atom
> indices in m3 is equivalent to the indices in m1 followed by the indices in
> m2?
Yes
> 2. If I construct an editable mol from m1, is it that the atomic
> indices in the editable mol is equivalent to that in m1? And when I convert
> the editable mol back, suppose the atom indexing is also preserved?
Yes
> 3. Same as #2, but for an RWMol instead of an editable mol.
Yes
> 4. If I delete an "F" atom from an editable mol, is there a way to
> mark the atom in the new mol that was originally bonded to the "F"? I mean,
> if I get its atomic index before the deletion, suppose it won't be
> preserved.
You can set a property on the neighboring atom with something like:
atom.SetProp("F_Neighbor","1")
> 5. Similar to #4, but for DeleteSubstructs.
Same answer: you can always use SetProp
> Alternatively, if there is a way to mark atoms, I don't need the atom
> indices anyway.
As mentioned, SetProp is great for this. Note that using the property
interface is slower than just relying on the indexing remaining the same,
which you can do in the first two cases.
best regards,
-greg
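
A minimal sketch of the points above, assuming a working RDKit install. The helper name `delete_fluorines_and_mark` and the example SMILES are made up for illustration; only the `SetProp("F_Neighbor","1")` idea comes from the message. It tags the neighbors of each F atom before deleting it, then finds the tagged atoms again in the new molecule, and also shows the atom order after `CombineMols`.

```python
from rdkit import Chem


def delete_fluorines_and_mark(mol, tag="F_Neighbor"):
    """Remove every F atom and tag the atoms that were bonded to one.

    Hypothetical helper: the function name and return shape are assumptions,
    not part of RDKit.
    """
    rw = Chem.RWMol(mol)  # editable copy, same atom order as mol
    fluorines = []
    for atom in rw.GetAtoms():
        if atom.GetSymbol() == "F":
            fluorines.append(atom.GetIdx())
            for nbr in atom.GetNeighbors():
                nbr.SetProp(tag, "1")  # the mark survives the deletion
    # Delete from the highest index down so the pending indices stay valid.
    for idx in sorted(fluorines, reverse=True):
        rw.RemoveAtom(idx)
    new_mol = rw.GetMol()
    marked = [a.GetIdx() for a in new_mol.GetAtoms() if a.HasProp(tag)]
    return new_mol, marked


# CombineMols: atoms of m1 come first, then atoms of m2.
m1 = Chem.MolFromSmiles("CO")
m2 = Chem.MolFromSmiles("N")
m3 = Chem.CombineMols(m1, m2)
combined_symbols = [a.GetSymbol() for a in m3.GetAtoms()]

# Before deletion F is atom 0 and its carbon neighbor is atom 1.
start = Chem.MolFromSmiles("FC(C)O")
new_mol, marked = delete_fluorines_and_mark(start)
marked_symbols = [new_mol.GetAtomWithIdx(i).GetSymbol() for i in marked]
```

In this example the tagged carbon no longer has its old index after the F is removed, which is why looking it up through the property is more robust than remembering the index.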
|
|
From: Ling C. <lin...@gm...> - 2023-06-19 01:00:23
|
Dear colleagues,

I got some questions about atom indexing. Just wonder if you could help me?

1. In m3=Chem.CombineMols(m1,m2), is it guaranteed that the atom indices in m3 is equivalent to the indices in m1 followed by the indices in m2?
2. If I construct an editable mol from m1, is it that the atomic indices in the editable mol is equivalent to that in m1? And when I convert the editable mol back, suppose the atom indexing is also preserved?
3. Same as #2, but for an RWMol instead of an editable mol.
4. If I delete an "F" atom from an editable mol, is there a way to mark the atom in the new mol that was originally bonded to the "F"? I mean, if I get its atomic index before the deletion, suppose it won't be preserved.
5. Similar to #4, but for DeleteSubstructs.

Alternatively, if there is a way to mark atoms, I don't need the atom indices anyway.

Thank you!

Ling
|
|
From: Andrew D. <da...@da...> - 2023-06-16 06:20:53
|
On Jun 16, 2023, at 03:15, S Joshua Swamidass <swa...@gm...> wrote:
> In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross each other.

Years ago at http://www.dalkescientific.com/writings/diary/archive/2012/05/18/nonplanar_compounds.html I did a search of a subset of 28.5 million PubChem structures and found 224 topologically non-planar examples, like https://pubchem.ncbi.nlm.nih.gov/compound/50919058 .

I also gave some literature citations, like "Synthesis of the first topologically non-planar molecule" (1981) at https://www.sciencedirect.com/science/article/abs/pii/0040403981800779 .

> On Jun 15, 2023, at 22:50, S Joshua Swamidass <swa...@gm...> wrote:
>
> Have any other libraries adopted your approach? It's clever.

It isn't my approach. It depends on which part you consider clever.

I've been told some of the ideas can be traced back to Bernhard Rohde's PhD thesis, citation 20 in the paper ("who employed a stable numbering for equivalence classes instead of a sequential index"). Rohde is also thanked in the acknowledgements. And Rohde has/had his own in-house library at Novartis which included canonicalization.

I wouldn't be surprised if NextMove has their own implementation, given how Roger Sayle is a co-author of the paper.

> the covalent bonds of proteins of arbitrary size are a planar graph too, even though most (all?) proteins have a 3D structure.

FWIW, due to disulfide bonds in cystines, proteins can be topologically non-planar. https://academic.oup.com/nar/article/47/D1/D367/5223942?login=false gives PDB entry 1AOC as an example, with their database entry at https://knotprot.cent.uw.edu.pl/view/1aoc/A/ .

Best regards,

Andrew
da...@da...
|
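
To make "topologically planar" a bit more concrete, here is a small sketch of a cheap screening test based on Euler's formula. It is not a full planarity test and not what was used in the PubChem search mentioned above; the function name and the example graphs are made up for illustration. A simple planar graph with V >= 3 vertices has at most 3V - 6 edges, or at most 2V - 4 if it contains no triangles, so too many bonds rules planarity out, while passing the bound proves nothing.

```python
def within_planar_edge_bound(num_atoms, bonds, triangle_free=False):
    """Return False only when the edge count alone rules out planarity.

    Necessary condition only: a True result is inconclusive.
    """
    edges = {frozenset(bond) for bond in bonds}  # ignore direction and repeats
    if num_atoms < 3:
        return True
    limit = 2 * num_atoms - 4 if triangle_free else 3 * num_atoms - 6
    return len(edges) <= limit


# A six-membered ring: non-planar in 3D as cyclohexane, planar as a graph.
cyclohexane = [(i, (i + 1) % 6) for i in range(6)]
# K5 and K3,3 are the two classic non-planar graphs.
k5 = [(a, b) for a in range(5) for b in range(a + 1, 5)]
k33 = [(a, b) for a in range(3) for b in range(3, 6)]

ring_ok = within_planar_edge_bound(6, cyclohexane)
k5_ok = within_planar_edge_bound(5, k5)
k33_general = within_planar_edge_bound(6, k33)
k33_triangle_free = within_planar_edge_bound(6, k33, triangle_free=True)
```

K3,3 slips past the general bound and is only caught by the triangle-free one, which is a reminder that a real answer needs a proper planarity algorithm.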
|
From: S J. S. <swa...@gm...> - 2023-06-16 01:16:14
|
Planar graphs are...

In graph theory <https://en.wikipedia.org/wiki/Graph_theory>, a *planar graph* is a graph <https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)> that can be embedded <https://en.wikipedia.org/wiki/Graph_embedding> in the plane <https://en.wikipedia.org/wiki/Plane_(geometry)>, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. In other words, it can be drawn in such a way that no edges cross each other.

https://en.wikipedia.org/wiki/Planar_graph

We aren't talking about whether or not the chemical structure is 2D or 3D. Molecules representable as planar graphs can certainly be 3D molecules that do not have planar structures. For example, cyclohexane is not a planar molecule (vs. benzene for example), but it can trivially be drawn in 2D without any crossing bonds. Likewise (if we ignore disulfide bonds and other gotchas), the covalent bonds of proteins of arbitrary size are a planar graph too, even though most (all?) proteins have a 3D structure.

On Thu, Jun 15, 2023 at 8:05 PM Francois Berenger <ml...@li...> wrote:

> On 16/06/2023 03:49, S Joshua Swamidass wrote:
> > Incidentally,
> >
> > I came across this O(log N) canonization algorithm for planar graphs:
> > https://arxiv.org/pdf/0809.2319.pdf
> >
> > I wonder if this algorithm can be adapted for chemistry? Molecules are
> > usually planar, but I believe they occasionally are "nearly" planar,
> > by which I mean planar graphs plus a few edges that break the
> > planarity.
>
> Dear Joshua,
>
> Some natural product are notoriously complex 3D molecules.
>
> What do you exactly mean by planar?
>
> Many chemical groups are 3D: methyl, adamantane, etc.
>
> > And what (generally speaking) is the algorithm used by rdkit? Do we
> > know it's complexity?
> >
> > On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass
> > <swa...@gm...> wrote:
> >
> >> Andrew,
> >>
> >> Thanks! According to wikipedia (and my recollections of algorithms
> >> class)...
> >> "The problem is not known to be solvable in polynomial time [1] nor
> >> to be NP-complete [2], and therefore may be in the computational
> >> complexity class [3] NP-intermediate [4]."
> >>
> >> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
> >>
> >> Your reference though is really helpful. The key phrase seems to be
> >> "bounded valence" which is certainly true of molecular graphs. Each
> >> atom can only bound some fairly low number of other atoms, i.e.
> >> bounded valence. That's probably the reason why we do have a
> >> polynomial time algorithm...
> >>
> >> Thank you!
> >>
> >> Joshua
> >>
> >> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke
> >> <da...@da...> wrote:
> >>
> >>> On Jun 15, 2023, at 18:20, S Joshua Swamidass
> >>> <swa...@gm...> wrote:
> >>>> It's well known that the graph-isomorphism problem is NP
> >>>
> >>> While P is contained in NP, I don't think that's the NP you mean.
> >>>
> >>> I suspect you may be thinking of subgraph isomorphism, which is
> >>> NP-hard. Graph isomorphism may be quasi-polynomial time, if
> >>> Babai's (unpublished) claim is correct.
> >>>
> >>> Also, "Isomorphism of graphs of bounded valence can be tested in
> >>> polynomial time" -
> >>> https://www.sciencedirect.com/science/article/pii/0022000082900095
> >>> .
> >>>
> >>>> So here is my question. What are the cases that are very
> >>>> difficult to canonize a graph?
> >>>
> >>> As I recall, handling chirality and other non-local properties is
> >>> difficult. I have not worked on this problem.
> >>>
> >>> Cheers,
> >>>
> >>> Andrew
> >>> da...@da...
> >
> > Links:
> > ------
> > [1] https://en.wikipedia.org/wiki/Polynomial_time
> > [2] https://en.wikipedia.org/wiki/NP-complete
> > [3] https://en.wikipedia.org/wiki/Complexity_class
> > [4] https://en.wikipedia.org/wiki/NP-intermediate
> > _______________________________________________
> > Rdkit-discuss mailing list
> > Rdk...@li...
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
|
|
From: Francois B. <ml...@li...> - 2023-06-16 01:05:34
|
On 16/06/2023 03:49, S Joshua Swamidass wrote:
> Incidentally,
>
> I came across this O(log N) canonization algorithm for planar graphs:
> https://arxiv.org/pdf/0809.2319.pdf
>
> I wonder if this algorithm can be adapted for chemistry? Molecules are
> usually planar, but I believe they occasionally are "nearly" planar,
> by which I mean planar graphs plus a few edges that break the
> planarity.

Dear Joshua,

Some natural products are notoriously complex 3D molecules.

What exactly do you mean by planar?

Many chemical groups are 3D: methyl, adamantane, etc.

> And what (generally speaking) is the algorithm used by rdkit? Do we
> know it's complexity?
>
> On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass
> <swa...@gm...> wrote:
>
>> Andrew,
>>
>> Thanks! According to wikipedia (and my recollections of algorithms
>> class)...
>> "The problem is not known to be solvable in polynomial time [1] nor
>> to be NP-complete [2], and therefore may be in the computational
>> complexity class [3] NP-intermediate [4]."
>>
>> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>>
>> Your reference though is really helpful. The key phrase seems to be
>> "bounded valence" which is certainly true of molecular graphs. Each
>> atom can only bound some fairly low number of other atoms, i.e.
>> bounded valence. That's probably the reason why we do have a
>> polynomial time algorithm...
>>
>> Thank you!
>>
>> Joshua
>>
>> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke
>> <da...@da...> wrote:
>>
>>> On Jun 15, 2023, at 18:20, S Joshua Swamidass
>>> <swa...@gm...> wrote:
>>>> It's well known that the graph-isomorphism problem is NP
>>>
>>> While P is contained in NP, I don't think that's the NP you mean.
>>>
>>> I suspect you may be thinking of subgraph isomorphism, which is
>>> NP-hard. Graph isomorphism may be quasi-polynomial time, if
>>> Babai's (unpublished) claim is correct.
>>>
>>> Also, "Isomorphism of graphs of bounded valence can be tested in
>>> polynomial time" -
>>> https://www.sciencedirect.com/science/article/pii/0022000082900095
>>> .
>>>
>>>> So here is my question. What are the cases that are very
>>>> difficult to canonize a graph?
>>>
>>> As I recall, handling chirality and other non-local properties is
>>> difficult. I have not worked on this problem.
>>>
>>> Cheers,
>>>
>>> Andrew
>>> da...@da...
>
> Links:
> ------
> [1] https://en.wikipedia.org/wiki/Polynomial_time
> [2] https://en.wikipedia.org/wiki/NP-complete
> [3] https://en.wikipedia.org/wiki/Complexity_class
> [4] https://en.wikipedia.org/wiki/NP-intermediate
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
|
|
From: Andrew D. <da...@da...> - 2023-06-15 20:22:13
|
On Jun 15, 2023, at 20:49, S Joshua Swamidass <swa...@gm...> wrote:
> And what (generally speaking) is the algorithm used by rdkit? Do we know it's complexity?

https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00543

"Get Your Atoms in Order—An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm"

Andrew
da...@da...
|
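
The paper above describes what RDKit actually does. As a much simpler illustration of the general idea behind ranking atoms, and of why highly symmetric structures are the awkward case, here is a generic neighborhood-refinement sketch in plain Python. It is not the algorithm from the paper; the function name and the toy graphs are assumptions made for this example. Atoms start in classes given by element label and degree, and classes are split repeatedly by the classes of their neighbors until nothing changes. Atoms still sharing a class at the end are ties that a real canonicalizer has to break some other way.

```python
def refine_atom_classes(num_atoms, bonds, labels=None):
    """Partition atoms into classes by iterated neighborhood refinement."""
    neighbors = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # Initial class: (element label, degree); unlabeled atoms share "".
    colors = [
        (labels[i] if labels else "", len(neighbors[i]))
        for i in range(num_atoms)
    ]
    while True:
        # New signature: own class plus the sorted classes of the neighbors.
        signatures = [
            (colors[i], tuple(sorted(colors[j] for j in neighbors[i])))
            for i in range(num_atoms)
        ]
        palette = {sig: rank for rank, sig in enumerate(sorted(set(signatures)))}
        refined = [palette[sig] for sig in signatures]
        # Refinement can only split classes, so an equal count means stable.
        if len(set(refined)) == len(set(colors)):
            return refined
        colors = refined


# A five-atom chain: the two ends, the two inner atoms and the center differ.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
chain_classes = refine_atom_classes(5, chain)

# A six-membered ring: every atom looks the same, so all six stay tied.
ring = [(i, (i + 1) % 6) for i in range(6)]
ring_classes = refine_atom_classes(6, ring)
```

On the ring every atom ends up in one class, so refinement alone gives no ordering at all; a structure like buckminsterfullerene is the same situation with sixty atoms.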
|
From: Peter S. S. <sh...@gm...> - 2023-06-15 18:58:12
|
Well, if I'm recalling correctly, a highly symmetric structure like buckminsterfullerene takes a long time to canonicalize.

I don't know what the formal definition of a planar graph is, but I would guess it's not what chemists mean when they say a molecule is planar.

-P.

On Thu, Jun 15, 2023 at 2:51 PM S Joshua Swamidass <swa...@gm...> wrote:

> Incidentally,
>
> I came across this O(log N) canonization algorithm for planar graphs:
> https://arxiv.org/pdf/0809.2319.pdf
>
> I wonder if this algorithm can be adapted for chemistry? Molecules are
> usually planar, but I believe they occasionally are "nearly" planar, by
> which I mean planar graphs plus a few edges that break the planarity.
>
> And what (generally speaking) is the algorithm used by rdkit? Do we know
> it's complexity?
>
> On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass <swa...@gm...>
> wrote:
>
>> Andrew,
>>
>> Thanks! According to wikipedia (and my recollections of algorithms
>> class)...
>> "The problem is not known to be solvable in polynomial time
>> <https://en.wikipedia.org/wiki/Polynomial_time> nor to be NP-complete
>> <https://en.wikipedia.org/wiki/NP-complete>, and therefore may be in the
>> computational complexity class
>> <https://en.wikipedia.org/wiki/Complexity_class> NP-intermediate
>> <https://en.wikipedia.org/wiki/NP-intermediate>."
>> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>>
>> Your reference though is really helpful. The key phrase seems to be
>> "bounded valence" which is certainly true of molecular graphs. Each atom
>> can only bound some fairly low number of other atoms, i.e. bounded valence.
>> That's probably the reason why we do have a polynomial time algorithm...
>>
>> Thank you!
>>
>> Joshua
>>
>> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke <da...@da...>
>> wrote:
>>
>>> On Jun 15, 2023, at 18:20, S Joshua Swamidass <swa...@gm...>
>>> wrote:
>>> > It's well known that the graph-isomorphism problem is NP
>>>
>>> While P is contained in NP, I don't think that's the NP you mean.
>>>
>>> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
>>> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
>>> claim is correct.
>>>
>>> Also, "Isomorphism of graphs of bounded valence can be tested in
>>> polynomial time" -
>>> https://www.sciencedirect.com/science/article/pii/0022000082900095 .
>>>
>>> > So here is my question. What are the cases that are very difficult to
>>> > canonize a graph?
>>>
>>> As I recall, handling chirality and other non-local properties is
>>> difficult. I have not worked on this problem.
>>>
>>> Cheers,
>>>
>>> Andrew
>>> da...@da...
>>>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
|
|
From: S J. S. <swa...@gm...> - 2023-06-15 18:49:30
|
Incidentally,

I came across this O(log N) canonization algorithm for planar graphs:
https://arxiv.org/pdf/0809.2319.pdf

I wonder if this algorithm can be adapted for chemistry? Molecules are usually planar, but I believe they occasionally are "nearly" planar, by which I mean planar graphs plus a few edges that break the planarity.

And what (generally speaking) is the algorithm used by rdkit? Do we know its complexity?

On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass <swa...@gm...> wrote:

> Andrew,
>
> Thanks! According to wikipedia (and my recollections of algorithms
> class)...
> "The problem is not known to be solvable in polynomial time
> <https://en.wikipedia.org/wiki/Polynomial_time> nor to be NP-complete
> <https://en.wikipedia.org/wiki/NP-complete>, and therefore may be in the
> computational complexity class
> <https://en.wikipedia.org/wiki/Complexity_class> NP-intermediate
> <https://en.wikipedia.org/wiki/NP-intermediate>."
> https://en.wikipedia.org/wiki/Graph_isomorphism_problem
>
> Your reference though is really helpful. The key phrase seems to be
> "bounded valence" which is certainly true of molecular graphs. Each atom
> can only bound some fairly low number of other atoms, i.e. bounded valence.
> That's probably the reason why we do have a polynomial time algorithm...
>
> Thank you!
>
> Joshua
>
> On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke <da...@da...>
> wrote:
>
>> On Jun 15, 2023, at 18:20, S Joshua Swamidass <swa...@gm...>
>> wrote:
>> > It's well known that the graph-isomorphism problem is NP
>>
>> While P is contained in NP, I don't think that's the NP you mean.
>>
>> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
>> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
>> claim is correct.
>>
>> Also, "Isomorphism of graphs of bounded valence can be tested in
>> polynomial time" -
>> https://www.sciencedirect.com/science/article/pii/0022000082900095 .
>>
>> > So here is my question. What are the cases that are very difficult to
>> > canonize a graph?
>>
>> As I recall, handling chirality and other non-local properties is
>> difficult. I have not worked on this problem.
>>
>> Cheers,
>>
>> Andrew
>> da...@da...
>>
|
|
From: S J. S. <swa...@gm...> - 2023-06-15 18:39:00
|
Andrew,

Thanks! According to wikipedia (and my recollections of algorithms class)...

"The problem is not known to be solvable in polynomial time <https://en.wikipedia.org/wiki/Polynomial_time> nor to be NP-complete <https://en.wikipedia.org/wiki/NP-complete>, and therefore may be in the computational complexity class <https://en.wikipedia.org/wiki/Complexity_class> NP-intermediate <https://en.wikipedia.org/wiki/NP-intermediate>."
https://en.wikipedia.org/wiki/Graph_isomorphism_problem

Your reference though is really helpful. The key phrase seems to be "bounded valence", which is certainly true of molecular graphs. Each atom can only bond to some fairly low number of other atoms, i.e. bounded valence. That's probably the reason why we do have a polynomial time algorithm...

Thank you!

Joshua

On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke <da...@da...> wrote:

> On Jun 15, 2023, at 18:20, S Joshua Swamidass <swa...@gm...> wrote:
> > It's well known that the graph-isomorphism problem is NP
>
> While P is contained in NP, I don't think that's the NP you mean.
>
> I suspect you may be thinking of subgraph isomorphism, which is NP-hard.
> Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished)
> claim is correct.
>
> Also, "Isomorphism of graphs of bounded valence can be tested in
> polynomial time" -
> https://www.sciencedirect.com/science/article/pii/0022000082900095 .
>
> > So here is my question. What are the cases that are very difficult to
> > canonize a graph?
>
> As I recall, handling chirality and other non-local properties is
> difficult. I have not worked on this problem.
>
> Cheers,
>
> Andrew
> da...@da...
>
|
|
From: Andrew D. <da...@da...> - 2023-06-15 18:21:47
|
On Jun 15, 2023, at 18:20, S Joshua Swamidass <swa...@gm...> wrote:
> It's well known that the graph-isomorphism problem is NP

While P is contained in NP, I don't think that's the NP you mean.

I suspect you may be thinking of subgraph isomorphism, which is NP-hard. Graph isomorphism may be quasi-polynomial time, if Babai's (unpublished) claim is correct.

Also, "Isomorphism of graphs of bounded valence can be tested in polynomial time" - https://www.sciencedirect.com/science/article/pii/0022000082900095 .

> So here is my question. What are the cases that are very difficult to canonize a graph?

As I recall, handling chirality and other non-local properties is difficult. I have not worked on this problem.

Cheers,

Andrew
da...@da...
|
|
From: S J. S. <swa...@gm...> - 2023-06-15 16:20:26
|
Hello,

I had a theoretical question I thought the RDKit team might have some insight on.

It's well known that the graph-isomorphism problem is NP, which means that for some examples run-time will be worse than polynomial using known algorithms.

This problem is connected to molecule canonization. By comparing canonical SMILES strings, we can very efficiently determine whether or not two molecular graphs are isomorphic. That means canonizing SMILES graphs is either:

1. potentially very costly in some cases.

AND/OR

2. fails to produce a unique string for particular graph structures.

So here is my question. What are the cases in which it is very difficult to canonize a graph? By extension, are any particular classes of real molecules difficult or impossible to canonize? If not, is that because there are some restrictions on molecule graphs that can guarantee we avoid the difficult cases (e.g. limited branching factor, or that molecules are nearly planar?)?

Thanks for considering my question.

*S. Joshua Swamidass M.D. Ph.D.*
http://swami.wustl.edu/
Associate Professor, Laboratory and Genomic Medicine
Associate Professor, Biomedical Engineering
Washington University in St. Louis
Administrator: Lori Scantlan <*lls...@wu... <lls...@wu...>*>
|
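
The link between canonization and isomorphism described above can be sketched in a few lines of plain Python. This is a deliberately naive illustration, not how RDKit or any SMILES canonicalizer works; the function names are made up and the graphs are unlabeled (no atom or bond types). A canonical form is taken as the smallest relabeled edge list over all atom orderings, so two graphs are isomorphic exactly when their canonical forms are equal. Trying every ordering costs n! relabelings, which is the "potentially very costly" case taken to its extreme.

```python
from itertools import permutations


def canonical_form(num_nodes, edges):
    """Smallest sorted edge list over all relabelings (brute force, n! cost)."""
    best = None
    for perm in permutations(range(num_nodes)):
        relabeled = tuple(
            sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges)
        )
        if best is None or relabeled < best:
            best = relabeled
    return best


def is_isomorphic(n1, edges1, n2, edges2):
    """Two graphs are isomorphic iff their canonical forms match."""
    return n1 == n2 and canonical_form(n1, edges1) == canonical_form(n2, edges2)


# The same four-atom chain written with two different atom numberings.
chain_a = [(0, 1), (1, 2), (2, 3)]
chain_b = [(2, 0), (0, 3), (3, 1)]
# A branched graph with the same number of atoms and bonds.
star = [(0, 1), (0, 2), (0, 3)]

same_chain = is_isomorphic(4, chain_a, 4, chain_b)
chain_vs_star = is_isomorphic(4, chain_a, 4, star)
```

Practical canonicalizers avoid the factorial search by first splitting atoms into classes and only exploring orderings where ties remain.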