Re: [Rdkit-discuss] ReplaceSubstructs
From: Paolo T. <pao...@gm...> - 2021-10-25 10:05:44
Hi Ling,

the way ReplaceSubstructs works is that:

1) the substructure match is uniquified (i.e., a single match is returned for each set of matching atoms). Therefore, even though pyridine, due to its symmetry, matches twice, a single match is returned.
2) only the bond between the first atom in the uniquified match and the remainder of the molecule is restored.
3) results are not sanitized (hence the kekulization error).

Therefore, in your first example the first atom in the pyridine match is indeed connected to the imidazole core:

    c = "c1cccnc1"
    s1 = "Cn1cnc(-c2cccnc2)c1"
                  ^
                  |
                  atom 0 is connected to imidazole
                  |
                  c1cccnc1

so, after sanitization, the only result you get back is astatine connected to imidazole:

    u = AllChem.ReplaceSubstructs(ms1, mc, bh)
    for ui in u:
        Chem.SanitizeMol(ui)
    print(",".join(Chem.MolToSmiles(ui) for ui in u))

    Cn1cnc([At])c1

In your second example, the first atom in the pyridine match is not connected to the thiazole core:

    c = "c1cccnc1"
    s1 = "c1cncc(-c2nccs2)c1"
          ^
          |
          atom 0 of the pyridine match is not connected to thiazole
          |
          c1cccnc1

so, after sanitization, the only result you get back is disconnected astatine and thiazole:

    u = AllChem.ReplaceSubstructs(ms1, mc, bh)
    for ui in u:
        Chem.SanitizeMol(ui)
    print(",".join(Chem.MolToSmiles(ui) for ui in u))

    [At].c1cscn1

If you reorder the atoms in your second example such that the first atom in the pyridine match is connected to the thiazole core, you will get the expected result:

    c = "c1cccnc1"
    s1 = "s1ccnc1c1cnccc1"
                 ^
                 |
                 atom 0 of the pyridine match is connected to thiazole
                 |
                 c1cccnc1

so, after sanitization, the only result you get back is astatine connected to thiazole:

    u = AllChem.ReplaceSubstructs(ms1, mc, bh)
    for ui in u:
        Chem.SanitizeMol(ui)
    print(",".join(Chem.MolToSmiles(ui) for ui in u))

    [At]c1nccs1

That said, I do not think the way ReplaceSubstructs works at the moment, though it can be rationalized as shown above, is optimal.
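For anyone who wants to rerun the three cases side by side, here is a minimal sketch. The definitions of ms1, mc and bh were only visible in the screenshots of the original question, so the MolFromSmiles calls below are an assumption about how they were built, and the helper name replace_pyridine is purely illustrative. The outputs quoted in this message were obtained in October 2021; other RDKit versions may behave differently.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Assumed setup: the original ms1/mc/bh were only shown in screenshots.
mc = Chem.MolFromSmiles("c1cccnc1")  # pyridine, the substructure to replace
bh = Chem.MolFromSmiles("[At]")      # astatine, used as an R-group placeholder


def replace_pyridine(smiles):
    """Replace pyridine with At, sanitize each product, return SMILES strings."""
    ms1 = Chem.MolFromSmiles(smiles)
    products = AllChem.ReplaceSubstructs(ms1, mc, bh)
    for product in products:
        # ReplaceSubstructs does not sanitize its results, so do it here.
        Chem.SanitizeMol(product)
    return [Chem.MolToSmiles(product) for product in products]


examples = (
    "Cn1cnc(-c2cccnc2)c1",  # first example: atom 0 of the match touches imidazole
    "c1cncc(-c2nccs2)c1",   # second example: atom 0 of the match does not touch thiazole
    "s1ccnc1c1cnccc1",      # second example with atoms reordered
)
for s1 in examples:
    print(s1, "->", ",".join(replace_pyridine(s1)))
```

Sanitizing inside the helper keeps the comparison fair: every product goes through the same clean-up before its SMILES is written out.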
I will try and come up with a PR implementing a more useful behavior and see what others think.

Cheers,
p.

On Mon, Oct 25, 2021 at 7:28 AM Ling Chan <lin...@gm...> wrote:
> Dear colleagues,
>
> I would like to replace a pyridine group with an "At" atom, to denote that
> it is an R group. To illustrate,
> [image: Screenshot from 2021-10-24 21-46-00.png]
> For my purpose, this is good, although I was expecting two solutions. As
> per the discussion at
>
> https://www.mail-archive.com/rdk...@li.../msg09680.html
> , since there are two ways to match the pyridine ring, there should be one
> more result, corresponding to "[At].Cn1cncc1" (two pieces)
>
> However, when I try it on another molecule, it does not work. Here is the
> input.
> [image: Screenshot from 2021-10-24 21-47-56.png]
> When I tried to draw the result, it crashed, due to kekulization problem.
> [image: Screenshot from 2021-10-24 21-49-22.png]
> I could, however, get the smiles string.
> [image: Screenshot from 2021-10-24 21-50-32.png]
> I was expecting two results, with one comprising one piece and one
> comprising two pieces. But now only the two-pieces result was given.
> [image: Screenshot from 2021-10-24 22-14-24.png]
> Just wonder why does it not give two results? (Actually for my purpose, as
> long as it gives the one-piece result, it's good.)
>
> Thank you for your ideas.
>
> Ling Chan
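The "two ways to match the pyridine ring" mentioned in the question can be inspected directly with GetSubstructMatches. The sketch below uses the first example molecule from this thread and compares the default uniquified matching with uniquify=False; the variable names are illustrative. For each match it also reports whether the molecule atom mapped to query atom 0 has a neighbour outside the matched ring, which is the atom whose bond to the rest of the molecule matters in the explanation above.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("Cn1cnc(-c2cccnc2)c1")
pyridine = Chem.MolFromSmiles("c1cccnc1")

# Default behaviour: one match per distinct set of atoms.
unique_matches = mol.GetSubstructMatches(pyridine)
# All mappings, including the symmetry-related one.
all_matches = mol.GetSubstructMatches(pyridine, uniquify=False)

print("uniquified:", len(unique_matches), "non-uniquified:", len(all_matches))

first_atom_bonded_outside = []
for match in all_matches:
    ring_atoms = set(match)
    # match[0] is the molecule atom mapped onto query atom 0.
    first_atom = mol.GetAtomWithIdx(match[0])
    outside = [n.GetIdx() for n in first_atom.GetNeighbors()
               if n.GetIdx() not in ring_atoms]
    first_atom_bonded_outside.append(bool(outside))
    print(match, "atom 0 bonded outside the match:", bool(outside))
```

Both mappings cover the same six ring atoms and differ only in which atom plays the role of query atom 0, which is why uniquification collapses them into a single match.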