Re: [Rdkit-discuss] Mapping back Maximum Common Subgraph
From: Brian K. <fus...@gm...> - 2016-02-25 11:27:46
I meant the smarts not matching the atom comparisons might be a bug, the "or" approach for combining atoms and bonds is much cleaner and makes a lot more sense. I can see use cases for both actually.

----
Brian Kelley

> On Feb 25, 2016, at 1:31 AM, Greg Landrum <gre...@gm...> wrote:
>
>> On Thu, Feb 25, 2016 at 2:16 AM, Brian Kelley <fus...@gm...> wrote:
>> It is hard to tell if this is a bug or not, however:
>>
>> atomCompare=rdFMCS.AtomCompare.CompareAny,
>> bondCompare=rdFMCS.BondCompare.CompareAny,
>>
>> Means that any atom matches any other atom and any bond matches any other bond. The smarts being returned does not have the appropriate wildcards.
>>
>> '[#6](:[#6](-[#6]):[#7]:[#7]:[#6]=[#8]):[#6]'
>>
>> The mcs you actually computed should have been something like:
>>
>> mcs = Chem.MolFromSmarts('[*](~[*](~[*])~[*]~[*]~[*]~[*])~[*]')
>> print moli_noh.GetSubstructMatch(mcs)
>> print molj_noh.GetSubstructMatch(mcs)
>>
>> (0, 1, 7, 9, 10, 2, 11, 12)
>> (0, 1, 6, 8, 9, 5, 3, 2)
>>
>> So it looks like the smarts generation portion of the MCS code doesn't apply the rules of the mcs matcher. Bug? Maybe :)
>
> Certainly not. It's a feature, and one I really like. There is definitely a bug somewhere that's causing the problems with Gaetano's example, but here's a small example that shows how it's supposed to work:
>
> In [21]: mol1 = Chem.MolFromSmiles('Cc1nc(O)ccc1')
>
> In [22]: mol2 = Chem.MolFromSmiles('Cc1cc(O)ccc1')
>
> In [23]: mcs = rdFMCS.FindMCS([mol1,mol2],
>                               timeout=20,
>                               atomCompare=rdFMCS.AtomCompare.CompareAny,
>                               bondCompare=rdFMCS.BondCompare.CompareAny,
>                               matchValences=False,
>                               ringMatchesRingOnly=True,
>                               completeRingsOnly=False,
>                               matchChiralTag=False)
>
> In [25]: mcs.smartsString
> Out[25]: '[#6]-[#6]1:[#7,#6]:[#6](-[#8]):[#6]:[#6]:[#6]:1'
>
> The interesting atom here is the third one. When the MCS is found the identity of the atoms is ignored while determining whether or not they match each other, but the actual maximum common substructure of those two molecules has either a N or a C as the third atom. This is what the SMARTS tells you.
>
> Gaetano, you have found a bug. We'll look into it.
>
> -greg
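
For anyone reading this later, here is a minimal pure-Python sketch of the "or" idea Greg describes: at each MCS position, the atom query lists every element that actually occurs there, instead of collapsing to a `[*]` wildcard. This is only an illustration and not RDKit's implementation. The helper `or_atom_query` and the `paired_atoms` list are hypothetical names, and the pairs are typed in by hand from Greg's two SMILES rather than computed by `FindMCS`.

```python
def or_atom_query(atomic_nums):
    """Return a SMARTS atom primitive matching any of the given atomic numbers."""
    unique = []
    for z in atomic_nums:
        if z not in unique:  # keep first-seen order, drop repeats
            unique.append(z)
    return "[" + ",".join("#%d" % z for z in unique) + "]"


# Hypothetical input: atomic numbers of the atoms paired at each MCS position,
# written as (atom in mol1, atom in mol2) for Cc1nc(O)ccc1 and Cc1cc(O)ccc1.
paired_atoms = [(6, 6), (6, 6), (7, 6), (6, 6), (8, 8), (6, 6), (6, 6), (6, 6)]

atom_queries = [or_atom_query(pair) for pair in paired_atoms]
print(atom_queries[2])  # third position (N in mol1, C in mol2): prints [#7,#6]
```

Positions where both molecules have the same element reduce to a single primitive such as `[#6]`, so only the third atom ends up with an alternative list, which matches the SMARTS in Greg's output.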