Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching
From: Greg L. <gre...@gm...> - 2014-03-05 04:44:33
Hi Christos,

On Tue, Mar 4, 2014 at 3:46 PM, Christos Kannas <chr...@gm...> wrote:
> Hi all,
>
> Why does the following happen?
>
> In [1]: from rdkit import Chem
> In [2]: from rdkit.Chem import AllChem
> In [3]: from rdkit.Chem import Draw
>
> In [4]: patt = Chem.MolFromSmarts("[CH;D2;!$(C-[!#6;!#1])]=O")
>
> In [5]: z2 = Chem.MolFromSmarts("[*]-C-C([H])(=O)", 1)
> In [6]: print Chem.MolToSmiles(z2)
> [*]CC=O
> In [7]: print Chem.MolToSmarts(z2)
> *-C-[C&!H0]=O
> In [9]: z2.HasSubstructMatch(patt)
> Out[9]: False
>
> In [10]: z3 = Chem.MolFromSmiles(Chem.MolToSmiles(z2))
> In [11]: print Chem.MolToSmiles(z3)
> [*]CC=O
> In [12]: print Chem.MolToSmarts(z3)
> [*]-[#6]-[#6]=[#8]
> In [13]: z3.HasSubstructMatch(patt)
> Out[13]: True
>
> Shouldn't be that z2 and z3 have the same information?
>

SMARTS/SMARTS matches are handled differently from SMARTS/SMILES matches.

The short answer is that when doing a SMARTS/SMARTS match, the RDKit compares the queries to each other; when doing a SMARTS/SMILES match, on the other hand, it checks whether the atoms in the SMILES molecule satisfy the queries in the SMARTS molecule.

A slightly longer answer:
Molecules built using MolFromSmiles contain Atoms; molecules built using MolFromSmarts contain QueryAtoms. Both Atoms and QueryAtoms have a Match() method that takes another Atom or QueryAtom as an argument and returns whether or not the two match. The substructure matching code makes heavy use of this Match() method.

QueryAtom.Match(Atom) checks whether the Atom satisfies the query.
QueryAtom.Match(QueryAtom) checks whether the queries on the two atoms are the same. This uses a crude approach that is easy to fool, but I assume that a SMARTS/SMARTS match is not something people want to do frequently. Query-query matching is also not a particularly easy problem to solve in a general way.

So in your example, z2 (built from SMARTS) is compared to patt query by query, and the queries differ; z3 (built from SMILES) has real atoms, and those satisfy patt.

-greg
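
The difference between the two kinds of Match() call can be sketched with a toy model in plain Python. This is not the RDKit implementation: the classes ToyAtom and ToyQueryAtom, the match() method, and the predicate labels are all hypothetical, and the pattern atom is simplified to [CH;D2] (the recursive part of the original SMARTS is left out). It only illustrates why an atom that satisfies a query can still fail a query-to-query comparison.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class ToyAtom:
    """A concrete atom, like those in a molecule parsed from SMILES."""
    symbol: str
    num_hs: int
    degree: int


@dataclass
class ToyQueryAtom:
    """An atom carrying named predicates, like those in a SMARTS pattern."""
    predicates: Dict[str, Callable[[ToyAtom], bool]]

    def match(self, other: Union[ToyAtom, "ToyQueryAtom"]) -> bool:
        if isinstance(other, ToyQueryAtom):
            # Query vs. query: only the query descriptions are compared
            # (a deliberately crude stand-in for "are the queries the same?").
            return set(self.predicates) == set(other.predicates)
        # Query vs. atom: test the atom's properties against every predicate.
        return all(test(other) for test in self.predicates.values())


# Carbonyl carbon of the pattern, simplified to [CH;D2] (assumption).
patt_carbon = ToyQueryAtom({
    "C": lambda a: a.symbol == "C",
    "H1": lambda a: a.num_hs == 1,
    "D2": lambda a: a.degree == 2,
})

# The corresponding carbon in z2: still a query, written as [C&!H0].
z2_carbon = ToyQueryAtom({
    "C": lambda a: a.symbol == "C",
    "!H0": lambda a: a.num_hs != 0,
})

# The corresponding carbon in z3: a real atom with one H and two neighbours.
z3_carbon = ToyAtom(symbol="C", num_hs=1, degree=2)

print(patt_carbon.match(z3_carbon))  # True: the atom satisfies every predicate
print(patt_carbon.match(z2_carbon))  # False: the two query descriptions differ
```

In this toy model, as in the session quoted above, the practical way to get an atom-level answer is to match the pattern against a molecule that holds real atoms (the z3 route through MolFromSmiles) rather than against another query.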