Re: [Rdkit-discuss] Matching Generalized Compounds
From: Paolo T. <pao...@gm...> - 2018-08-24 18:03:09
Dear Kovas,
you should be able to achieve what you need by applying the following patch
and rebuilding the RDKit:
--- Code/GraphMol/Atom.cpp 2018-08-23 19:33:34.669598140 +0100
+++ Code/GraphMol/Atom.cpp 2018-08-24 19:02:18.308912142 +0100
@@ -432,7 +432,8 @@
bool Atom::Match(Atom const *what) const {
PRECONDITION(what, "bad query atom");
- bool res = getAtomicNum() == what->getAtomicNum();
+ bool res = getAtomicNum() == what->getAtomicNum()
+ || ((!getAtomicNum() && hasQuery()) || (!what->getAtomicNum() && what->hasQuery()));
// special dummy--dummy match case:
// [*] matches [*],[1*],[2*],etc.
This change does not break any existing unit tests, and does what you need:
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole  # renders molecules inline in Jupyter
m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')
m1  # in a notebook, evaluating a molecule displays its depiction
m2
# convert the dummy atoms into any-atom queries
qp = Chem.AdjustQueryParameters()
qp.makeDummiesQueries = True
m1 = Chem.AdjustQueryProperties(m1, qp)
m2 = Chem.AdjustQueryProperties(m2, qp)
# with the patched RDKit, matching now works in both directions
m1.GetSubstructMatches(m2)
# ((0, 1, 2, 3, 4),)
m2.GetSubstructMatches(m1)
# ((0, 1, 2, 3, 4),)
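The effect of the patched condition can be illustrated with a toy model in pure Python. ToyAtom, original_match and patched_match are hypothetical names; the model reproduces only the atomic-number comparison from the patch and ignores everything else RDKit's Atom::Match checks (for example the dummy-dummy isotope special case). Atoms are listed in the SMILES order of the example above and compared position by position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an RDKit atom: atomic number plus a query flag."""
    atomic_num: int
    has_query: bool = False


def original_match(atom: ToyAtom, what: ToyAtom) -> bool:
    # Unpatched condition: the atomic numbers must be identical.
    return atom.atomic_num == what.atomic_num


def patched_match(atom: ToyAtom, what: ToyAtom) -> bool:
    # Patched condition: identical atomic numbers, OR either side is a
    # dummy (atomic number 0) that carries a query.
    return (atom.atomic_num == what.atomic_num
            or (atom.atomic_num == 0 and atom.has_query)
            or (what.atomic_num == 0 and what.has_query))


# m1 = [*:1][CH2:2][C:3]([CH3:4])=[CH2:5] after makeDummiesQueries
m1 = [ToyAtom(0, True), ToyAtom(6), ToyAtom(6), ToyAtom(6), ToyAtom(6)]
# m2 = [F:11][CH2:12][C:13]([*:14])=[CH2:15] after makeDummiesQueries
m2 = [ToyAtom(9), ToyAtom(6), ToyAtom(6), ToyAtom(0, True), ToyAtom(6)]

for name, match in (("original", original_match), ("patched", patched_match)):
    # Compare atoms pairwise in both directions (same index = same position).
    forward = all(match(a, b) for a, b in zip(m1, m2))
    backward = all(match(b, a) for a, b in zip(m1, m2))
    print(name, forward, backward)
# original False False
# patched True True
```

Because the two added terms are symmetric in atom and what, the toy result no longer depends on which molecule plays the role of the query.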
HTH, cheers
p.
On 08/23/18 18:20, Kovas Palunas wrote:
>
> Thanks for the feedback and code example!
>
> I understand that it works to use MCS to build a third query mol that
> matches both of the original mols, and then to match against that query.
> However, this seems like overkill (overly expensive) for this particular
> problem – as I understand it, MCS can be very expensive depending on the
> compounds you are comparing. Would it not work to simply override the
> atom.Match function with one that will always match dummies no matter
> what the other atom is? The only query feature my matching needs is
> plain dummy atoms; I am not planning to match SMARTS-style queries of
> any greater complexity. In fact, as I understand it, my example
> compounds do not contain any query atoms when they are read into RDKit –
> the dummies are only turned into queries afterwards, by
> AdjustQueryProperties(). I am definitely not interested in doing
> generic query-to-query matching.
>
> - Kovas
>
> From: Christos Kannas <chr...@gm...>
> Date: Thursday, August 23, 2018 at 7:53 AM
> To: Kovas Palunas <kov...@ar...>
> Cc: RDKit <rdk...@li...>, Paolo Tosco
> <pao...@gm...>
> Subject: Re: [Rdkit-discuss] Matching Generalized Compounds
>
> Hi Kovas,
>
> You have two fuzzy compounds that you are trying to match, because
> intuition says that the any-atom [*:1] in m1 should match the
> fluorine [F:11] in m2, and the any-atom [*:14] in m2 should match the
> carbon [CH3:4] in m1.
>
> The issue here is that you create two query compounds from m1 and m2,
> each of which matches only its own specific substructures.
> Query-to-query matching is not trivial.
>
> To do what you want, you need a single query compound that combines
> their characteristics, which is what Paolo showed.
>
> Using MCS and modifying atom properties, Paolo created the query
> compound '[*:1]-[CH2:2]-[C:3](-[*:4])=[CH2:5]' (or
> '[*:1]-[CH2X4:2]-[CX3:3](-[*:4])=[CH2X3:5]').
>
> Also bear in mind that Paolo's approach changes the starting
> compounds, which now resemble the generic query compound that
> combines their fuzzy atoms.
>
> https://gist.github.com/CKannas/ac1a4791dec909552d7c8899cfaff030
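The idea of a single query that combines the fuzzy atoms of both compounds can be sketched in pure Python, assuming the atom correspondence is already known. Here it is simply given by position; Paolo's notebook obtains it from FindMCS(), which this sketch does not reproduce. Atoms are plain element symbols with '*' standing for any atom, and combine_queries and matches are hypothetical helpers.

```python
WILDCARD = "*"


def combine_queries(a: list[str], b: list[str]) -> list[str]:
    """Merge two position-aligned atom lists: keep the element where both
    agree, otherwise (a wildcard on either side, or a mismatch) use '*'."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def matches(query: list[str], target: list[str]) -> bool:
    # One-directional matching: a query wildcard matches any target atom,
    # while a concrete query element must equal the target element.
    return len(query) == len(target) and all(
        q == WILDCARD or q == t for q, t in zip(query, target))


# Heavy atoms of m1 and m2 in SMILES order (hydrogen counts ignored).
m1 = ["*", "C", "C", "C", "C"]
m2 = ["F", "C", "C", "*", "C"]

generic = combine_queries(m1, m2)
print(generic)                                     # ['*', 'C', 'C', '*', 'C']
print(matches(generic, m1), matches(generic, m2))  # True True
```

Turning a disagreement between two concrete elements into a wildcard is a choice made for this sketch only; the query Christos describes also carries hydrogen-count and connectivity constraints (CH2, X4, X3) that a plain element list cannot express.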
>
> Best,
>
> Christos
>
> Christos Kannas
>
> Chem[o]informatics Researcher & Software Developer
>
>
> On Thu, 23 Aug 2018 at 12:36, Paolo Tosco <pao...@gm...> wrote:
>
> Dear Kovas,
>
> It looks like GetSubstructMatch() only finds a match if the dummy
> atom is in the query, not if it is in the molecule that you are
> matching the query against.
>
> This notebook presents a possible solution off the top of my head:
>
> https://gist.github.com/ptosco/a35ac28a14103b47096f6d6af1aec831
>
> which does not involve changes to the C++ layer, although it is
> computationally more expensive and will fail with disconnected
> fragments, as it uses FindMCS(). There may be better solutions -
> this is what I came up with last night in the little time I
> had available.
>
> Cheers,
> P.
>
> On 08/22/18 19:34, Kovas Palunas wrote:
>
> Hi All,
>
> I’m interested in having GetSubstructMatches return non-“null”
> results in the following example. The results should lead to
> a match where atom 1 maps to atom 11, 2 to 12, etc.
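GetSubstructMatches() returns atom indices, not atom-map numbers: in each returned tuple, position i holds the index of the target atom matched by query atom i. A minimal sketch of translating such a tuple into the map-number correspondence described here, assuming the map numbers are listed in atom-index order (in RDKit they could be read with Atom.GetAtomMapNum()); match_to_map_pairs is a hypothetical helper and the tuple is an example value.

```python
def match_to_map_pairs(match, query_maps, target_maps):
    """Translate one substructure-match tuple into {target map: query map}.

    match[i] is the index of the target atom matched by query atom i.
    """
    return {target_maps[t]: query_maps[q] for q, t in enumerate(match)}


# Map numbers in atom-index order for the two example molecules.
m1_maps = [1, 2, 3, 4, 5]
m2_maps = [11, 12, 13, 14, 15]

# Example tuple for m1.GetSubstructMatches(m2), where m2 is the query.
match = (0, 1, 2, 3, 4)
print(match_to_map_pairs(match, m2_maps, m1_maps))
# {1: 11, 2: 12, 3: 13, 4: 14, 5: 15}
```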
>
> from rdkit import Chem
>
> m1 = Chem.MolFromSmiles('[*:1][CH2:2][C:3]([CH3:4])=[CH2:5]')
>
> m2 = Chem.MolFromSmiles('[F:11][CH2:12][C:13]([*:14])=[CH2:15]')
>
> ### do something here so that the mols will match ###
>
> qp = Chem.AdjustQueryParameters()
>
> qp.makeDummiesQueries = True
>
> m1 = Chem.AdjustQueryProperties(m1, qp)
>
> m2 = Chem.AdjustQueryProperties(m2, qp)
>
> # I’d like both of the following to return results
>
> m1.GetSubstructMatches(m2)
>
> m2.GetSubstructMatches(m1)
>
> My understanding of why these mols currently do not match is
> as follows: because only the dummy atoms are made queries
> (based on my query parameter adjustment), when one mol is
> matched to the other, dummy 1 may match F:11, but dummy 14
> will then not match methyl:4. This is because (as I
> understand it) normal atoms can only be matched by queries, and
> cannot match queries themselves.
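This explanation can be checked with a toy model of one-directional matching. It is a hypothetical simplification, not RDKit's code: atoms are (map number, element, is_query) tuples, atoms are paired by position instead of by a graph search, and only elements are compared. atom_matches and first_failure are illustrative names.

```python
# Each atom: (map number, element, is_query). Only dummies are queries,
# mirroring makeDummiesQueries = True.
M1 = [(1, "*", True), (2, "C", False), (3, "C", False), (4, "C", False), (5, "C", False)]
M2 = [(11, "F", False), (12, "C", False), (13, "C", False), (14, "*", True), (15, "C", False)]


def atom_matches(query_atom, target_atom):
    # A query dummy matches anything; any other query atom needs an equal
    # element. A dummy in the target gets no special treatment, so a plain
    # carbon in the query does not match it.
    _, q_elem, q_is_query = query_atom
    _, t_elem, _ = target_atom
    if q_is_query and q_elem == "*":
        return True
    return q_elem == t_elem


def first_failure(query, target):
    """Return (query map, target map) for the first failing pair, or None."""
    for q, t in zip(query, target):
        if not atom_matches(q, t):
            return (q[0], t[0])
    return None


print(first_failure(M1, M2))  # (4, 14): [CH3:4] in the query vs dummy 14
print(first_failure(M2, M1))  # (11, 1): [F:11] in the query vs dummy 1
```

Each direction fails on the atom whose counterpart is a dummy sitting in the target rather than in the query.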
>
> Potential ideas to make this work as I’d like:
>
> 1. Override atom.Match in the python code – not sure that
> this would work since the C++ version of this function is
> what would be called during GetSubstructMatches
> 2. Override atom.Match in the C++ code – not quite sure how
> to do this, or what side effects it might have. Ideally
> the changes I make would only affect this example (and
> other similar ones)
> 3. Make all atoms in both molecules QueryAtoms, but otherwise
> leave them unchanged. I’m not quite sure how to do this!
>
> Does anyone have any ideas for what the best approach here
> would be, or knows if there is already built in functionality
> for something like this? I’d prefer to not use SMARTS to
> construct my molecules if possible, since I don’t really think
> of them as queries, just as other molecules in the system that
> happen to not be fully specified.
>
> - Kovas
>
>
>
> ------------------------------------------------------------------------------
>
> Check out the vibrant tech community on one of the world's most
>
> engaging tech sites, Slashdot.org!http://sdm.link/slashdot
>
>
>
> _______________________________________________
>
> Rdkit-discuss mailing list
>
> Rdk...@li...
> <mailto:Rdk...@li...>
>
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>