Thread: [Joelib-devel] Atom mapping in substructure searching

From: Peter Murray-R. <pm...@ca...> - 2005-03-30 12:41:54

Crossposted to 3 lists. Please reply carefully.

We have a requirement to extract atom-atom mapping from a substructure 
search (or a maximal common subgraph).  It's because our MACiE database 
contains reactions without atom-atom maps. We have the complete products
and reactants for each step in CMLReact and want to either create a map or
relabel the atoms consistently. Since these are multistep reactions, the
products of step N are usually the reactants of step N+1, though some
components may be missing.

I've used obgrep (SMARTS) but only to give a boolean answer - which 
molecules in a list have a given substructure. I used CDK a long time ago 
for maximal common subgraph, but that was based on bonds, not atoms and I 
haven't used it since. I haven't used JOELib.

If I have something like A + B => C + D + E
(1) which of the tools can be easily configured to output one or more
substructure mappings (including null) from A=>C, A=>D, A=>E, B=>C, B=>D,
B=>E, and the reverse C=>A, C=>B, etc.? This only works when one species
is a precise substructure of another.
(2) Maximal common subgraph mapping A<=>C, A<=>D, etc.
(3) ditto, but for (A+B) <=> (C+D+E) without overlaps.
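
As a rough illustration of what (1) asks for, here is a minimal sketch in plain Python of a backtracking search that returns every atom-atom mapping of a query onto a target, or an empty list when there is none. It is not the obgrep, CDK or JOELib implementation. The function name `substructure_mappings`, the atom-list and bond-list representation, and the choice to match on element symbol only (ignoring bond orders and hydrogens) are assumptions made for the example.

```python
def substructure_mappings(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return all mappings {query index: target index} that embed the query.

    Atoms match on element symbol only (an assumption of this sketch);
    every query bond must also be present between the mapped target atoms.
    """
    t_bond_set = {frozenset(b) for b in t_bonds}
    q_neighbors = {i: set() for i in range(len(q_atoms))}
    for a, b in q_bonds:
        q_neighbors[a].add(b)
        q_neighbors[b].add(a)

    results = []

    def extend(mapping):
        i = len(mapping)  # query atoms are placed in index order
        if i == len(q_atoms):
            results.append(dict(mapping))
            return
        for j, element in enumerate(t_atoms):
            if element != q_atoms[i] or j in mapping.values():
                continue
            # Bonds from atom i to already placed query atoms must exist in the target.
            if all(frozenset((j, mapping[k])) in t_bond_set
                   for k in q_neighbors[i] if k in mapping):
                mapping[i] = j
                extend(mapping)
                del mapping[i]

    extend({})
    return results


# Heavy atoms only: ethanol (C-C-O) as query, ethyl propanoate as target.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
ester_atoms = ["C", "C", "C", "O", "O", "C", "C"]
ester_bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6)]

hits = substructure_mappings(ethanol_atoms, ethanol_bonds, ester_atoms, ester_bonds)
for hit in hits:
    print(hit)
```

With bond orders ignored, ethanol embeds in the ester in more than one way, which is exactly the information a boolean match discards.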

Performance is not an issue - this only (we hope) has to be done once as 
then we shall use JChempaint as the entry tool. Even a webservice would do. 
Nor (in principle) is semistructured output though we'd prefer XML or a 
clear API. But we'd prefer not to have to write new code if possible.

Hope this makes sense. The only alternative is to reinput hundreds of 
reactions by hand :-(

P.



Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076

Re: [Joelib-devel] Atom mapping in substructure searching

From: Joerg K. W. <we...@in...> - 2005-03-30 13:11:09

Hi all,

JOELib has an unpublished extended module for MCS, and the publication
is still under review at QSAR & Combinatorial Science. We can also deal
with physicochemical atom properties, which can make sense for
reactions. The method returns all MCSs, either the full combinatorial
set or only the unique ones, so there can be many index lists for two
molecules.
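
To make the "many index lists" point concrete, here is a brute-force sketch in plain Python that returns every maximum common induced subgraph mapping between two small molecules. It is not the unpublished JOELib module, whose algorithm is not described in this thread. The name `max_common_subgraphs` and the data layout are hypothetical, atoms match on element only, connectivity of the common subgraph is not enforced, and the search is exponential, so it is only usable on toy inputs.

```python
from itertools import combinations, permutations


def max_common_subgraphs(a_atoms, a_bonds, b_atoms, b_bonds):
    """Return (size, mappings); each mapping is a tuple of (a_index, b_index) pairs."""
    a_set = {frozenset(x) for x in a_bonds}
    b_set = {frozenset(x) for x in b_bonds}
    # Try the largest possible size first and stop at the first size with a hit.
    for size in range(min(len(a_atoms), len(b_atoms)), 0, -1):
        found = []
        for subset in combinations(range(len(a_atoms)), size):
            for image in permutations(range(len(b_atoms)), size):
                if any(a_atoms[i] != b_atoms[j] for i, j in zip(subset, image)):
                    continue
                pairs = tuple(zip(subset, image))
                # Induced-subgraph test: two chosen atoms are bonded in A
                # exactly when their images are bonded in B.
                if all((frozenset((i1, i2)) in a_set) == (frozenset((j1, j2)) in b_set)
                       for (i1, j1), (i2, j2) in combinations(pairs, 2)):
                    found.append(pairs)
        if found:
            return size, found
    return 0, []


# Heavy atoms only: ethanol versus propanoic acid.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
acid = (["C", "C", "C", "O", "O"], [(0, 1), (1, 2), (2, 3), (2, 4)])

size, mappings = max_common_subgraphs(*ethanol, *acid)
print(size, mappings)
```

Even this tiny pair produces more than one index list, because the element-only match cannot tell the two acid oxygens apart.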

Graph isomorphism (1) is a special case of maximum subgraph
isomorphism (2), so this is no problem. I am not sure I understand (3).

Are there plans for a publication? This module is very much a developer
version, and looks like one, so I am hesitant to share it until it has
been published.
You know the GPL means sharing all or nothing.

Kind regards, Joerg



-- 
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW:    http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
                                     (E. Hemingway)

Never mistake action for meaningful action.
                                (Hugo Kubinyi,2004)

Re: [Joelib-devel] Atom mapping in substructure searching

From: Peter Murray-R. <pm...@ca...> - 2005-03-30 13:28:00

At 14:10 30/03/2005, Joerg K. Wegner wrote:
>Hi all,
>
>JOELib has an unpublished extended module for MCS, and the publication is 
>still in the review process for the QSAR&Comb.Sci.

Understood. Of course if we were in any other discipline than chemistry you 
could have posted the preprint...

>We can also deal with physicochemical atom properties, which can make 
>sense for reactions.

Do you mean more than formal charges and isotopes? Or atom subtyping? Most
other properties will be doubles, so they cannot match exactly and some
judgment is needed.

>The method returns all, full combinatorial and/or unique, MCS, so there 
>are a lot of index lists for two molecules.

Looks very useful.

>graph isomorphism(1) is a special case of maximum subgraph isomorphism(2) 
>so this is no problem.

I listed them separately because other implementations might not have MCS

>  I am not sure if I understand (3)

This requires the disjoint graphs in the reactants to be mapped to the
disjoint graphs in the products. For example, a water molecule will probably
map to all products. However, if many of the O atoms have already been
mapped, then it has fewer possibilities. In other words, it is a mapping of
graph lists (forests), not just graphs.

>Are there plans for a publication?

Yes, but we cannot promise timescales or the type of publication(s).

>This module is very much a developer version, and looks like one, so I am
>hesitant to share it until it has been published.
>You know the GPL means sharing all or nothing.

Indeed. And probably technically you have to send us the full source and so on.

That is an advantage of a web service: you can share the functionality
without releasing the code.

P.


Re: [Joelib-devel] Atom mapping in substructure searching

From: Christoph S. <c.s...@un...> - 2005-03-30 14:10:02

Sorry for not answering this question comprehensively,
I just want to point out that we have convenience methods for getting
the atom-to-atom mapping in CDK MCSS, not just bond maps.

> (1) which of the tools can be easily configured to output one or more
> substructure mappings (including null) from A=>C, A=>D, A=>E. B=>C,
> B=>D, B=>E. And the reverse C=>A, C=>B, etc. . This only works when one
> species is a precise substructure of another.
> (2) Maximal common subgraph mapping A<=>C, A<=>D, etc.

I don't see why the
org.openscience.cdk.isomorphism.UniversalIsomorphismTester should not be
able to give you exactly what you ask for in (1) and (2), as atom-atom
mappings.

> (3) ditto, but for (A+B) <=> (C+D+E) without overlaps.

Don't quite get this one.
Suppose you put AtomContainer A and B into a new AtomContainer "educts",
which yields a disconnected graph and the same with putting C and D and
E into a new AtomContainer "products". You would then do a MCSS of both?!?

Cheers,

Chris

--
Priv. Doz. Dr. Christoph Steinbeck (c.s...@un...)
Head of the Research Group for Molecular Informatics
Cologne University BioInformatics Center (http://www.cubic.uni-koeln.de)
Zülpicher Str. 47, 50674 Cologne
Tel: +49(0)221-470-7426   Fax: +49 (0) 221-470-7786

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..

Re: [Joelib-devel] Atom mapping in substructure searching

From: Peter Murray-R. <pm...@ca...> - 2005-03-30 14:25:21

At 15:09 30/03/2005, Christoph Steinbeck wrote:
>Sorry for not answering this question comprehensively,
>I just want to point out that we have convenience methods for getting the
>atom-to-atom mapping in CDK MCSS, not just bond maps.

Thanks - that's great. Wasn't aware that those were now in.

Are there any examples of code so we can get off the ground quickly?

>>(1) which of the tools can be easily configured to output one or more
>>substructure mappings (including null) from A=>C, A=>D, A=>E. B=>C, B=>D,
>>B=>E. And the reverse C=>A, C=>B, etc. . This only works when one species
>>is a precise substructure of another.
>>(2) Maximal common subgraph mapping A<=>C, A<=>D, etc.
>
>I don't see why the
>org.openscience.cdk.isomorphism.UniversalIsomorphismTester should not be
>able to give you exactly what you ask for in (1) and (2), as atom-atom
>mappings.

Excellent

>>(3) ditto, but for (A+B) <=> (C+D+E) without overlaps.
>
>Don't quite get this one.
>Suppose you put AtomContainer A and B into a new AtomContainer "educts",
>which yields a disconnected graph and the same with putting C and D and E
>into a new AtomContainer "products". You would then do a MCSS of both?!?

Maybe this is rubbish, but consider the reaction:
CCC(=O)OCC + O -> CCC(=O)O + CCO
If you do a MCS of O vs CCC(=O)OC you will get 2 hits. Only one is "useful"
and you don't know which. However, if you require that all MCS are
simultaneously matched, then the O will match what is left after the larger
fragments have been matched.

In many (not all) reactions we have all the atoms in the reaction, so it
makes sense to map them all at once - it's a strong constraint.
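
The "map them all at once" idea can be sketched in plain Python for the ester hydrolysis above. This is only an illustration under stated assumptions, not a method from CDK or JOELib: heavy atoms only, atoms match on element, the reaction is atom-balanced, and the best mapping is taken to be the bijection that leaves the most bonds (including bond order) unchanged. The name `best_global_mappings` and the atom numbering are hypothetical.

```python
from itertools import permutations


def best_global_mappings(r_atoms, r_bonds, p_atoms, p_bonds):
    """Return (score, mappings): all element-preserving bijections from
    reactant atoms to product atoms that keep the most bonds unchanged."""
    p_lookup = {frozenset(pair): order for pair, order in p_bonds.items()}
    best_score, best = -1, []
    for image in permutations(range(len(p_atoms))):
        if any(r_atoms[i] != p_atoms[j] for i, j in enumerate(image)):
            continue
        # Count reactant bonds whose images are product bonds of the same order.
        score = sum(p_lookup.get(frozenset((image[a], image[b]))) == order
                    for (a, b), order in r_bonds.items())
        if score > best_score:
            best_score, best = score, [image]
        elif score == best_score:
            best.append(image)
    return best_score, best


# Reactants: ethyl propanoate (atoms 0-6) plus water (atom 7).
r_atoms = list("CCCOOCC") + ["O"]
r_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1, (5, 6): 1}
# Products: propanoic acid (atoms 0-4) plus ethanol (atoms 5-7).
p_atoms = list("CCCOO") + list("CCO")
p_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1, (5, 6): 1, (6, 7): 1}

score, mappings = best_global_mappings(r_atoms, r_bonds, p_atoms, p_bonds)

# Product oxygens the water O could match when considered on its own.
isolated = {j for j, element in enumerate(p_atoms) if element == "O"}
# Product oxygens the water O lands on in the best simultaneous mappings.
constrained = {image[7] for image in mappings}
print(score, sorted(isolated), sorted(constrained))
```

In this sketch the simultaneous constraint removes the carbonyl oxygen as a candidate for the water O, but it still leaves both the acid OH and the ethanol O, since graph matching alone cannot say which C-O bond of the ester was cleaved.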

P.


