From: Peter Murray-R. <pm...@ca...> - 2005-03-30 12:41:54
|
Crossposted to 3 lists. Please reply carefully.

We have a requirement to extract atom-atom mapping from a substructure search (or a maximal common subgraph). It's because our MACiE database contains reactions without atom-atom maps. We have the complete products and reactants for each step in CMLReact and want to either create a map or relabel the atoms consistently. Since these are multistep reactions, the products of step N are usually the reactants of step N+1, though some components may be missing.

I've used obgrep (SMARTS) but only to give a boolean answer - which molecules in a list have a given substructure. I used CDK a long time ago for maximal common subgraph, but that was based on bonds, not atoms, and I haven't used it since. I haven't used JOELib.

If I have something like A + B => C + D + E
(1) which of the tools can be easily configured to output one or more substructure mappings (including null) from A=>C, A=>D, A=>E, B=>C, B=>D, B=>E. And the reverse C=>A, C=>B, etc. This only works when one species is a precise substructure of another.
(2) Maximal common subgraph mapping A<=>C, A<=>D, etc.
(3) ditto, but for (A+B) <=> (C+D+E) without overlaps.

Performance is not an issue - this only (we hope) has to be done once, as then we shall use JChempaint as the entry tool. Even a webservice would do. Nor (in principle) is semistructured output, though we'd prefer XML or a clear API. But we'd prefer not to have to write new code if possible.

Hope this makes sense. The only alternative is to reinput hundreds of reactions by hand :-(

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076
|
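To make request (1) concrete, here is a minimal brute-force sketch in plain Python. It is not the API of obgrep, CDK, or JOELib. It assumes a molecule is given as a list of element symbols plus a list of bonded index pairs, it ignores bond orders and hydrogens, and the function name `substructure_mappings` is invented for this illustration. It returns every atom-atom map from the query into the target, and an empty list for the null case.

```python
from itertools import permutations

def substructure_mappings(q_elems, q_bonds, t_elems, t_bonds):
    """Return every map {query atom index: target atom index} that embeds
    the query in the target; an empty list means there is no match."""
    t_bond_set = {frozenset(b) for b in t_bonds}
    maps = []
    # Try every ordered selection of target atoms, one per query atom.
    for combo in permutations(range(len(t_elems)), len(q_elems)):
        # Element symbols must agree atom by atom.
        if any(q_elems[i] != t_elems[j] for i, j in enumerate(combo)):
            continue
        # Every query bond must also be a bond between the mapped atoms.
        if all(frozenset((combo[a], combo[b])) in t_bond_set for a, b in q_bonds):
            maps.append(dict(enumerate(combo)))
    return maps

# Ethanol heavy atoms (CCO) as the target, a C-O fragment as the query.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
c_o = (["C", "O"], [(0, 1)])
print(substructure_mappings(*c_o, *ethanol))           # [{0: 1, 1: 2}]
print(substructure_mappings(["N"], [], *ethanol))      # []
```

Exhaustive enumeration is only reasonable here because performance is stated not to be an issue; a real toolkit would use a proper subgraph isomorphism algorithm.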
From: Joerg K. W. <we...@in...> - 2005-03-30 13:11:09
|
Hi all,

JOELib has an unpublished extended module for MCS, and the publication is still in the review process at QSAR & Comb. Sci. We can also deal with physicochemical atom properties, which can make sense for reactions. The method returns all MCS, full combinatorial and/or unique, so there are a lot of index lists for two molecules.

Graph isomorphism (1) is a special case of maximum subgraph isomorphism (2), so this is no problem. I am not sure if I understand (3).

Are there plans for a publication?

This module is a real developer version, and also looks like one, and I am hesitant to share it until it has been published. You know the GPL means sharing all or nothing.

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970 Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)
|
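The remark about "a lot of index lists for two molecules" can be illustrated with a small brute-force sketch. This is not the JOELib module, which is unpublished and not described in the thread. The function name `max_common_subgraph_maps` is invented here, and the sketch assumes a simplified definition: the common subgraph is induced (mapped atom pairs are bonded in both molecules or in neither), it need not be connected, and bond orders and hydrogens are ignored.

```python
from itertools import combinations, permutations

def max_common_subgraph_maps(a_elems, a_bonds, b_elems, b_bonds):
    """Return all maximum-size maps {atom index in A: atom index in B}."""
    a_set = {frozenset(b) for b in a_bonds}
    b_set = {frozenset(b) for b in b_bonds}
    # Try the largest possible size first and stop at the first size with hits.
    for k in range(min(len(a_elems), len(b_elems)), 0, -1):
        found = []
        for a_atoms in combinations(range(len(a_elems)), k):
            for b_atoms in permutations(range(len(b_elems)), k):
                pairs = list(zip(a_atoms, b_atoms))
                if any(a_elems[i] != b_elems[j] for i, j in pairs):
                    continue
                # Bonded in A if and only if bonded in B, for every atom pair.
                if all(
                    (frozenset((i1, i2)) in a_set) == (frozenset((j1, j2)) in b_set)
                    for (i1, j1), (i2, j2) in combinations(pairs, 2)
                ):
                    found.append(dict(pairs))
        if found:
            return found
    return []

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])          # CCO
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])   # COC
print(max_common_subgraph_maps(*ethanol, *dimethyl_ether))
# [{1: 0, 2: 1}, {1: 2, 2: 1}]
```

Even this three-atom pair yields two equally good maps, because the C-O fragment of ethanol fits either end of the ether. That symmetry is why several index lists come back for a single pair of molecules.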
From: Peter Murray-R. <pm...@ca...> - 2005-03-30 13:28:00
|
At 14:10 30/03/2005, Joerg K. Wegner wrote:
>Hi all,
>
>JOELib has an unpublished extended module for MCS, and the publication is
>still in the review process for the QSAR&Comb.Sci.

Understood. Of course if we were in any other discipline than chemistry you could have posted the preprint...

>We can also deal with physicochemical atom properties, which can make
>sense for reactions.

Do you mean more than formal charges and isotopes? Or atom subtyping? Most other properties will be doubles, so they cannot match exactly and some judgment is needed.

>The method returns all, full combinatorial and/or unique, MCS, so there
>are a lot of index lists for two molecules.

Looks very useful.

>graph isomorphism(1) is a special case of maximum subgraph isomorphism(2)
>so this is no problem.

I listed them separately because other implementations might not have MCS.

>I am not sure if I understand (3)

This requires the disjoint graphs in the reactants to be mapped to the disjoint graphs in the products. For example, a water molecule will probably map to all products. However, if many of the O atoms have already been mapped, then it has fewer possibilities. IOW it is a mapping of graphLists (forests), not just graphs.

>Are there plans for a publication?

Yes, but we cannot promise timescales or the type of publication(s).

>This module is a real developer version and looks also like a developer
>version and I am ambigious to share it, until it was not published.
>You know GPL, means sharing all or nothing.

Indeed. And probably technically you have to send us the full source and so on. That is an advantage of WS - you can share the functionality without releasing the code.

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076
|
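The "mapping of graphLists" idea can be sketched as a filter over per-molecule candidate maps. This is a plain Python illustration under assumptions, not code from any of the toolkits discussed: the candidate maps are taken as given (for example from a substructure or MCS search), product atoms are assumed to carry one shared index space, and the function name `non_overlapping_assignments` and the toy data are invented here.

```python
from itertools import product

def non_overlapping_assignments(candidates):
    """candidates: {reactant name: [map, ...]}, each map being
    {reactant atom index: product atom index}. Return every way of picking
    one map per reactant such that no product atom is claimed twice."""
    names = list(candidates)
    results = []
    for choice in product(*(candidates[n] for n in names)):
        used = [target for m in choice for target in m.values()]
        if len(used) == len(set(used)):   # no product atom used twice
            results.append(dict(zip(names, choice)))
    return results

# Toy data: a two-atom reactant "A" and a water oxygen, products indexed 0..2.
candidates = {
    "A": [{0: 0, 1: 1}],
    "water": [{0: 1}, {0: 2}],   # alone, the water O could go to atom 1 or 2
}
print(non_overlapping_assignments(candidates))
# [{'A': {0: 0, 1: 1}, 'water': {0: 2}}]
```

Taken alone, the water oxygen has two possible targets; once the other reactant has claimed product atom 1, only one remains.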
From: Christoph S. <c.s...@un...> - 2005-03-30 14:10:02
|
Sorry for not answering this question comprehensively,
I just want to point out that we have convenience methods for getting the atom-to-atom mapping in CDK MCSS, not just bond maps.

> (1) which of the tools can be easily configured to output one or more
> substructure mappings (including null) from A=>C, A=>D, A=>E. B=>C,
> B=>D, B=>E. And the reverse C=>A, C=>B, etc. . This only works when one
> species is a precise substructure of another.
> (2) Maximal common subgraph mapping A<=>C, A<=>D, etc.

I don't see why the org.openscience.cdk.isomorphism.UniversalIsomorphismTester should not be able to give you exactly what you ask for in (1) and (2), as atom-atom mappings.

> (3) ditto, but for (A+B) <=> (C+D+E) without overlaps.

Don't quite get this one.
Suppose you put AtomContainer A and B into a new AtomContainer "educts", which yields a disconnected graph, and the same with putting C and D and E into a new AtomContainer "products". You would then do a MCSS of both?!?

Cheers,

Chris

--
Priv. Doz. Dr. Christoph Steinbeck (c.s...@un...)
Head of the Research Group for Molecular Informatics
Cologne University BioInformatics Center (http://www.cubic.uni-koeln.de)
Zülpicher Str. 47, 50674 Cologne
Tel: +49(0)221-470-7426 Fax: +49 (0) 221-470-7786

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..
|
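The idea of pooling A and B into one disconnected "educts" container can be sketched without any toolkit. The snippet below is plain Python, not the CDK AtomContainer API. It assumes molecules are (element list, bond list) pairs, and `merge_molecules` is a name invented for this illustration. The `origin` list keeps track of which molecule each merged atom came from, so a map computed on the merged graph can be translated back.

```python
def merge_molecules(molecules):
    """molecules: list of (elements, bonds). Return (elements, bonds, origin)
    for the combined, disconnected graph, where origin[i] is
    (molecule index, local atom index) of merged atom i."""
    elems, bonds, origin = [], [], []
    for m_idx, (m_elems, m_bonds) in enumerate(molecules):
        offset = len(elems)                      # shift local indices
        elems.extend(m_elems)
        bonds.extend((a + offset, b + offset) for a, b in m_bonds)
        origin.extend((m_idx, i) for i in range(len(m_elems)))
    return elems, bonds, origin

a = (["C", "O"], [(0, 1)])   # a C-O fragment standing in for molecule A
b = (["O"], [])              # water heavy atom standing in for molecule B
educts = merge_molecules([a, b])
print(educts)
# (['C', 'O', 'O'], [(0, 1)], [(0, 0), (0, 1), (1, 0)])
```

A common-subgraph search between two such merged graphs has to accept disconnected results, otherwise it can never cover more than one component at a time.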
From: Peter Murray-R. <pm...@ca...> - 2005-03-30 14:25:21
|
At 15:09 30/03/2005, Christoph Steinbeck wrote:
>Sorry for not answering this question comprehensively,
>I just want to point out that we have convenience methods for getting the
>atom-to-atom mapping in CDK MCSS, not just bond maps.

Thanks - that's great. Wasn't aware that those were now in. Are there any examples of code so we can get off the ground quickly?

>>(1) which of the tools can be easily configured to output one or more
>>substructure mappings (including null) from A=>C, A=>D, A=>E. B=>C, B=>D,
>>B=>E. And the reverse C=>A, C=>B, etc. . This only works when one species
>>is a precise substructure of another.
>>(2) Maximal common subgraph mapping A<=>C, A<=>D, etc.
>
>I don't see why the
>org.openscience.cdk.isomorphism.UniversalIsomorphismTester should not be
>able to give you exactly what you ask for in (1) and (2), as atom-atom
>mappings.

Excellent

>>(3) ditto, but for (A+B) <=> (C+D+E) without overlaps.
>
>Don't quite get this one.
>Suppose you put AtomContainer A and B into a new AtomContainer "educts",
>which yields a disconnected graph and the same with putting C and D and E
>into a new AtomContainer "products". You would then do a MCSS of both?!?

Maybe this is rubbish, but consider the reaction:

CCC(=O)OCC + O -> CCC(=O)O + CCO

If you do a MCS of O vs CCC(=O)OC you will get 2 hits. Only one is "useful" and you don't know which. However, if you require that all MCS are simultaneously matched, then the O will match what is left after the larger fragments have been matched.

In many (not all) reactions we have all the atoms in the reaction, so it makes sense to map them all at once - it's a strong constraint.

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076
|
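The "what is left" argument can be worked through for the hydrolysis example in a few lines of plain Python. The atom indexing and the ester map below are assumptions made for illustration; in particular the map that sends the bridging ester oxygen to ethanol is only one graph-consistent choice, and the alternative that keeps it on the acid is equally valid from connectivity alone.

```python
# Heavy atoms of the products, indexed across both molecules:
# propanoic acid CCC(=O)O -> 0..4, ethanol CCO -> 5..7.
product_elements = ["C", "C", "C", "O", "O", "C", "C", "O"]

# Water matched on its own: any product oxygen is a candidate.
water_candidates = [i for i, el in enumerate(product_elements) if el == "O"]

# Hypothetical map for the ester CCC(=O)OCC (atoms 0..6) onto the products:
# the acyl part goes to the acid, the O-ethyl part goes to ethanol.
ester_to_product = {0: 0, 1: 1, 2: 2, 3: 3, 4: 7, 5: 6, 6: 5}

# Requiring all maps to hold at once removes the atoms the ester has claimed.
claimed = set(ester_to_product.values())
water_remaining = [i for i in water_candidates if i not in claimed]

print(water_candidates)   # [3, 4, 7]
print(water_remaining)    # [4]
```

With this ester map the water oxygen can only become the acid hydroxyl oxygen (index 4). Choosing the other ester map would leave the ethanol oxygen instead, so the simultaneous constraint narrows the options but chemistry still has to pick between the two.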