From: Egon W. <ego...@gm...> - 2014-07-15 11:42:18
|
Rajarshi, could you please have a look at this patch: https://sourceforge.net/p/cdk/patches/770/ It reverts code specifically added by you, but I cannot find enough context to decide whether you code should be reverted or not... Thanks, Egon -- E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers ORCID: 0000-0001-7542-0286 |
From: Rajarshi G. <gu...@ma...> - 2014-07-15 12:49:12
|
The patch is correct - I misread the daylight spec On Tue, Jul 15, 2014 at 7:41 AM, Egon Willighagen < ego...@gm...> wrote: > Rajarshi, > > could you please have a look at this patch: > > https://sourceforge.net/p/cdk/patches/770/ > > It reverts code specifically added by you, but I cannot find enough > context to decide whether you code should be reverted or not... > > Thanks, > > Egon > > -- > E.L. Willighagen > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > Blog: http://chem-bla-ics.blogspot.com/ > PubList: http://www.citeulike.org/user/egonw/tag/papers > ORCID: 0000-0001-7542-0286 > -- Rajarshi Guha | http://blog.rguha.net National Center for Advancing Translational Science |
From: John M. <joh...@gm...> - 2014-07-15 20:19:56
|
Hi all, I managed to dig up a message that seems to back up the reason for the current behaviour (see below). We could keep the D<n> behaviour as it is, but I needed the ‘*’ for matching MMFF forcefield types. Since this was dealing with a hydrogen unsuppressed graph (i.e. 3D model) I needed the ability to match heavy atoms next to any atom (including hydrogen). To me, it makes more sense that ‘*’ doesn’t match explicit hydrogens when i’ve preprocessed the structure and removed them (not automatic) rather than automatically exclude hydrogens. Cheers, J > > 2. The Daylight SMARTS theory page says that the D<n> token > > represents <n> explicit connections. When counting these connections, > > should explicit H's be included? The Daylight SMARTS test page > > suggests no: > > That is correct. Internally, a hydrogen is a hydrogen, and it doesn't matter whether it's represented as an atom or an H-count. The D<n> spec is about non-hydrogen atoms, not about whether the toolkit coders elect to represent hydrogens explicitely. > > > Matching [OD1] against CO and CO[H] both give 1 hit (explicit H > > matching turned off) > > > > 3. The Daylight spec for * indicates it matches any atom. But the > > Daylight depict service suggests that it will not match against H > > (unless explicit H matching is selected). > > This last part ("unless explicit H matching is selected") was only for backwards compatibility, as I recall. The Daylight Toolkit originally treated implicit/explicit H differently, but that was a disaster -- the idea that a SMARTS had a different meaning depending on how the molecule was represented internally was a brain-dead idea, and was quickly corrected in the early versions of the Daylight Toolkit. The explicit-H matching was put in just so that old applications wouldn't break that relied on the old behavior. > > Craig On 15 Jul 2014, at 13:49, Rajarshi Guha <gu...@ma...> wrote: > The patch is correct - I misread the daylight spec > > > On Tue, Jul 15, 2014 at 7:41 AM, Egon Willighagen <ego...@gm...> wrote: > Rajarshi, > > could you please have a look at this patch: > > https://sourceforge.net/p/cdk/patches/770/ > > It reverts code specifically added by you, but I cannot find enough > context to decide whether you code should be reverted or not... > > Thanks, > > Egon > > -- > E.L. Willighagen > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > Blog: http://chem-bla-ics.blogspot.com/ > PubList: http://www.citeulike.org/user/egonw/tag/papers > ORCID: 0000-0001-7542-0286 > > > > -- > Rajarshi Guha | http://blog.rguha.net > National Center for Advancing Translational Science > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds_______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Egon W. <ego...@gm...> - 2014-07-16 11:15:20
|
John, sorry, but I am a bit lost now... what is the resolution now? You will ping me when updates patches are ready? Egon On Tue, Jul 15, 2014 at 10:19 PM, John May <joh...@gm...> wrote: > Hi all, > > I managed to dig up a message that seems to back up the reason for the > current behaviour (see below). We could keep the D<n> behaviour as it is, > but I needed the ‘*’ for matching MMFF forcefield types. Since this was > dealing with a hydrogen unsuppressed graph (i.e. 3D model) I needed the > ability to match heavy atoms next to any atom (including hydrogen). > > To me, it makes more sense that ‘*’ doesn’t match explicit hydrogens when > i’ve preprocessed the structure and removed them (not automatic) rather than > automatically exclude hydrogens. > > Cheers, > J > >> 2. The Daylight SMARTS theory page says that the D<n> token > >> represents <n> explicit connections. When counting these connections, >> should explicit H's be included? The Daylight SMARTS test page >> suggests no: > > That is correct. Internally, a hydrogen is a hydrogen, and it doesn't > matter whether it's represented as an atom or an H-count. The D<n> spec is > about non-hydrogen atoms, not about whether the toolkit coders elect to > represent hydrogens explicitely. > >> Matching [OD1] against CO and CO[H] both give 1 hit (explicit H >> matching turned off) >> >> 3. The Daylight spec for * indicates it matches any atom. But the >> Daylight depict service suggests that it will not match against H >> (unless explicit H matching is selected). > > This last part ("unless explicit H matching is selected") was only for > backwards compatibility, as I recall. The Daylight Toolkit originally > treated implicit/explicit H differently, but that was a disaster -- the idea > that a SMARTS had a different meaning depending on how the molecule was > represented internally was a brain-dead idea, and was quickly corrected in > the early versions of the Daylight Toolkit. The explicit-H matching was put > in just so that old applications wouldn't break that relied on the old > behavior. > > Craig > > > > On 15 Jul 2014, at 13:49, Rajarshi Guha <gu...@ma...> wrote: > > The patch is correct - I misread the daylight spec > > > On Tue, Jul 15, 2014 at 7:41 AM, Egon Willighagen > <ego...@gm...> wrote: >> >> Rajarshi, >> >> could you please have a look at this patch: >> >> https://sourceforge.net/p/cdk/patches/770/ >> >> It reverts code specifically added by you, but I cannot find enough >> context to decide whether you code should be reverted or not... >> >> Thanks, >> >> Egon >> >> -- >> E.L. Willighagen >> Department of Bioinformatics - BiGCaT >> Maastricht University (http://www.bigcat.unimaas.nl/) >> Homepage: http://egonw.github.com/ >> LinkedIn: http://se.linkedin.com/in/egonw >> Blog: http://chem-bla-ics.blogspot.com/ >> PubList: http://www.citeulike.org/user/egonw/tag/papers >> ORCID: 0000-0001-7542-0286 > > > > > -- > Rajarshi Guha | http://blog.rguha.net > National Center for Advancing Translational Science > ------------------------------------------------------------------------------ > Want fast and easy access to all the code in your enterprise? Index and > search up to 200,000 lines of code with a free copy of Black Duck > Code Sight - the same software that powers the world's largest code > search on Ohloh, the Black Duck Open Hub! Try it now. > http://p.sf.net/sfu/bds_______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel > > -- E.L. Willighagen Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers ORCID: 0000-0001-7542-0286 |
From: John M. <joh...@gm...> - 2014-07-16 12:45:41
|
No patches need updating, both are optional. I actually only needed to modify a single SMARTS pattern for MMFF so neither patch 770/776 is actually needed now. From a theoretical standpoint I think the behaviour makes more sense. I would vote for leaving the ‘D<N>’ behaviour as it is (i.e. ignore patch 776) but still change the wildcard ‘*’ (i.e include patch 770). At the moment ‘*’ matches ‘[2H]’ but not ‘[H+]’, ‘B1[H]B[H]1’, ‘[H][H]’, and ‘[H:1]’. Cheers, J |
From: Rajarshi G. <gu...@ma...> - 2014-07-16 14:39:01
|
On Wed, Jul 16, 2014 at 8:45 AM, John May <joh...@gm...> wrote: > > I would vote for leaving the 'D<N>' behaviour as it is (i.e. ignore patch > 776) but still change the wildcard '*' (i.e include patch 770). At the > moment '*' matches '[2H]' but not '[H+]', 'B1[H]B[H]1', '[H][H]', and > '[H:1]'. > I'd vote for fixing the handling of '*' -- Rajarshi Guha | http://blog.rguha.net National Center for Advancing Translational Science |