From: Egon W. <ego...@gm...> - 2008-10-06 13:17:50
|
Hi all (and Dazhi/Rajarshi in particular),

I have been playing with pKa prediction based on a JCIM paper released last Thursday [0] and ran into SMARTS parsing problems, because MOE extends the original specification in ways CDK does not support (yet):

http://chem-bla-ics.blogspot.com/2008/10/pka-prediction-or-how-to-convert-jcim.html

I identified at least the following constructs as not supported:

1. [#G6] (or [#G7] etc)
2. [i]
3. [#X]

Any possibility to see support for these in the CDK SMARTS engine?

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-06 13:45:30
|
On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:

> I identified at least the following constructs not supported:
>
> 1. [#G6] (or [#G7] etc)

Shouldn't be difficult, I expect.

> 2. [i]

I'd rather support the OEChem version of this: C^2 means SP2. Also, OpenBabel supports this, so it's better than modifying our code to support just one application.

However, I disagree with your statement:

"I'd rather see the CDK SMARTS engine support these industry adopted extensions"

These extensions are just provided by one company, specific to their application. IMO, if I wanted to track extensions, I'd go for OpenEye.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Q: What's yellow, linear, normed and complete?
A: A Bananach space. |
From: Egon W. <ego...@gm...> - 2008-10-06 13:57:36
|
On Mon, Oct 6, 2008 at 3:41 PM, Rajarshi Guha <rg...@in...> wrote:
>
> On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:
>
>> I identified at least the following constructs not supported:
>>
>> 1. [#G6] (or [#G7] etc)
>
> Shouldn't be difficult I expect
>
>> 2. [i]
>
> I'd rather support the OEChem version of this: C^2 means SP2. Also,
> OpenBabel supports this, so it's better than modifying our code to support
> just 1 application

From what the JCIM paper says, [i] != [C^2]... the former is really a subset of the SP2 atoms, namely those involved in delocalized systems... so, those having stronger electronic effects...

> However, I disagree with your statement:
>
> "I'd rather see the CDK SMARTS engine support these industry adopted
> extensions"
>
> These extensions are just provided by one company, specific to their
> application. IMO, if I wanted to track extensions, I'd go for OpenEye.

I'd most prefer to see one fixed open standard, but I'd rather see support for these extensions *than* have *approximating* SMARTS queries... that was my thought behind it...

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-06 14:05:06
|
On Oct 6, 2008, at 9:44 AM, Egon Willighagen wrote:

> On Mon, Oct 6, 2008 at 3:41 PM, Rajarshi Guha <rg...@in...> wrote:
>>
>> On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:
>>
>>> I identified at least the following constructs not supported:
>>>
>>> 1. [#G6] (or [#G7] etc)
>>
>> Shouldn't be difficult I expect

Personally, I'd go for rewriting the above as an OR.

Also, if G6 indicates Group 6 of the periodic table, then the elements are Cr, Mo, W, so I'm not sure how it matches C, S etc.

Without MOE documentation, implementing these extensions is tricky. I'd rather not guess.

> I'd most preferably see one fixed open standard, but I'd rather see
> support for these extensions *than* to have *approximating* SMARTS
> queries... that was my thought behind it...

I see your point.

We should also then publicize the CDK extension :) Specifically, the pharmacophore code allows you to do group-wise ORs as P1|P2, where P1 and P2 are complete SMARTS patterns.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
...but there was no one in it.......
- RG |
From: Egon W. <ego...@gm...> - 2008-10-06 14:16:16
|
On Mon, Oct 6, 2008 at 4:02 PM, Rajarshi Guha <rg...@in...> wrote:
>>>> 1. [#G6] (or [#G7] etc)
>>>
>>> Shouldn't be difficult I expect
>
> Personally, I'd go for rewriting the above as an OR.

Yes, that's what I did... though it becomes rather long then... Moreover, there are a lot of variants (from my Perl script):

    $cdkSmarts =~ s/#G6H/#6H,#16H/g;
    $cdkSmarts =~ s/#G6;H/#6;H,#16;H/g;
    $cdkSmarts =~ s/#G4!H0/#4!H0,#14H0/g;

I still have to check if I have rewritten the second and third variants correctly...

> Also if G6 indicates Group 6 of the periodic table then the elements are Cr,
> Mo, W so I'm not sure how it matches C, S etc

Right... so my analysis that it meant 'Group' was so wrong...

> Without MOE documentation, implementing these extensions is tricky. I'd
> rather not guess

Agreed.

>> I'd most preferably see one fixed open standard, but I'd rather see
>> support for these extensions *than* to have *approximating* SMARTS
>> queries... that was my thought behind it...
>
> I see your point.
>
> We should also then publicize the CDK extension :)

Indeed :) I so only good in that... more papers, more citations :)

BTW, can we parse an MQL query into an IQueryAtomContainer in the CDK already?

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
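A note on the second and third rewrites above: in Daylight SMARTS, ',' (OR) binds tighter than ';' (low-precedence AND), so [#6;H,#16;H] reads as "#6 and (H or #16) and H" rather than as two alternatives, and the third rule drops the '!' from its second alternative. The sketch below shows one possible way to expand a group token so that every alternative keeps the primitives attached to it by implicit AND. It is only an illustration, not CDK code: the class and method names are hypothetical, the #6/#16 and #4/#14 alternatives are simply the ones from the Perl script (the thread does not confirm that they match MOE's definition), and it assumes the token is the first primitive of its AND-run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the CDK.
class GroupRewrite {

    // Replaces every "token + suffix" by "alt1 + suffix, alt2 + suffix, ...",
    // where suffix is whatever follows the token up to the next ',', ';' or ']'
    // (i.e. the primitives bound to the token by implicit AND).
    static String expand(String smarts, String token, String... alternatives) {
        Pattern p = Pattern.compile(Pattern.quote(token) + "([^,;\\]]*)");
        Matcher m = p.matcher(smarts);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String suffix = m.group(1);
            StringBuilder repl = new StringBuilder();
            for (int i = 0; i < alternatives.length; i++) {
                if (i > 0) repl.append(',');
                repl.append(alternatives[i]).append(suffix);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(repl.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Alternatives copied from the Perl script; treat them as assumptions.
        System.out.println(expand("[#G6H]", "#G6", "#6", "#16"));   // [#6H,#16H]
        System.out.println(expand("[#G6;H]", "#G6", "#6", "#16"));  // [#6,#16;H]
        System.out.println(expand("[#G4!H0]", "#G4", "#4", "#14")); // [#4!H0,#14!H0]
    }
}
```

With this approach the ';H' case needs no duplication at all: the OR list is formed first and the low-precedence AND then applies to the whole list. A token that appears after other primitives in the same AND-run (for example [R#G6]) would need a real SMARTS parser rather than a regular expression.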
From: Egon W. <ego...@gm...> - 2008-10-06 15:03:36
|
On Mon, Oct 6, 2008 at 4:02 PM, Rajarshi Guha <rg...@in...> wrote:
>
> On Oct 6, 2008, at 9:44 AM, Egon Willighagen wrote:
>
>> On Mon, Oct 6, 2008 at 3:41 PM, Rajarshi Guha <rg...@in...> wrote:
>>>
>>> On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:
>>>
>>>> I identified at least the following constructs not supported:
>>>>
>>>> 1. [#G6] (or [#G7] etc)
>>>
>>> Shouldn't be difficult I expect
>
> Personally, I'd go for rewriting the above as an OR.
>
> Also if G6 indicates Group 6 of the periodic table then the elements are Cr,
> Mo, W so I'm not sure how it matches C, S etc

Maybe Group VI of the main group elements? [1] describes [#G7!F] as "halogens (other than fluorine)"...

Egon

[1] http://www.chemcomp.com/journal/sdtools.htm

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-06 14:15:57
|
On Oct 6, 2008, at 10:11 AM, Egon Willighagen wrote:

> BTW, can we parse a MQL query into a IQueryAtomContainer in the CDK
> already?

Dazhi mentioned something along these lines some time back, but I don't know for sure. My understanding is that it'd be difficult, since MQL goes beyond what SMARTS provides.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
The Heineken Uncertainty Principle: You can never be sure how many beers you had last night. |
From: Egon W. <ego...@gm...> - 2008-10-06 14:18:55
|
On Mon, Oct 6, 2008 at 4:14 PM, Rajarshi Guha <rg...@in...> wrote:
> On Oct 6, 2008, at 10:11 AM, Egon Willighagen wrote:
>> BTW, can we parse a MQL query into a IQueryAtomContainer in the CDK
>> already?
>
> Dazhi mentioned something on these lines sometime back, but I don't know for
> sure. My understanding is that it'd be difficult, since MQL goes beyond what
> SMARTS provides

Ah, that would be translating MQL into SMARTS... I was wondering about MQL directly into IQueryAtomContainer...

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-06 14:22:31
|
On Oct 6, 2008, at 10:16 AM, Egon Willighagen wrote:

> On Mon, Oct 6, 2008 at 4:14 PM, Rajarshi Guha <rg...@in...> wrote:
>> On Oct 6, 2008, at 10:11 AM, Egon Willighagen wrote:
>>> BTW, can we parse a MQL query into a IQueryAtomContainer in the CDK
>>> already?
>>
>> Dazhi mentioned something on these lines sometime back, but I
>> don't know for sure. My understanding is that it'd be difficult,
>> since MQL goes beyond what SMARTS provides
>
> Ah, that would be translating MQL into SMARTS... I was wondering about
> MQL directly into IQueryAtomContainer...

Oh, OK. Don't know about that.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Q: What's purple and commutes?
A: An abelian grape. |
From: Nina J. <ni...@ac...> - 2008-10-06 14:51:00
|
Hi all,

I would also be interested in finding a solution to the MOE SMARTS extensions. Does MOE documentation come only with their products?

With the supporting info the implementation is pretty quick and straightforward (here is mine from Friday evening :)
https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-descriptors/src/main/java/ambit2/descriptors/PKASmartsDescriptor.java

Regards,
Nina

Egon Willighagen wrote:
> Hi all (and the Dazhi/Rajarshi in particular),
>
> I have been playing with pKa prediction based on a JCIM paper released last
> Thursday [0] and ran into SMARTS parsing problems, because MOE extends
> the original specification in ways CDK does not support (yet):
>
> http://chem-bla-ics.blogspot.com/2008/10/pka-prediction-or-how-to-convert-jcim.html
>
> I identified at least the following constructs not supported:
>
> 1. [#G6] (or [#G7] etc)
> 2. [i]
> 3. [#X]
>
> Any possibility to see support for this in the CDK SMARTS engine?
>
> Egon |
From: Rajarshi G. <rg...@in...> - 2008-10-06 14:59:39
|
On Oct 6, 2008, at 10:48 AM, Nina Jeliazkova wrote:

> Hi all,
>
> I would also be interested in finding a solution to the MOE SMARTS
> extensions. Does MOE documentation come only with their products?

Apparently so. I've asked Adam if he can shed some light.

AFAICS, the G? identifier means group of the periodic table, since they have an entry in the table where they say G7 is a halogen. But the MOE implementation is not using the IUPAC group numbering system. So G6 does mean C, S etc. - but it's referring to VIB and VIA.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
A beer delayed is a beer denied. |
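One way to read the numbering mismatch discussed above: if MOE's Gn is the older main-group (A) number, which the "G7 is a halogen" entry suggests but nothing in this thread confirms for other values of n, then it converts to IUPAC group numbers as sketched below. The class and method names are hypothetical and this is not MOE or CDK code; only the old-style to IUPAC conversion itself is standard chemistry.

```java
// Hypothetical helper, not part of MOE or the CDK.
class GroupNumbering {

    // Converts an old-style main-group (A) number, 1..8, to the IUPAC group
    // number, 1..18. Groups I and II keep their number; III..VIII shift by 10
    // because the ten transition-metal groups sit in between.
    static int toIupacGroup(int mainGroup) {
        if (mainGroup < 1 || mainGroup > 8) {
            throw new IllegalArgumentException("main group must be 1..8: " + mainGroup);
        }
        return mainGroup <= 2 ? mainGroup : mainGroup + 10;
    }

    public static void main(String[] args) {
        // Under the stated assumption, G7 (halogens) is IUPAC group 17.
        System.out.println(toIupacGroup(7));
    }
}
```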
From: Egon W. <ego...@gm...> - 2008-10-06 15:01:39
|
On Mon, Oct 6, 2008 at 4:48 PM, Nina Jeliazkova <ni...@ac...> wrote:
> I would also be interested in finding a solution to the MOE SMARTS
> extensions. Does MOE documentation come only with their products?
>
> With the supporting info the implementation is pretty quick and
> straightforward (here is mine from Friday evening :)
> https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-descriptors/src/main/java/ambit2/descriptors/PKASmartsDescriptor.java

At least I got first post :)

Does your PkaNode use the SMARTSQueryTool too? How did you deal with the difficult SMARTS?

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Nina J. <ni...@ac...> - 2008-10-06 17:47:54
|
Egon Willighagen wrote:
> On Mon, Oct 6, 2008 at 4:48 PM, Nina Jeliazkova <ni...@ac...> wrote:
>
>> I would also be interested in finding a solution to the MOE SMARTS
>> extensions. Does MOE documentation come only with their products?
>>
>> With the supporting info the implementation is pretty quick and
>> straightforward (here is mine from Friday evening :)
>> https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-descriptors/src/main/java/ambit2/descriptors/PKASmartsDescriptor.java
>
> At least I got first post :)

:)

> Are your PkaNode use the SMARTSQueryTool too? How did you deal with
> the difficult SMARTS?

https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-descriptors/src/main/java/ambit2/descriptors/PKANode.java

Actually, it uses a SMARTS implementation written by my colleague. Nevertheless, the difficult SMARTS are not dealt with yet - too much to expect from a quick implementation. I hope it will be possible to rewrite the SMARTS, given MOE documentation.

Regards,
Nina

> Egon |
From: Rajarshi G. <rg...@in...> - 2008-10-14 03:21:26
|
On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:

> 1. [#G6] (or [#G7] etc)

This is now supported in 1.2.x. It's slightly different from the above, in that you'd write [G6] or [G7]. Also, the number used in the CDK is based on the IUPAC group numbering. So MOE's [#G6] is actually [G14] in the CDK SMARTS implementation.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
All science is either physics or stamp collecting.
-- Ernest Rutherford |
From: Egon W. <ego...@gm...> - 2008-10-14 09:13:38
|
On Tue, Oct 14, 2008 at 5:21 AM, Rajarshi Guha <rg...@in...> wrote:
> This is now supported in 1.2.x. It's slightly different from the above, in
> that you'd write [G6] or [G7]. Also the number used in the CDK is based on
> the IUPAC group numbering. So MOE's [#G6] is actually [G14] in the CDK
> SMARTS implementation

Thanx!

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Andrew D. <da...@da...> - 2008-10-14 16:23:45
|
On Oct 14, 2008, at 5:21 AM, Rajarshi Guha wrote:
> This is now supported in 1.2.x. It's slightly different from the above,
> in that you'd write [G6] or [G7]. Also the number used in the CDK is
> based on the IUPAC group numbering. So MOE's [#G6] is actually [G14]
> in the CDK SMARTS implementation

Cool! Is this documented somewhere as a list of CDK extensions to SMARTS?

Andrew
da...@da... |
From: Rajarshi G. <rg...@in...> - 2008-10-14 16:31:11
|
On Oct 14, 2008, at 12:23 PM, Andrew Dalke wrote:

> On Oct 14, 2008, at 5:21 AM, Rajarshi Guha wrote:
>> This is now supported in 1.2.x. It's slightly different from the above,
>> in that you'd write [G6] or [G7]. Also the number used in the CDK is
>> based on the IUPAC group numbering. So MOE's [#G6] is actually [G14]
>> in the CDK SMARTS implementation
>
> Cool! Is this documented somewhere as a list of CDK extensions
> to SMARTS?

It's documented in the Javadocs for SMARTSQueryTool:

http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/smiles/smarts/SMARTSQueryTool.html

(It'll be updated after the afternoon build completes, in about 20 min.)

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Chemistry professors never die, they just fail to react. |
From: Rajarshi G. <rg...@in...> - 2008-10-14 16:34:10
|
On Oct 14, 2008, at 12:30 PM, Rajarshi Guha wrote:

>
> On Oct 14, 2008, at 12:23 PM, Andrew Dalke wrote:
>
>> On Oct 14, 2008, at 5:21 AM, Rajarshi Guha wrote:
>>> This is now supported in 1.2.x. It's slightly different from the
>>> above, in that you'd write [G6] or [G7]. Also the number used in the
>>> CDK is based on the IUPAC group numbering. So MOE's [#G6] is actually
>>> [G14] in the CDK SMARTS implementation
>>
>> Cool! Is this documented somewhere as a list of CDK extensions
>> to SMARTS?
>
> It's documented in the Javadocs for SMARTSQueryTool
>
> http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/smiles/smarts/SMARTSQueryTool.html

Sorry, it should be

http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly-1.2.x/api/org/openscience/cdk/smiles/smarts/SMARTSQueryTool.html

since it's in the 1.2.x branch and will be merged into trunk.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
Entropy isn't what it used to be. |
From: Rajarshi G. <rg...@in...> - 2008-10-15 00:20:50
|
On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:

> 3. [#X]

This is now supported in 1.2.x.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
"My Ethicator machine must have had a built-in moral compromise spectral phantasmatron! I'm a genius."
-Calvin |
From: Egon W. <ego...@gm...> - 2008-10-15 08:55:23
|
On Wed, Oct 15, 2008 at 2:20 AM, Rajarshi Guha <rg...@in...> wrote:
>
> On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:
>
>> 3. [#X]
>
> This is now supported in 1.2.x

Brilliant! I'll update my pKa branch to include all SMARTS from the paper in Java Decision Tree format...

I also saw the work on ^bla for hybridization... [i] will be a bit more tricky, as we would need to do some work on delocalization detection for that... but I rather like this concept, and would just love to see that supported too... Miguel wrote a delocalization detector already, and I will ping him on how to use it...

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-15 02:34:38
|
On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:

> 2. [i]

This is now supported in 1.2.x - but not directly. Rather, we now support the ^x symbol, where x indicates the hybridization number (1 - SP1, 2 - SP2, ..., 8 - SP3D5). This is an OE extension to SMARTS.

Since MOE defines [i] as any atom in a pi system, one way to replicate this is to use [^1,^2].

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
All life evolves by the differential survival of replicating entities.
-- Dawkins |
From: Egon W. <ego...@gm...> - 2008-10-15 09:06:12
|
On Wed, Oct 15, 2008 at 4:34 AM, Rajarshi Guha <rg...@in...> wrote:
> On Oct 6, 2008, at 9:15 AM, Egon Willighagen wrote:
>
>> 2. [i]
>
> This is now supported in 1.2.x - but not directly. Rather, we now support
> the ^x symbol where x indicates the hybridization number (1 - SP1, 2 - SP2,
> ..., 8 - SP3D5). This is an OE extension to SMARTS.
>
> Since MOE defines [i] as any atom in a pi system, one way to replicate this
> is to use [^1,^2]

Ummm.... sorry, overlooked this email when just replying to the other...

But I think [i] really refers to pi electrons involved in a delocalized system...

so C=CCC would not match [i][i], while C=CC=C and C=CC=O would...

Egon

--
----
http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-10-15 15:14:55
|
On Oct 15, 2008, at 4:55 AM, Egon Willighagen wrote:

> On Wed, Oct 15, 2008 at 4:34 AM, Rajarshi Guha <rg...@in...> wrote:
>>
>> Since MOE defines [i] as any atom in a pi system, one way to
>> replicate this is to use [^1,^2]
>
> Ummm.... sorry, overlooked this email when just replying to the
> other...
>
> But I think [i] really refers to pi electrons involved in a
> delocalized system...
>
> so C=CCC would not match [i][i], while C=CC=C and C=CC=O would...

According to Adam (CC'd), [i] is an atom in any pi system.

-------------------------------------------------------------------
Rajarshi Guha <rg...@in...>
GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84
-------------------------------------------------------------------
CChheecckk yyoouurr dduupplleexx sswwiittcchh.. |
From: Egon W. <ego...@gm...> - 2008-10-15 12:29:33
|
Hi Rajarshi,

On Wed, Oct 15, 2008 at 1:50 PM, Rajarshi Guha <rg...@in...> wrote:
> On Oct 15, 2008, at 4:55 AM, Egon Willighagen wrote:
>> On Wed, Oct 15, 2008 at 4:34 AM, Rajarshi Guha <rg...@in...> wrote:
>>> Since MOE defines [i] as any atom in a pi system, one way to replicate
>>> this is to use [^1,^2]
>>
>> Ummm.... sorry, overlooked this email when just replying to the other...
>>
>> But I think [i] really refers to pi electrons involved in a
>> delocalized system...
>>
>> so C=CCC would not match [i][i], while C=CC=C and C=CC=O would...
>
> According to Adam (CC'd), [i] is an atom in any pi system

OK, I misinterpreted that then. Which is actually good, because that simplifies things considerably! :)

Egon

--
----
http://chem-bla-ics.blogspot.com/ |