From: Ed B. <mre...@ya...> - 2011-06-04 08:54:24
|
Hi All, I have devised a way to find the SSSR which I believe is novel and more efficient than that used in the current SSSRFinder. I am new to cdk and cdk-devel so not sure how to go about collaborating on this. I am looking for help to develop the code I have written using the cdk and also test the idea. Anybody able to help? The algorithm has two stages. In the first stage very simple rings can be found in what I think is linear time - it's certainly faster when compared with SSSRFinder. If compounds are not solved in stage one then stage two tries to solve the rest. I am having difficulties optimising stage 2 and I'm hoping a good programmer can see a way to optimise this part. Failing that, stage one of the method can be used as a pre-processing stage prior to running the SSSRFinder in order to improve efficiency. I have done a time trial using 250k SMILES in the NCI dataset and by using stage one and the SSSRFinder together it is twice as fast as using the SSSRFinder alone. I have also tested that it is finding the right rings and it is. Stage one and stage two combined with the SSSRFinder is also faster than SSSRFinder alone, but for some reason slower than stage one and the SSSRFinder combined. I have not found anything out there like it - but that's not to say there isn't anything. There may be a paper in it if it is unique. Anyway I am interested how I can go about this, make the code available, test it, incorporate into cdk, explain the idea etc so if anyone is interested let me know what the next step is. Thanks Ed Barker |
From: Rajarshi G. <raj...@gm...> - 2011-06-04 12:03:46
|
On Sat, Jun 4, 2011 at 4:54 AM, Ed Barker <mre...@ya...> wrote: > I have not found anything out there like it - but that's not to say there > isn't anything. There may be a paper in it if it is unique. > Anyway I am interested how I can go about this, make the code available, > test it, incorporate into cdk,explain the idea etc so if anyone is > interested let me know what the next step is. Sounds very interesting. If you don't mind making your code public at this point, you could put it up on Github so that we could take a look at it and provide suggestions on how to make it suitable for inclusion as well as suggestions on stage 2. Again, if you don't object to going public, it would be useful to have a write up of how the algorithm works -- Rajarshi Guha NIH Chemical Genomics Center |
From: gilleain t. <gil...@gm...> - 2011-06-04 13:23:15
|
Hi Ed, It does sound like you might have a better method than the SSSRFinder; I'm curious, though, about how much better it is than other algorithms. You could also test the HanserRingFinder : http://pele.farmbio.uu.se/nightly/api/org/openscience/cdk/smsd/ring/RingFinder.html (shouldn't this be in the same package as the SSSRFinder, by the way, list?). One paper I came across when looking at minimal cycle bases was this : Counterexamples in Chemical Ring Perception J. Chem. Inf. Comput. Sci. 2004, 44, 323-331 DOI: 10.1021/ci030405d which gives some interesting test cases. What sort of compounds fail the second stage? gilleain |
From: Andrew D. <da...@da...> - 2011-06-04 14:18:16
|
On Jun 4, 2011, at 3:23 PM, gilleain torrance wrote: > One paper I came across when looking at minimal cycle bases was this : > Counterexamples in Chemical Ring Perception J. Chem. Inf. Comput. Sci. > 2004, 44, 323-331 DOI: 10.1021/ci030405d which gives some interesting > test cases. There's a preprint at http://www.bioinf.uni-leipzig.de/Publications/PREPRINTS/03-012.pdf Perfect timing for me! OpenEye doesn't implement SSSR and at http://www.eyesopen.com/docs/toolkits/html/OEChemTK-python/ring.html#smallest-set-of-smallest-rings-sssr-considered-harmful says "We believe that it is a great service to our customers that we do not include any SSSR functionality in OEChem." I've wondered why but never researched it. In my chemfp project I'm trying to include implementations of the PubChem fingerprints. Some of the bits include: >= 4 aromatic rings >= 4 hetero-aromatic rings where ring is defined as "Extended Smallest Set of Smallest Rings". For RDKit, OpenBabel, and Indigo I use the toolkit's SSSR. I haven't been able to come up with an elegant or simple brute force solution with the OpenEye tools. I'm also distrustful that those implementations strive to maximize the number of hetero-aromatic rings. The preprint says: The Extended Set of Smallest Rings was introduced by Downs et al. [17] as an approach to design an optimal ring set for retrieval purposes. ESSR by definition is limited to planar graphs. Looking up "Theoretical Aspects of Ring Perception and Development of the Extended Set of Smallest Rings Concept" (Downs et al. 1988) I see it says that the theory only works for planar graphs. Therefore, I'm doubtful that I can implement these bits for OpenEye without implementing ESSSR *and* I think that the PubChem fingerprints give incorrect answers for some non-planar graphs *and* I think this is a topic I would rather let Ed Barker or Greg .. or Wolf-Dietrich .. deal with. Andrew da...@da... |
From: Egon W. <ego...@gm...> - 2011-06-05 18:59:53
|
Hi Gilleain, On Sat, Jun 4, 2011 at 3:23 PM, gilleain torrance <gil...@gm...> wrote: > It does sound like you might have a better method than the SSSRFinder; > I'm curious, though, about how much better it is than other > algorithms. You could also test the HanserRingFinder : > > http://pele.farmbio.uu.se/nightly/api/org/openscience/cdk/smsd/ring/RingFinder.html > > (shouldn't this be in the same package as the SSSRFinder, by the way, list?). Yes, it should. There are a lot of things in the smsd subpackage that should go somewhere else, but they need fixing and tweaking before I want to see them elsewhere. BTW, I just ran into this branch: https://github.com/egonw/cdk/pull/1 I am sorry I missed that or forgot about that. Is that still a valid pull/review request? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Egon W. <ego...@gm...> - 2011-06-05 19:01:50
|
On Sat, Jun 4, 2011 at 4:18 PM, Andrew Dalke <da...@da...> wrote: > >= 4 aromatic rings > >= 4 hetero-aromatic rings > > where ring is defined as "Extended Smallest Set of Smallest Rings". Do they define in detail how to determine this for fused ring systems? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: gilleain t. <gil...@gm...> - 2011-06-05 19:05:35
|
Hi Egon, Yes, there are a number of things that could move out of cdk-smsd, such as some of the labelling stuff. Another task-class to go on the task tree. The pull request is still valid, yes please. Hmmm. Although ... Asad has done a lot of recent work to make SMSD thread safe, which might be usefully added. Depends on whether you want to wait for a re-write of the patch queue, or look at this lot now, or what? gilleain _______________________________________________ Cdk-devel mailing list Cdk...@li... https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Syed A. R. <s9...@gm...> - 2011-06-05 19:18:40
|
Hi Egon, I would request you to wait for an updated patch as it will be much cleaner to review. We will submit it soon. Asad |
From: Egon W. <ego...@gm...> - 2011-06-05 19:55:29
|
On Sun, Jun 5, 2011 at 9:18 PM, Syed Asad Rahman <s9...@gm...> wrote: > Hi Egon, I would request you to wait for an updated patch as it will be much cleaner to review. OK, I'll wait. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Andrew D. <da...@da...> - 2011-06-07 07:05:36
|
On Jun 5, 2011, at 9:01 PM, Egon Willighagen wrote: > Do they define in detail how to determine [ESSSR] for fused ring systems? Apparently the code is available somewhere on their web site. I haven't looked for it though. Ah-ha, found it! In a file dated from 2007 ftp://ftp.ncbi.nih.gov/pubchem/CACTVS/ncbisource.tgz It's probably related to CSaddESSSRRings . Andrew da...@da... |
From: Egon W. <ego...@gm...> - 2011-06-08 05:37:43
|
On Tue, Jun 7, 2011 at 9:05 AM, Andrew Dalke <da...@da...> wrote: > On Jun 5, 2011, at 9:01 PM, Egon Willighagen wrote: >> Do they define in detail how to determine [ESSSR] for fused ring systems? That should have read: "Do they define in detail how to determine aromaticity for fused ring systems?" You know how much I bitch about aromaticity. The fact is that some rings are not aromatic by themselves, but the larger, fused ring is. For example, azulene, which has 5 pi electrons in one ring and 7 in the other. That makes neither ring aromatic, yet the compound is. Now, doing this, you have to run an all-rings-finder algorithm, and test *all* rings if they fulfill the Hueckel conditions. I have not seen any specifications of what toolkits do here. But I know what I did for the CDK: it isolates ring systems in the molecule, and for each system with 2 or 3 rings it runs an all-rings-finder and determines aromaticity for all of them. The algorithm also excludes rings which are sprouted with double bonds. For example, benzoquinone is not aromatic, AFAIK, despite the ring having 6 pi electrons. And because no toolkit uses the same rules, or specifies the rules, aromaticity is a concept to stay away from, and not to be mixed between toolkits. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
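The azulene argument in the message above can be illustrated with a minimal sketch. It assumes the crudest possible electron count (every ring atom is an sp2 carbon contributing exactly one pi electron), which is not how the CDK or any other toolkit counts electrons; the atom numbering and the names `is_hueckel`, `five_ring`, `seven_ring` and `perimeter` are hypothetical and chosen only for this illustration.

```python
def is_hueckel(pi_electrons: int) -> bool:
    """Return True if the electron count has the form 4n + 2 with n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Azulene as atom index sets: a five-membered and a seven-membered ring
# fused through the bond between atoms 3 and 4 (numbering is arbitrary).
five_ring = {0, 1, 2, 3, 4}
seven_ring = {3, 4, 5, 6, 7, 8, 9}
# The fused bond lies inside the molecule, so the outer ring visits all atoms.
perimeter = five_ring | seven_ring

# Assumption: one pi electron per ring atom, so the count equals the ring size.
results = {
    "five_ring": is_hueckel(len(five_ring)),    # 5 electrons
    "seven_ring": is_hueckel(len(seven_ring)),  # 7 electrons
    "perimeter": is_hueckel(len(perimeter)),    # 10 electrons
}
print(results)
```

Under these assumptions only the ten-membered perimeter passes the 4n + 2 test, which is why a search restricted to the two smallest rings would miss it.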
From: Andrew D. <da...@da...> - 2011-06-14 10:32:17
|
On Jun 8, 2011, at 7:37 AM, Egon Willighagen wrote: > Now, doing this, you have to run an all-rings-finder algorithm, and > test *all* rings if they fulfill the Hueckel conditions. That's overkill. A bond is aromatic if it's in at least one aromatic ring. You don't need to generate all possible rings first. In the trivial case, if there's no atom which can be sp2 then there's no need to find all-rings. Consider then graphite. If you pick a bond and do a breadth-first search you'll quickly find a ring. Move on to the next non-aromatic bond and repeat. To make it faster, first do a search to find all atoms and bonds which can be in a cycle. You can see this will work for graphite, and will be much faster than finding all rings first. Interestingly, it won't fail either. It may be slower than SSSR for some cases, but likely it will be the same cases where SSSR gives ambiguous answers. Andrew da...@da... |
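The bond-by-bond search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Andrew's or any toolkit's implementation: the molecule is a plain adjacency dictionary, the aromaticity test is a caller-supplied predicate, and only the smallest ring through each bond is tried (a complete version would go on to larger rings when the smallest one fails, and would first discard acyclic atoms and bonds as the message suggests). The names `build_adjacency`, `smallest_ring_through` and `mark_aromatic_bonds` are hypothetical.

```python
from collections import deque


def build_adjacency(bonds):
    """Build an undirected adjacency map from (atom, atom) pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def smallest_ring_through(adj, u, v):
    """Shortest cycle containing bond u-v as an atom list, or None if acyclic."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            # Ignore the direct bond u-v; any other path u..v closes a ring.
            if (a == u and b == v) or b in parent:
                continue
            parent[b] = a
            if b == v:
                path = []
                while b is not None:
                    path.append(b)
                    b = parent[b]
                return path
            queue.append(b)
    return None


def mark_aromatic_bonds(adj, ring_is_aromatic):
    """Mark every bond of each accepted ring, skipping bonds already marked."""
    aromatic = set()
    bonds = {frozenset((a, b)) for a in adj for b in adj[a]}
    for bond in sorted(bonds, key=sorted):
        if bond in aromatic:
            continue  # short-circuit: this bond is already in an aromatic ring
        u, v = sorted(bond)
        ring = smallest_ring_through(adj, u, v)
        if ring and ring_is_aromatic(ring):
            aromatic.update(
                frozenset((ring[i], ring[i - 1])) for i in range(len(ring))
            )
    return aromatic


# Naphthalene skeleton (atoms 0-9, fused bond 4-5) plus a substituent atom 10.
adj = build_adjacency([
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
    (5, 6), (6, 7), (7, 8), (8, 9), (9, 4), (0, 10),
])
# Toy predicate: one pi electron per atom, then the 4n + 2 rule on ring size.
aromatic = mark_aromatic_bonds(adj, lambda ring: (len(ring) - 2) % 4 == 0)
print(len(aromatic))
```

In this toy case two ring searches are enough to mark all eleven ring bonds; the remaining bonds are skipped because they are already marked, and the substituent bond is rejected because no ring passes through it.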
From: Egon W. <ego...@gm...> - 2011-06-14 10:55:35
|
On Tue, Jun 14, 2011 at 12:32 PM, Andrew Dalke <da...@da...> wrote: > On Jun 8, 2011, at 7:37 AM, Egon Willighagen wrote: >> Now, doing this, you have to run an all-rings-finder algorithm, and >> test *all* rings if they fulfill the Hueckel conditions. > > That's overkill. A bond is aromatic if it's in at least one > aromatic ring. You don't need to generate all possible rings > first. For 'simple' systems it indeed is overkill... for other systems, like azulene, combinatorial combinations are needed to detect aromaticity. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Andrew D. <da...@da...> - 2011-06-14 11:28:36
|
On Jun 14, 2011, at 12:55 PM, Egon Willighagen wrote: > For 'simple' systems it indeed is an overkill... for other systems, > combinatorial combinations is needed to detect aromaticity, like > azulene. The goal is to find at least one aromatic ring for each bond. My point is that that doesn't mean finding all rings first. In the worst case (lots of sp2 atoms which are in complex ring systems but not in aromatic rings) the algorithm I gave ends up finding all rings; so yes, the essential combinatorics are inescapable. The advantage is that it can short-circuit in more cases. Andrew da...@da... |
From: Egon W. <ego...@gm...> - 2011-06-14 11:48:19
|
On Tue, Jun 14, 2011 at 1:28 PM, Andrew Dalke <da...@da...> wrote: > In the worst case (lots of sp2 atoms which are in complex > ring systems but not in aromatic rings) then the algorithm I > gave ends up finding all rings; so yes, the essential combinatorics > are inescapable. The advantage is that it can short-circuit > in more cases. Absolutely nothing wrong with that... I'll think about whether I can easily integrate this heuristic into the CDKHueckelAromaticityChecker... Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Christoph S. <ste...@eb...> - 2011-06-16 09:04:23
|
Thanks, Andrew, for pointing this out. Seems so obvious but we always went for the over-engineered approach :-) Cheers, Chris -- Dr. Christoph Steinbeck Head of Chemoinformatics and Metabolism European Bioinformatics Institute (EBI) Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD UK Phone +44 1223 49 2640 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: Egon W. <ego...@gm...> - 2011-06-19 07:17:24
|
On Thu, Jun 16, 2011 at 11:04 AM, Christoph Steinbeck <ste...@eb...> wrote: > Thanks, Andrew, for pointing this out. Seems so obvious but we always went for the over-engineered approach :-) No, I would not phrase it like that; the current aromaticity searching reuses components, rather than reinventing the wheel. It uses the (fast) spanning tree to find fused ring systems, counts the rings (using the SSSR), and if there are <= three and the rings are not found aromatic yet, it uses the AllRingsFinder to find larger aromatic rings. The ring counting can be improved, using atom and bond count. And what it should do, is not find fused ring systems, but find fused, potentially aromatic ring systems. The latter is what I was talking about, in reply to Andrew's suggestion. This will not remove the need for the AllRingsFinder. If anything, introducing this will further engineer things into a more complex framework, rather than simplify it. Aromaticity is not a concept that allows for simple algorithms. The most basic Hueckel approach does, but is only defined for a single ring, ignoring sprouting with double bonds etc. Andrew, maybe we should write up a review on aromaticity in cheminformatics? Set up a simple test set of corner cases, describe the algorithms around, and show their limitations, using Open Source implementations? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
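The remark that ring counting "can be improved, using atom and bond count" presumably refers to the cyclomatic number: for a connected ring system the size of the SSSR equals bonds minus atoms plus one, so no ring search is needed just to count. The sketch below shows that arithmetic together with the "<= three" threshold mentioned in the message; the function names and the constant are hypothetical and are not taken from the CDK source.

```python
def ring_count(n_atoms: int, n_bonds: int, n_components: int = 1) -> int:
    """Cyclomatic number: bonds - atoms + connected components."""
    return n_bonds - n_atoms + n_components


# Threshold quoted in the message: exhaustive ring search only for small systems.
MAX_RINGS_FOR_ALL_RINGS_SEARCH = 3


def small_enough_for_all_rings(n_atoms: int, n_bonds: int) -> bool:
    """Decide from counts alone whether a connected ring system is small enough."""
    return ring_count(n_atoms, n_bonds) <= MAX_RINGS_FOR_ALL_RINGS_SEARCH


# Heavy-atom skeletons of a few fused ring systems: (atoms, bonds).
systems = {
    "benzene": (6, 6),
    "azulene": (10, 11),
    "anthracene": (14, 16),
    "pyrene": (16, 19),
}
for name, (atoms, bonds) in systems.items():
    print(name, ring_count(atoms, bonds), small_enough_for_all_rings(atoms, bonds))
```

The counts must describe one isolated ring system (no substituent atoms or bonds), otherwise the number of connected components has to be passed explicitly.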
From: Christoph S. <ste...@eb...> - 2011-06-20 11:18:15
|
Thanks for the correction. You are of course completely right. Cheers, Chris -- Dr. Christoph Steinbeck Head of Chemoinformatics and Metabolism European Bioinformatics Institute (EBI) Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD UK Phone +44 1223 49 2640 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: Andrew D. <da...@da...> - 2011-06-19 10:24:25
|
On Jun 19, 2011, at 9:16 AM, Egon Willighagen wrote: > Andrew, maybe we should write up a review on aromaticity in > cheminformatics? Set up a simple test set of corner cases, describe > the algorithms around, and show their limitations, using Open Source > implementations? I am not interested in doing so. The algorithms are already well-enough known that, for example, OpenEye implements not one but multiple families of aromaticity perception, including those from other vendors. One of these is the MMFF aromaticity model, which is well-described and also implemented in OpenBabel. Is your point that that knowledge, now likely described in the literature, just hasn't made its way into CDK? (I haven't looked at RDKit to see how it implements this, so I can't say that it's a general free software issue.) I do not feel like doing that literature research. Such a study would be extremely tedious and I don't understand what the end goal would be. Would it be to develop a better aromaticity model? In which case it would need a diverse set of structures where the aromaticity is known experimentally. Would it show problems in the overall definition of aromaticity, or mostly highlight limitations in the specific implementations of the algorithm? Best regards, Andrew da...@da... |
From: Rajarshi G. <raj...@gm...> - 2011-06-19 12:44:58
|
On Sun, Jun 19, 2011 at 6:24 AM, Andrew Dalke <da...@da...> wrote: > The algorithms are already well-enough known that, for example, > OpenEye implements not one but multiple families of aromaticity > perception, including those from other vendors. Do you have any pointers to the documentation for other aromaticity models? The OE docs are nice, but just give a few examples comparing aromaticity models. Do the MMFF papers describe the aromaticity model? (I can't seem to access them from home today) -- Rajarshi Guha NIH Chemical Genomics Center |
From: Egon W. <ego...@gm...> - 2011-06-19 12:21:10
|
On Sun, Jun 19, 2011 at 12:24 PM, Andrew Dalke <da...@da...> wrote: > On Jun 19, 2011, at 9:16 AM, Egon Willighagen wrote: >> Andrew, maybe we should write up a review on aromaticity in >> cheminformatics? Set up a simple test set of corner cases, describe >> the algorithms around, and show their limitations, using Open Source >> implementations? > > I am not interested in doing so. Well, neither am I really :) > The algorithms are already well-enough known Maybe for OpenEye users, but not in general. At least the whole idea that there are different definitions of aromaticity seems very much lost in literature, from my perspective. I will try to find time to read up on their documentation, and should probably get myself academic licenses for OpenEye and other proprietary tools. (I have not made time yet to read through all licenses to make sure I am allowed to develop CDK stuff, while having such licenses. This sounds absurd, but has been a problem in the past 8 years!) > that, for example, > OpenEye implements not one but multiple families of aromaticity > perception, including those from other vendors. Good! OpenEye has been doing it right here. The CDK only implements one algorithm. The CDK could use more approaches; I will have to look at what OpenEye does. > One of these is the MMFF aromaticity model, which is well-described > and also implemented in OpenBabel. Happy to hear that, and I am happy to hear you talk about families and models here, because I have not seen such talk, and it is precisely how I see this. E.g. this volume calculation paper I just looked at does not describe *which* model/family they are using for aromaticity. This is the big problem here, because their parameters are affected by it. > Is your point that knowledge, no likely described in the > literature, just hasn't made its way into CDK? (I haven't looked > at RDKit to see how it implements this, so I can't say that it's > a general free software issue.) One point indeed is that the CDK implements one model right now, and I welcome alternative methods. > Such a study would be extremely tedious and I don't understand > what the end goal would be. Would it be to develop a better > aromaticity model? In which case it would need a diverse set > of structures where the aromaticity is known experimentally. That would in fact be a very good goal, but the problem here is indeed that such experimental data is not omnipresent. > Would it show problems in the overall definition of aromaticity, > or mostly highlight limitations in the specific implementations > of the algorithm? I think it would be good for the larger cheminformatics community to actually understand 'aromaticity', because it is one significant source of incompatibility between toolkits right now. This would be tedious, and not directly result in new applications. It would help the community, though. Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Egon W. <ego...@gm...> - 2011-06-19 12:33:21
|
On Sun, Jun 19, 2011 at 2:20 PM, Egon Willighagen <ego...@gm...> wrote: >> The algorithms are already well-enough known > > Maybe for OpenEye users, but not in general. At least the whole ideas > that there are different definitions of aromaticity seems very much > lost in literature, from my perspective. The relevant documentation is at: http://www.eyesopen.com/docs/toolkits/html/OEChemTK-python/aromaticity.html#section-aromaticity-models OpenEye in fact does keep a really good online presence of documentation; I should be reading that much more. Thanx for reminding me of that :) Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet (http://ki.se/imm) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |