From: Rajarshi G. <rg...@in...> - 2008-08-20 21:32:07
|
Hi, the ACS meeting is finally over and I can get back to code! I noticed in the commits that the QM code has its own module. What is the role of QM code in the CDK? Do we seriously plan to implement QM/semi-empirical algorithms? On a related note, why does the CDK need an implementation of a PDB reader? Why not use BioJava and provide a transformer class that converts the BioJava object to a CDK object? I spoke with Raphael in Philly, and he noted that round tripping does not work with the PDB (PDBWriter makes files that PDBReader cannot read). I can see the CDK being used in metabolomics and hence the need for PDB support - but given the format can be fiddly, why not just use BioJava? ------------------------------------------------------------------- Rajarshi Guha <rg...@in...> GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84 ------------------------------------------------------------------- So the Zen master asked the hot-dog vendor, "Can you make me one with everything?" - TauZero on Slashdot |
From: gilleain t. <gil...@gm...> - 2008-08-20 22:19:22
|
Hi, About the PDBReader: I also noticed a problem with it (now fixed) to do with the CONECT records, but I agree that it makes sense for a chemistry development kit to focus on chemical, not biological, formats. The PDB fixed-width files (as opposed to mmCIF or the XML file formats) are notoriously difficult to handle because (I understand) some crystallographers do odd things with the files. There are cleaned versions of the data, but that can't be checked for or guaranteed. Even though macromolecules are of course chemicals, I would also vote for using the BioJava classes. This is part of a bigger issue that the Bioclipse project faces: different projects have different core models for the data. JMol uses its own classes for macromolecules, which are not CDK Molecules. In fact, an even bigger question (off topic, really) is the large amount of unused and duplicated classes from Bioclipse plugins in the project. I would be interested to know how small it could get. But that's a tale for another time, and another mailing list, I suppose :) gilleain ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ Cdk-devel mailing list Cdk...@li... https://lists.sourceforge.net/lists/listinfo/cdk-devel |
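Editor's note, for readers unfamiliar with the fixed-width layout gilleain mentions: coordinate records (ATOM/HETATM) are defined by column positions, not by whitespace. The following is a minimal Java sketch, not the CDK PDBReader; the class names `PdbAtom` and `PdbAtomLineParser` are hypothetical, and the column ranges assume the wwPDB format 3.x layout (x, y, z in columns 31-54, element symbol in columns 77-78). It deliberately ignores alternate locations and hybrid-36 serial numbers.

```java
/** Fields of one ATOM/HETATM record (hypothetical helper type, not a CDK class). */
final class PdbAtom {
    final boolean hetero;      // true for HETATM, false for ATOM
    final int serial;          // columns 7-11
    final String name;         // columns 13-16, trimmed
    final String residueName;  // columns 18-20, trimmed
    final char chainId;        // column 22
    final int residueSeq;      // columns 23-26
    final char insertionCode;  // column 27
    final double x, y, z;      // columns 31-38, 39-46, 47-54
    final String element;      // columns 77-78, trimmed; empty if the line is shorter

    PdbAtom(boolean hetero, int serial, String name, String residueName, char chainId,
            int residueSeq, char insertionCode, double x, double y, double z, String element) {
        this.hetero = hetero;
        this.serial = serial;
        this.name = name;
        this.residueName = residueName;
        this.chainId = chainId;
        this.residueSeq = residueSeq;
        this.insertionCode = insertionCode;
        this.x = x;
        this.y = y;
        this.z = z;
        this.element = element;
    }
}

final class PdbAtomLineParser {
    /**
     * Parses one ATOM/HETATM line by fixed column positions.
     * Returns null for any other record type; throws if a coordinate record is truncated.
     */
    static PdbAtom parse(String line) {
        boolean atom = line.startsWith("ATOM  ");
        boolean hetatm = line.startsWith("HETATM");
        if (!atom && !hetatm) {
            return null;
        }
        if (line.length() < 54) {
            throw new IllegalArgumentException("coordinate record shorter than 54 columns: " + line);
        }
        // substring is 0-based and end-exclusive, so PDB columns a-b map to substring(a - 1, b).
        int serial = Integer.parseInt(line.substring(6, 11).trim());
        String name = line.substring(12, 16).trim();
        String residueName = line.substring(17, 20).trim();
        char chainId = line.charAt(21);
        int residueSeq = Integer.parseInt(line.substring(22, 26).trim());
        char insertionCode = line.charAt(26);
        double x = Double.parseDouble(line.substring(30, 38).trim());
        double y = Double.parseDouble(line.substring(38, 46).trim());
        double z = Double.parseDouble(line.substring(46, 54).trim());
        String element = line.length() >= 78 ? line.substring(76, 78).trim() : "";
        return new PdbAtom(hetatm, serial, name, residueName, chainId,
                residueSeq, insertionCode, x, y, z, element);
    }
}
```

Splitting on whitespace instead would fail on records where adjacent fields touch, for example two coordinates written as `-100.123-200.456`.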
From: Egon W. <ego...@gm...> - 2008-08-21 05:12:22
|
On Thu, Aug 21, 2008 at 12:19 AM, gilleain torrance <gil...@gm...> wrote: > About the PDBReader; I also noticed a problem with it (now fixed) to > do with the CONECT records, but I agree that it makes sense for a > chemistry development kit to focus on chemical, not biological > formats. Que? Protein structures are molecules... *chemical* molecules. > The PDB fixed-width files (as opposed to mmCIF or the xml > file formats) are notoriously difficult to handle because (I > understand) some crystallographers do odd things with the files. And the other half of the world does strange things too... oh, and even the PDB database guys do really strange things to those files. As said in the other email, PDB files are documents, much like Word documents. > There are cleaned versions of the data, but that can't be checked for > or guaranteed. > > Even though macromolecules are of course chemicals, Ah. phew... > I would also vote for using the Biojava classes. Before anyone starts writing a patch to the CDK to use whatever other PDB writer, I urge this person to first compare PDB readers: CDK (which will come out worst), Jmol, BioJava, ... and determine how good they are. Does the reader deal well with inserts, multiple sequences, CONECT, reading of the sequence, ... > This is part of a bigger issue that the > Bioclipse project faces of different projects having different core > models for the data. JMol Jmol, lower case 'm'... > uses its own classes for macromolecules which are not CDK Molecules. ... > In fact, an even bigger question (off topic, really) is the large > amount of unused and duplicated classes from bioclipse plugins in the > project. I would be interested to know what the smallest size it could > be is. But that's a tale for another time, and another mailing list, I > suppose :) Another tale indeed... These are really unrelated problems. Egon -- ---- http://chem-bla-ics.blogspot.com/ |
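Editor's note: Egon's checklist includes CONECT handling, which is easy to get subtly wrong. A CONECT record gives an atom serial number in columns 7-11 followed by up to four bonded-atom serials in 5-column fields (columns 12-31 in the wwPDB 3.x layout), and the same bond is usually listed from both ends. The sketch below is hypothetical (`ConectParser` is not a CDK class, and it needs Java 9 or later for `List.of`); it stores each bond once as an ordered pair of serial numbers and ignores anything after column 31.

```java
import java.util.List;
import java.util.Set;

final class ConectParser {
    /**
     * Adds the bonds from one CONECT line to {@code bonds}. Each bond is stored as
     * List.of(lowerSerial, higherSerial), so a bond listed from both ends is kept once.
     * Non-CONECT lines are ignored.
     */
    static void parse(String line, Set<List<Integer>> bonds) {
        if (!line.startsWith("CONECT")) {
            return;
        }
        int origin = Integer.parseInt(line.substring(6, 11).trim()); // columns 7-11
        int end = Math.min(line.length(), 31);                        // bonded serials end at column 31
        for (int start = 11; start + 5 <= end; start += 5) {         // fields 12-16, 17-21, 22-26, 27-31
            String field = line.substring(start, start + 5).trim();
            if (field.isEmpty()) {
                continue; // blank field: fewer than four bonded atoms
            }
            int partner = Integer.parseInt(field);
            bonds.add(List.of(Math.min(origin, partner), Math.max(origin, partner)));
        }
    }
}
```

Keeping bonds as serial numbers also makes it straightforward to check afterwards that every referenced serial exists among the ATOM/HETATM records, so a reader can report a broken file instead of silently dropping bonds.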
From: Egon W. <ego...@gm...> - 2008-08-21 05:05:48
|
On Wed, Aug 20, 2008 at 11:31 PM, Rajarshi Guha <rg...@in...> wrote: > Hi, the ACS is finally ended and I can get back to code! I noticed in > the commits that the QM code has its own module. Yeah, I'm fixing PMD warnings, and need to make PMD not use the test for Vector->List for the QM code, because it defines a private Vector class (the math one), and the PMD test does not seem to check for java.util bits :( BTW, one advantage of putting a piece of code like this in a module is that it gets noticed :) > What is the role of > QM code in the CDK? Do we seriously plan to implement QM/semi- > empirical algo's? Define 'we'... it was a submitted patch; I think it is a good piece of educational code. That's a goal of the CDK too... > On a related note, why does the CDK need an implementation of a PDB > reader? Because there was nothing better around... > Why not use BioJava and provide a transformer class that > converts the BioJava object to a CDK object? Maintaining convertors is more work than maintaining a reader... that said, the CDK one is not an overly complete one... the PDB format is really messy, and writing a reader for it is a monk's job (as we say in NL, but then in Dutch)... > I spoke with Raphael in > Philly, and he noted that round tripping does not work with the PDB > (PDBWriter makes files that PDBReader cannot read). Full roundtripping is outside the scope of the CDK... a PDB file is a document, really, not a chemical format. I'm already happy if it could roundtrip the 3D atomic coordinates... that is something we should be able to get to work. > I can see the CDK being used in metabolomics and hence the need for > PDB support Ummm... I don't often see metabolites in the PDB format... :) > but given the format can be fiddly, why not just use BioJava? Sure 'we' can do that: submit a patch. It's LGPL, etc, so I see no reason why one could not write a convertor for BioJava... I did that once for sequences I think...
btw, I was not aware that BioJava did 3D structures nowadays... The Jmol PDB reader might even be a better one... there is a big user community around that PDB reader, likely bigger than for the BioJava version... Egon |
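Editor's note: roundtripping just the 3D coordinates, as Egon suggests, is mostly a matter of writing the fixed columns exactly and accepting the 0.001 Å resolution of the `%8.3f` coordinate fields. A minimal Java sketch, independent of the CDK API: `formatHetatm`, `readCoordinates` and `roundTrips` are hypothetical names, the column layout assumes wwPDB format 3.x, and atom names are simply left-justified (the usual convention of starting one-letter element names in column 14 is not handled).

```java
import java.util.Locale;

final class PdbCoordinateRoundTrip {
    /**
     * Formats an 80-column HETATM record. Locale.ROOT guarantees a '.' decimal separator.
     * Throws if any value is too wide for its fixed-width field.
     */
    static String formatHetatm(int serial, String atomName, String residueName, char chainId,
                               int residueSeq, double x, double y, double z, String element) {
        String line = String.format(Locale.ROOT,
                "HETATM%5d %-4s %3s %c%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s  ",
                serial, atomName, residueName, chainId, residueSeq, x, y, z, 1.0, 0.0, element);
        if (line.length() != 80) {
            throw new IllegalArgumentException("value does not fit the PDB columns: " + line);
        }
        return line;
    }

    /** Reads x, y, z back from columns 31-38, 39-46 and 47-54. */
    static double[] readCoordinates(String line) {
        return new double[] {
            Double.parseDouble(line.substring(30, 38).trim()),
            Double.parseDouble(line.substring(38, 46).trim()),
            Double.parseDouble(line.substring(46, 54).trim())
        };
    }

    /** True if x, y, z survive a write/read cycle to within the format's 0.001 Å rounding. */
    static boolean roundTrips(double x, double y, double z) {
        double[] back = readCoordinates(formatHetatm(1, "C1", "LIG", 'A', 1, x, y, z, "C"));
        double tolerance = 0.0005 + 1e-9; // half of the last printed digit, plus floating-point slack
        return Math.abs(back[0] - x) <= tolerance
                && Math.abs(back[1] - y) <= tolerance
                && Math.abs(back[2] - z) <= tolerance;
    }
}
```

The length check guards against silent column shifts: Java's formatter widens a field rather than truncating it, so an oversized coordinate or residue name would otherwise push every later column to the right and break the roundtrip.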
From: Rajarshi G. <rg...@in...> - 2008-08-21 05:12:38
|
On Aug 21, 2008, at 1:05 AM, Egon Willighagen wrote: >> >> What is the role of >> QM code in the CDK? Do we seriously plan to implement QM/semi- >> empirical algo's? > > Define 'we'... Whoever submitted it > it was a submitted patch; I think it is a good piece of > educational code. That's a goal of the CDK too... I'd argue that if one wanted to learn how to program Hartree-Fock they'd be better off looking at MPQC, MOPAC or GAMESS. My view is that the CDK is meant to be educational in the context of cheminformatics - this is quite a jump from QM. Even though one can argue that the CDK is somewhat bloated, I don't mind bloat wrt cheminformatics, but QM is, IMO, pretty off-topic for the CDK. >> Full roundtripping is outside the scope of the CDK... a PDB file is a > document, really, not a chemical format. I'm already happy if it could > roundtrip the 3D atomic coordinates... that is something we should be > able to get to work. Hmm, I agree - a simple implementation that essentially just focuses on 3D coordinates could be useful. >> but given the format can be fiddly, why not just use BioJava? > > Sure 'we' can do that: submit a patch. It's LGPL, etc, so I see no > reason why one could not write a convertor for BioJava... I did that > once for sequences I think... btw, I was not aware that BioJava did 3D > structures nowadays... The Jmol PDB reader might even be a better > one... lots of user community around that PDB reader, likely more > than for the BioJava version... That would work as well - given I don't use PDB files much, I'm probably not the best person to submit a patch. My interest in this is to reduce the scope for bugs, incompleteness, etc. |
From: Egon W. <ego...@gm...> - 2008-08-21 05:20:52
|
On Thu, Aug 21, 2008 at 7:12 AM, Rajarshi Guha <rg...@in...> wrote: > I'd argue that if one wanted to learn how to program hartree-fock they'd be > better off looking at MPQC, Mopac or GAMESS. My view is that the CDK is meant > to be educational in the context of cheminformatics - this is quite a jump > from QM. Even though one can argue that the CDK is somewhat bloated, I don't > mind bloat wrt cheminformatics, but QM is, IMO, pretty off-topic for the CDK This code is CDK history... on what grounds do you want to remove code from the CDK then? What is allowed in and out? >> Full roundtripping is outside the scope of the CDK... a PDB file is a >> document, really, not a chemical format. I'm already happy if it could >> roundtrip the 3D atomic coordinates... that is something we should be >> able to get to work. > > Hmm, I agree - a simple implementation that essentially just focuses on 3D > coordinates could be useful > >>> but given the format can be fiddly, why not just use BioJava? >> >> Sure 'we' can do that: submit a patch. It's LGPL, etc, so I see no >> reason why one could not write a convertor for BioJava... I did that >> once for sequences I think... btw, I was not aware that BioJava did 3D >> structures nowadays... The Jmol PDB reader might even be a better >> one... lots of user community around that PDB reader, likely more >> than for the BioJava version... > > That would work as well - given I don't use PDB files much, I'm probably not > the best person to submit a patch. My interest in this is to reduce the > scope for bugs, incompleteness etc Right. And until someone comes up with a good patch, we'll have to make do with the current code. That's the way it works. If we find the code is hopelessly broken, then we could consider fixing it, or giving up. BTW, the PDBWriter was never written for IBioPolymer, or for any roundtripping at all. All Reader/Writer IO classes should be seriously tested for roundtripping anyway.
The real question here is: who wants to start the QA project? In all situations with suboptimal and partly broken code, it comes down to 'show me the code', which includes accurate bug reports, larger analyses, patches, reimplementations, ... Egon |
From: Rajarshi G. <rg...@in...> - 2008-08-21 12:39:47
|
On Aug 21, 2008, at 1:20 AM, Egon Willighagen wrote: > > This code is CDK history... on what grounds do you want to remove code > from the CDK then? What is allowed in and out? I'm not sure what you mean by history - a lot of 'historical' code is changed, removed etc. In this case, if it's not really a cheminformatics task, I don't think it should be in the CDK. Furthermore, code that is clearly not maintained and is only partially complete could be safely removed. In the QM case, the current classes are some basic classes for orbitals etc - to my knowledge there are no implementations of QM methods. So it's definitely incomplete. |
From: Christoph S. <chr...@go...> - 2008-08-21 15:23:46
|
I think that Rajarshi's point of using BioJava is a very valid one and should be supported. On the other hand, quite some effort was invested in the writing and fixing of our PDBReader. I suggest that, like in so many cases, we adopt the Perl TIMTOWTDI (There Is More Than One Way To Do It), but document the possibility to use BioJava and a mapper. The question is: Who writes the mapper :-) For the other issue with the QM code, I suggest that it is archived and removed. Cheers, Chris -- Dr. Christoph Steinbeck Head of Chemoinformatics and Metabolism European Bioinformatics Institute (EBI) Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD UK Phone +44 1223 49 2640 What is man but that lofty spirit - that sense of enterprise. ... Kirk, "I, Mudd," stardate 4513.3.. |
From: Rajarshi G. <rg...@in...> - 2008-08-21 15:48:54
|
On Aug 21, 2008, at 11:23 AM, Christoph Steinbeck wrote: > I suggest that, like in so many cases, we adopt the Perl TIMTOWTDI I think that for ease of use of an API, TIMTOWTDI is not a great policy. In contrast, the Python approach (one recommended, obvious way) is far easier to learn. I personally don't care whether it's BioJava or a CDK implementation. My only concern is that if the CDK goes with its own implementation, a good deal of effort is required to cover a majority of the PDB features, and that implementation should also be maintained/maintainable. Once again I don't personally use PDB files, but given that BioJava focuses on that type of stuff, I am assuming that its implementation is more complete - but I may be totally wrong. In any case, if the BioJava code is more complete, that should be the recommended way. > The question is: Who writes the mapper :-) Who's using the CDK and PDB files? :) |
From: Nina J. <ni...@ac...> - 2008-08-22 05:35:28
|
Rajarshi Guha wrote: > On Aug 21, 2008, at 11:23 AM, Christoph Steinbeck wrote: > >> The question is: Who writes the mapper :-) >> > > Who's using the CDK and PDB files? :) > or who might be using it in the future ... it is hard to predict beforehand all possible ways a library can be used. IMHO, I would prefer the Perl way :) Regards, Nina |
From: Raphael A. B. <rap...@ch...> - 2008-08-27 15:31:55
|
Nina Jeliazkova wrote: > Rajarshi Guha wrote: >> On Aug 21, 2008, at 11:23 AM, Christoph Steinbeck wrote: >> >> >>> I suggest that, like in so many cases, we adopt the Perl TIMTOWTDI .... >> Once again I don't personally use PDB files, but given that BioJava >> focuses on that type of stuff, I am assuming that its implementation >> is more complete - but I may be totally wrong. In any case, if it is >> the case that the BioJava code is more complete, that should be the >> recommended way >> >> >>> The question is: Who writes the mapper :-) >>> >> >> Who's using the CDK and PDB files? :) >> > or who might be using in the future ... it is hard to predict beforehand > all possible ways a library can be used. IMHO, I would prefer Perl way :) hi, meep. i am currently using the pdb reader and writer for some smaller projects :) some quick comments on the issues from my side: 1. biojava / jmol reader/writer vs cdk reader/writer hmm. i don't know what is the better way to do it actually. the major point is how difficult it will be to maintain the internal data structure converters between biojava and cdk (a minor point is the "philosophical question"). and if nobody apart from me is actually using the "biological" pdb file format, then it is definitely not worth the effort to establish the converter, integrate biojava into the build process and so on. 2. things we can do immediately - i have some fixes to the writer that enable roundtripping - i have some fixes to the reader to enable the reading of files without a full header (many programs actually produce only 3d coordinates in a pdb style). => raphael should commit this to a branch for testing and so on (btw who is raphael? me? oh...) 3. things that are ugly - pdb is ugly because there is a standard, but many break it (.. not the first time i hear this argument). - as far as i can remember the cdk reader relies on a "PROTEIN" containing line in the file and does not do anything if that magic word is not present...
simply not relying on this fixes many problems (relates to 2. and "roundtripping"). some conclusions - first i am committing some branched code for reading/writing - you can comment on that, and maybe that solves many of the issues we are currently discussing. (this will not happen until mid-september because i am out of office). - if that does not suit the needs of the larger cdk community we can start a discussion on whether we want to wrap the biojava reader and writer or do anything else... thanks to rajarshi for starting the discussion :) raphael |
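Editor's note: Raphael's point about files that carry only coordinates can be illustrated with a reader loop that keys purely on record names. It never looks for HEADER or COMPND (and so never for a "PROTEIN" keyword), skips blank lines and unknown records, and stops at the first END/ENDMDL so that only the first model is read. This is a hypothetical Java sketch (`HeaderTolerantPdbScan` is not a CDK class) that returns bare x, y, z triples; an actual reader would build atoms and residues instead.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderTolerantPdbScan {
    /**
     * Collects x, y, z from every ATOM/HETATM record of the first model in {@code pdbText}.
     * No HEADER or COMPND record is required; blank lines and other records are skipped.
     */
    static List<double[]> firstModelCoordinates(String pdbText) {
        List<double[]> coordinates = new ArrayList<>();
        for (String line : pdbText.split("\\R")) {
            if (line.startsWith("END")) {
                break; // matches both END and ENDMDL: stop after the first model
            }
            if (!line.startsWith("ATOM  ") && !line.startsWith("HETATM")) {
                continue; // REMARK, COMPND, TER, CONECT, blank lines, ...
            }
            if (line.length() < 54) {
                throw new IllegalArgumentException("coordinate record shorter than 54 columns: " + line);
            }
            coordinates.add(new double[] {
                Double.parseDouble(line.substring(30, 38).trim()), // columns 31-38
                Double.parseDouble(line.substring(38, 46).trim()), // columns 39-46
                Double.parseDouble(line.substring(46, 54).trim())  // columns 47-54
            });
        }
        return coordinates;
    }
}
```

Stopping at the first ENDMDL is a design choice; NMR entries typically contain many models, and a reader that wants all of them would start a new model at each MODEL record instead.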
From: Christoph S. <ste...@eb...> - 2008-08-28 14:32:11
|
Hi Raphael, very thoughtful email - thanks! I think we should proceed as you suggested (branch, test, merge). Cheers, Chris |
From: Egon W. <ego...@gm...> - 2008-08-22 05:52:35
|
On Thu, Aug 21, 2008 at 5:23 PM, Christoph Steinbeck <chr...@go...> wrote: > For the other issue with the QM code, I suggest that it is archived and > removed. For what reasons? I really don't like the idea that we remove code just because it is out of fashion... There is *a* *lot* *of* code that has known bugs or was even plain broken, or is less well written, and as such is a much better candidate for removal... I think the community decides what the scope of the CDK is, and if this QM code was accepted before, then that determines the scope of the CDK... Actually, I have zero problems if someone contributes JMopac to the CDK... (though we now have raised standards, which ensure easier maintainability...) Egon |
From: Christoph S. <ste...@eb...> - 2008-08-22 06:50:22
|
Egon Willighagen wrote: > For what reasons? I really don't like the idea that we remove code > just because it is out of fashion... "Out of fashion" does not quite hit it. We all know that it is toy code. Good for educational use - you are right - and this reason is good enough to keep it. *My* preference is to remove it, but my opinion is only one of many. > I think the community decides what is the scope of the CDK Right - and I'm part of this community; I give my two cents on Rajarshi's question and make my own suggestion. Cheers, Chris |
From: Nina J. <ni...@ac...> - 2008-08-22 12:31:15
|
Egon Willighagen wrote: > On Fri, Aug 22, 2008 at 8:49 AM, Christoph Steinbeck > <ste...@eb...> wrote: > >> Egon Willighagen wrote: >> >>> For what reasons? I really don't like the idea that we remove code >>> just because it is out of fashion... >>> >> "Out of fashion" does not quite hit it. We all know that it is toy code. >> Good for educational use - you are right - and this reason is good enough >> to keep it. *My* preference is to remove it but my opinion is only one of >> many. >> > > So, we never really had this situation before... what would be the > proper next step? I suggest we contact Stephan and ask him what he > thinks... > > >>> I think the community decides what is the scope of the CDK >>> >> Right - and I'm part of this community, give my cents to Rajarshi's >> question, and make my own suggestion. >> > > Sure. I guess what worries me from this discussion is the lack of > 'why' it should be removed. Define 'toy code'... there is a lot of toy > code... we all know that a lot of the code in CDK is a rough, good > first go at things, some algorithms excepted... there simply is quick > and dirty code in the CDK... > My two cents: Is it possible to define a module, branch, or whatever for any kind of "incomplete" or "toy" code, so that it is clearly not included in the stable distribution, but not deleted? Wasn't there an "experimental" module in CDK? Regards, Nina > I just like to see some more well-defined rules... > > Egon > > |
From: Egon W. <ego...@gm...> - 2008-08-22 12:33:58
|
On Fri, Aug 22, 2008 at 2:30 PM, Nina Jeliazkova <ni...@ac...> wrote: > Is it possible to define a module -branch -whatever for any kind of > "incomplete" or "toy" code All modules that are not so-called CDK stable are incomplete/toy/whatever... so, most modules. Experimental was the stuff people are working on, for which we now have branches... > so that is clearly not included in the stable distribution, but not deleted? That would remove really a lot of code. > Wasn't there an "experimental" module in CDK? Yes, all stuff that was still in there is now in the extra module... Egon |
From: Christoph S. <ste...@eb...> - 2008-08-26 14:50:20
|
Nina Jeliazkova wrote: > Is it possible to define a module -branch -whatever for any kind of > "incomplete" or "toy" code so that is clearly not included in the stable > distribution, but not deleted? This is actually what I meant by "archive and remove". This should be available and well documented, but not necessarily part of the main CDK distribution, IMHO. And of course, Egon is also right when saying: we need clear rules. Cheers, Chris |
From: Andrew D. <da...@da...> - 2008-08-27 23:04:01
|
On Aug 21, 2008, Egon Willighagen wrote (responding to Rajarshi Guha): > Full roundtripping is outside the scope of the CDK... a PDB file is a > document, really, not a chemical format. I've been thinking about this distinction, as someone who has written several PDB parsers, and other parsers. I don't understand the nuance you're pointing out. What do you mean by document here? Is CML, with its ability to embed other data, also a document rather than a chemical format? What biases a given file type more towards one or the other? Do the tag fields in an SD file make it somewhat of a document? Are all Gaussian output files documents? >> I can see the CDK being used in metabolomics and hence the need for >> PDB support > > Ummm... I don't often see metabolites in the PDB format... :) That's because most people doing metabolomics treat molecules as identifier names only and don't track atom-level details, much less use something like reaction SMILES. Still, a quick Google search for the words "metabolites in the PDB format" ;) found the Human Metabolome Database, and a search of it for "Isocitric acid" found this record http://hmdb.ca/scripts/show_card.cgi?METABOCARD=HMDB00193.txt with a link to the isocitric acid as a standalone/synthetic PDB and a link to structure 1b0j, which is CRYSTAL STRUCTURE OF ACONITASE WITH ISOCITRATE. Out of curiosity, I also checked KEGG. There are a few PDB-linked records, for example: http://www.genome.jp/dbget-bin/www_bget?compound+C00167 which links to the "PDB-CCD" record (first time I heard of that term; "PDB Chemical Component Dictionary") for UGA at http://www.ebi.ac.uk/msd-srv/msdchem/cgi-bin/cgi.pl?FUNCTION=getByCode&CODE=UGA and has a way to get the ligand as a PDB file, as well as a link to "In PDB Entries" which lists PDB files containing that ligand, at http://www.ebi.ac.uk/msd-srv/msdchem/cgi-bin/cgi.pl?FUNCTION=relation&PARENTENTITY=CHEM_COMP&APPLICATION=1&ENTITY=COMP_OCCURENCES&RELATIONID=3193&PARENTINDEX=0&PARENT0=:UGA%20UGA%20: > btw, I was not aware that BioJava did 3D structures nowadays... According to the CVS logs for org/biojava/bio/structure/io/PDBFileParser.java revision 1.3 date: 2005/12/06 15:08:10; author: andreas; state: Exp; lines: +661 -647 added a check that ignores empty lines that some people might have at the end of their (local) PDB files. ---------------------------- revision 1.2 date: 2005/04/14 12:24:58; author: andreas; state: Exp; lines: +1 -1 made convert_3code_1code public ---------------------------- revision 1.1 date: 2004/10/25 20:37:08; author: andreas; state: Exp; added PDBFileParser as independent class There's a BioJava-based structure viewer called SPICE that Andreas Prlic (the one mentioned in the CVS logs) has been working on. It's pretty widely used at the EBI and the Sanger Institute. See: http://www.efamily.org.uk/software/dasclients/spice/ I first saw it at ISMB Detroit, I think, so in 2005. It's meant more for sequence/structure comparisons and feature annotations, which aren't things that small molecule viewers typically care about. Just like large molecule structure viewers don't all care about things like bond orders. ;) It does seem sometimes like the big molecule and small molecule people are in mostly disjoint fields. > The Jmol PDB reader might even be a better > one... lots of user community around that PDB reader, likely more > than for the BioJava version... and there's some integration already between the two, as for example: http://www.biojava.org/wiki/BioJava:CookBook:PDB:Jmol Andrew da...@da... |
From: Egon W. <ego...@gm...> - 2008-08-22 12:22:49
|
On Fri, Aug 22, 2008 at 8:49 AM, Christoph Steinbeck <ste...@eb...> wrote: > Egon Willighagen wrote: >> For what reasons? I really don't like the idea that we remove code >> just because it is out of fashion... > > "Out of fashion" does not quite hit it. We all know that it is toy code. > Good for educational use - you are right - and this reason is good enough > to keep it. *My* preference is to remove it but my opinion is only one of > many. So, we never really had this situation before... what would be the proper next step? I suggest we contact Stephan and ask him what he thinks... >> I think the community decides what is the scope of the CDK > > Right - and I'm part of this community, give my cents to Rajarshi's > question, and make my own suggestion. Sure. I guess what worries me from this discussion is the lack of 'why' it should be removed. Define 'toy code'... there is a lot of toy code... we all know that a lot of the code in CDK is a rough, good first go at things, some algorithms excepted... there simply is quick and dirty code in the CDK... I just like to see some more well-defined rules... Egon -- ---- http://chem-bla-ics.blogspot.com/ |
From: Rajarshi G. <rg...@in...> - 2008-08-22 13:00:58
|
On Aug 22, 2008, at 8:22 AM, Egon Willighagen wrote: > Sure. I guess what worries me from this discussion is the lack of > 'why' it should be removed. Define 'toy code'... there is a lot of toy > code... we all know that a lot of the code in CDK is a rough, good > first go at things, some algorithms excepted... there simply is quick > and dirty code in the CDK... > > I just like to see some more well-defined rules... Good idea - but a tough job :) Regarding the QM code: * If it's educational, what are the pedagogical goals of the code? * Has the code been used anywhere? * Does the code work? (There is a unit test, but it's just an example of usage and not a test of the calculated values) I'm still of the opinion that QM is out of the purview of cheminformatics. From the cheminformatics side I agree that some code is buggy/incomplete and I think the new policy of branch-based development is excellent for this. If it's incomplete it stays in a branch till it's done. ------------------------------------------------------------------- Rajarshi Guha <rg...@in...> GPG Fingerprint: D070 5427 CC5B 7938 929C DD13 66A1 922C 51E7 9E84 ------------------------------------------------------------------- So the Zen master asked the hot-dog vendor, "Can you make me one with everything?" - TauZero on Slashdot |