From: rich a. <che...@ya...> - 2004-05-09 22:05:52
|
Hello All,

I almost don't want to bring this up because the discussion around the QSAR project is pretty involved as it is. But I can't resist...

Egon, your comment about QSAR being a "meta project" hit home with me in a big way.

The thought occurs:

Are we (QSAR, CDK, JOELib, Octet, Jumbo) trying to do too much at the same time?

Here's my impression of the line of discussion that led to where we are now (which I believe is a good place, by the way):

(1) Wouldn't it be useful to have an open-source project devoted exclusively to QSAR, with open implementations based on existing projects, a GUI, and which makes use of open-source data mining tools (such as weka)?

(2) Yes it would.

(3) Wouldn't it be even more useful if the project we're planning interacted with a single "standard" Java API for accessing and manipulating molecular information?

(4) Yes it would, but such a thing doesn't exist! How can we ensure that the new API will be general enough, robust, and useful? How can we meet this objective AND minimize refactorings of existing cheminformatics projects to accommodate this new API?

This is where we are now, in my view. The problem is, just tackling point (4) will be a very big job in itself.

My point is this: would it be useful to tackle the problem of developing a single standard molecular API separately from the development of a QSAR framework?

Would it be even more helpful to devote a separate project to cheminformatics standardization and/or integration in general? This project could start off by trying to address our point (4), but could easily expand to deal with any number of standardization/integration issues currently plaguing cheminformatics research. The focus of the project needn't be Java-centric either, although it would probably start out that way.

As a model for such an effort, how about the Apache Jakarta project (http://jakarta.apache.org/)? This project nicely ties together a lot of technologies and serves as an essential resource for experienced developers and newcomers alike. More importantly, experiences in one project often lead to new projects that address novel problems.

Any thoughts?

cheers,
rich

Egon Willighagen <eg...@sc...> wrote:
> The interfaces and the wrappers can be in Octet, but personally I prefer
> to do this in the common, implementation-neutral QSAR project... The
> compile scheme is identical, the only difference is where people get
> added as developers. I prefer this setup, because it more clearly shows
> that the QSAR part is a sort of meta project which tries to connect
> available OS tools for QSAR research. Please comment.
|
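Editorial sketch: the "interfaces and wrappers" arrangement Egon describes in the quote above could look roughly like the following. Every name here (`NeutralMolecule`, `ToolkitAMolecule`, `ToolkitAWrapper`, `HeavyAtomCount`) is a hypothetical placeholder invented for illustration; none of them is taken from CDK, JOELib, Octet, or the qsar.sf.net code. The point is only that a descriptor written against a neutral interface never needs to see the toolkit class behind the wrapper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical implementation-neutral view of a molecule (not an existing API). */
interface NeutralMolecule {
    int atomCount();
    String elementSymbol(int index);
}

/** Stand-in for a molecule class from some existing toolkit; purely illustrative. */
final class ToolkitAMolecule {
    private final List<String> symbols = new ArrayList<>();

    void addAtom(String symbol) { symbols.add(symbol); }
    int getAtomCount() { return symbols.size(); }
    String getSymbolAt(int i) { return symbols.get(i); }
}

/** Wrapper that exposes the toolkit class through the neutral interface. */
final class ToolkitAWrapper implements NeutralMolecule {
    private final ToolkitAMolecule delegate;

    ToolkitAWrapper(ToolkitAMolecule delegate) { this.delegate = delegate; }

    @Override public int atomCount() { return delegate.getAtomCount(); }
    @Override public String elementSymbol(int index) { return delegate.getSymbolAt(index); }
}

/** A toy descriptor that depends only on the neutral interface. */
final class HeavyAtomCount {
    static int of(NeutralMolecule mol) {
        int count = 0;
        for (int i = 0; i < mol.atomCount(); i++) {
            // Count every atom that is not hydrogen.
            if (!"H".equals(mol.elementSymbol(i))) {
                count++;
            }
        }
        return count;
    }
}

class WrapperSketch {
    public static void main(String[] args) {
        // Methanol with explicit hydrogens: C, O, and four H.
        ToolkitAMolecule methanol = new ToolkitAMolecule();
        for (String s : new String[] {"C", "O", "H", "H", "H", "H"}) {
            methanol.addAtom(s);
        }
        System.out.println(HeavyAtomCount.of(new ToolkitAWrapper(methanol)));
    }
}
```

Supporting a second toolkit would then mean writing one more wrapper class, with no change to the descriptor code.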
From: E.L. W. <eg...@sc...> - 2004-05-10 12:26:47
|
On Monday 10 May 2004 00:05, rich apodaca wrote:
> I almost don't want to bring this up because the discussion around the QSAR
> project is pretty involved as it is. But I can't resist...
>
> Egon, your comment about QSAR being a "meta project" hit home with me in a
> big way.
>
> The thought occurs:
>
> Are we (QSAR, CDK, JOELib, Octet, Jumbo) trying to do too much at the same
> time?

Maybe, but I think that things are going fine as they are now... we approach things step by step... I guess we are mostly just gluing existing tools together...

> Here's my impression of the line of discussion that led to where we are now
> (which I believe is a good place, by the way):
>
> (1) Wouldn't it be useful to have an open-source project devoted exclusively
> to QSAR with open implementations based on existing projects, a GUI, and
> which makes use of open-source data mining tools (such as weka)?
>
> (2) Yes it would.

Yes, that's a nice summary of the goal of qsar.sf.net :)

> (3) Wouldn't it be even more useful if the project we're planning interacted
> with a single "standard" Java API for accessing and manipulating Molecular
> information?
>
> (4) Yes it would,

Mmmm... people have tried that... there are some articles which focus on chemical entities only... it is very difficult to make the 'single standard API'... Chemistry is too fuzzy, too broad...

But this does not mean that we cannot define 'a standard Java API' which glues together a few existing projects...

> but such a thing doesn't exist! How can we ensure that
> the new API will be general enough, robust, and useful?

I don't think we can...

> How can we meet
> this objective AND minimize refactorings of existing cheminformatics
> projects to accommodate this new API?
>
> This is where we are now, in my view. The problem is, just tackling point
> (4) will be a very big job in itself.

Agreed. And I do not think we should make this our focus... I very much liked your suggestion of splitting up APIs which can be merged for some specific application...:

Very basic Atom API
3DRenderingAPI
2DRenderingAPI

> My point is this: would it be useful to tackle the problem of developing a
> single standard Molecular API separately from the development of a QSAR
> framework?

Interesting, but I don't think we can easily come up with the solution to this problem... (if it were easy, it would already have been done...)

> Would it be even more helpful to devote a separate project toward
> cheminformatics standardization and/or integration in general? This project
> could start off by trying to address our point (4), but could easily expand to
> deal with any number of standardization/integration issues currently
> plaguing cheminformatics research. The focus of the project needn't be
> Java-centric either, although it would probably start out that way.

Interesting, too... OpenBabel is struggling with atom types in file conversion (i.e., I think they still are...)... which indicates only part of the problems...

But I think doing this for the QSAR field only reduces the problem size, and would make a very interesting test case...

> As a model for such an effort, how about the Apache Jakarta project
> (http://jakarta.apache.org/)? This project nicely ties together a lot of
> technologies and serves as an essential resource for experienced developers
> and newcomers alike. More importantly, experiences in one project often
> lead to new projects that address novel problems.
>
> Any thoughts?

Jakarta is a much simpler working area... all the results are artificial... that is, they don't have to match with nature... so they don't really care about how things should be interpreted, only that they work...

But the resources that such a thing provides are applicable to our situation too... I'm hoping that the qsar.sf.net project can serve such a function for the QSAR field of science...

Egon

--
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6
|
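Editorial sketch: the idea in the message above of splitting the API into small pieces (a very basic atom API, a 2D rendering API, a 3D rendering API) that are merged only for a specific application might be expressed in Java as below. All interface and class names are hypothetical and were made up for this illustration; the thread does not define any of them.

```java
import java.util.Locale;

/** Hypothetical "very basic Atom API": element identity only. */
interface BasicAtom {
    String symbol();
}

/** Hypothetical 2D rendering API: drawing coordinates only. */
interface Renderable2D {
    double x2d();
    double y2d();
}

/** Hypothetical 3D rendering API: spatial coordinates only. */
interface Renderable3D {
    double x3d();
    double y3d();
    double z3d();
}

/** A 2D depiction application merges just the two pieces it needs. */
interface DepictableAtom extends BasicAtom, Renderable2D {
}

/** Minimal implementation of the merged interface. */
final class SimpleDepictableAtom implements DepictableAtom {
    private final String symbol;
    private final double x;
    private final double y;

    SimpleDepictableAtom(String symbol, double x, double y) {
        this.symbol = symbol;
        this.x = x;
        this.y = y;
    }

    @Override public String symbol() { return symbol; }
    @Override public double x2d() { return x; }
    @Override public double y2d() { return y; }
}

final class Labeler {
    /** Accepts any type that implements both small APIs, merged or not. */
    static <A extends BasicAtom & Renderable2D> String label(A atom) {
        return String.format(Locale.ROOT, "%s@(%.1f, %.1f)",
                atom.symbol(), atom.x2d(), atom.y2d());
    }
}

class SplitApiSketch {
    public static void main(String[] args) {
        // Prints: O@(1.5, -0.5)
        System.out.println(Labeler.label(new SimpleDepictableAtom("O", 1.5, -0.5)));
    }
}
```

A QSAR descriptor that needs only element identity could depend on `BasicAtom` alone, so a toolkit without any rendering support could still take part.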
From: Joerg K. W. <we...@in...> - 2004-05-10 17:00:13
|
Hi all,

>>Are we (QSAR, CDK, JOELib, Octet, Jumbo) trying to do too much at the same
>>time?
> Maybe, but I think that things are going fine as they go now... we approach
> things step by step... I guess we are mostly just glueing existing tools
> together...

Maybe ... step by step ... and first we need a common merged interface, before any concrete implementation can help us improve the current design.

>>(3) Wouldn't it be even more useful if project we're planning interacted
>>with a single "standard" Java API for accessing and manipulating Molecular
>>information?
>>(4) Yes it would,
> focus on chemical entities only... very difficult to make the 'single
> standard API'... Chemistry is too fuzzy, too broad...
> But this does not mean that we cannot define 'a standard Java API' which
> glues together a few existing projects...

Let's start with the 'glued' interface. If people have plans to write their own implementation, they can do that. But first we must find a common interface... combining the currently available open source projects may become interesting at a later stage.

>>but such a thing doesn't exist! How can we ensure that
>>the new API will be general enough, robust, and useful?
> I don't think we can...

At the moment, I don't think we have time ... hey, these are open source projects, so in future we have the ability to refactor things ...

>>My point is this: would it be useful to tackle the problem of developing a
>>single standard Molecular API separately from the development of a QSAR
>>framework?
> Interesting, but I don't think we can easily come up with the solution to this
> problem... (if it was easy, it was already done...)

Correct. Of course refactoring is much easier than developing functionality, but there are still some really nasty problems out there. I'm optimistic that we can iterate to a common interface and a common API, but this will need time ... it's still hard enough to maintain the currently available projects, because there are still some open performance problems or bad designs in them. And simply 'merging' the functionality is difficult, because it may demand a difficult refactoring.

You surely know the current lines of code (LOC):

ChemicalMarkupLanguage: 30285
CDK: 43772
JOELib: 63761
http://pmd.sourceforge.net/scoreboard.html

So, assuming that a good developer reads 1000 LOC/day and understands them and all the dependencies, he will need 30+44+64=138 days (about 4 1/2 months) to understand all the projects. Then he can start with refactoring and testing, so ... hope you get paid for one year producing nothing :-)

So are LOC a good measure of productivity? No, but ... that's another problem, and outside the QSAR project focus.

> Interesting, too... OpenBabel is struggling with atom types in file conversion
> (i.e., I think they still are...)... which indicates only part of the
> problems...

I've discussed this topic with Geoff but, as always, there are other things to do. We have exactly the same chemistry 'kernels', but this was checked 'by hand' because our assignment algorithms are partially hard-coded, so it is still suboptimal.

> Jakarta is a much simpler working area... all the results are artificial...
> that is, they don't have to match with nature... so they don't really care on
> how things should be interpreted, only that they work...

I agree ... chemoinformatics is still strongly connected to science, because we still need standards, which are in progress ... CML, 'expert systems', interfaces, ...

Unfortunately, as already criticized by Kubinyi (or at least cited by him), the contribution of the pharmaceutical industry to helping set a standard could be higher.

So, refactoring does not help me publish papers and does not help the pharmaceutical industry reduce its data piles. Of course it can be helpful for the future, but financial pressure might be high for them and for us ... so who cares about a good hypothetical future standard which facilitates maintenance?

So let's work with shell scripts, they are fast and have an included copy protection, but that's unrealistic :-)

As already said by Egon ... let's iterate ... step by step ... nothing is excluded ... but also nothing should be included too early ...

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi, 2004)
|
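Editorial sketch: the reading-time estimate in the message above can be reproduced with a few lines of Java. The LOC figures and the 1000 LOC/day reading rate come from the message; the 30-day month used for the conversion is an assumption added here, and the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReadingEstimate {
    /** Days to read a code base at a fixed rate, rounded to the nearest day. */
    static long roundedDays(int linesOfCode, int locPerDay) {
        return Math.round((double) linesOfCode / locPerDay);
    }

    /** Sum of the per-project estimates, mirroring the 30+44+64 sum in the message. */
    static long totalDays(Map<String, Integer> locByProject, int locPerDay) {
        long total = 0;
        for (int loc : locByProject.values()) {
            total += roundedDays(loc, locPerDay);
        }
        return total;
    }

    public static void main(String[] args) {
        // LOC figures as quoted in the message.
        Map<String, Integer> loc = new LinkedHashMap<>();
        loc.put("ChemicalMarkupLanguage", 30285);
        loc.put("CDK", 43772);
        loc.put("JOELib", 63761);

        int locPerDay = 1000;                  // reading rate from the message
        long days = totalDays(loc, locPerDay); // 30 + 44 + 64 = 138
        double months = days / 30.0;           // assumption: 30-day months

        System.out.println(days + " days, about " + months + " months");
    }
}
```

With these inputs the total is 138 days, or 4.6 thirty-day months, consistent with the "about 4 1/2 months" in the message.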