From: Joerg K. W. <we...@in...> - 2004-05-10 17:00:13
Hi all,

>>Are we (QSAR, CDK, JOELib, Octet, Jumbo) trying to do too much at the same
>>time?
> Maybe, but I think that things are going fine as they go now... we approach
> things step by step... I guess we are mostly just glueing existing tools
> together...

Maybe ... step by step ... and we first need a common merged interface, before any concrete implementation can help us improve the current design.

>>(3) Wouldn't it be even more useful if project we're planning interacted
>>with a single "standard" Java API for accessing and manipulating Molecular
>>information?
>>(4) Yes it would,
> focus on chemical entities only... very difficult to make the 'single
> standard API'... Chemistry is too fuzzy, too broad...
> But this does not mean that we can define 'a standard Java API' which
> glues together a few existing projects...

Let's start with the 'glued' interface; if people have plans to write their own implementation, they can do that. But first we must find a common interface... combining the currently available open source projects may become interesting at a later stage.

>>but such a thing doesn't exist! How can we ensure that
>>the new API will be general enough, robust, and useful?
> I don't think we can...

At the moment, I don't think we have time ... hey, these are open source projects, so in future we have the ability to refactor things ...

>>My point is this: would it be useful to tackle the problem of developing a
>>single standard Molecular API separately from the development of a QSAR
>>framework?
> Interesting, but I don't think we can easily come up with the solution to this
> problem... (if it was easy, it was already done...)

Correct. Of course refactoring is much easier than developing functionality, but there are still some really nasty problems out there, so I'm optimistic that we can iterate towards a common interface and a common API, but this will need time ...
it's still hard enough to maintain the currently available projects, because they still have some open performance problems or bad designs. And simply 'merging' the functionality is difficult, because it may demand a difficult refactoring.

You surely know the current lines of code:

ChemicalMarkupLanguage: 30285
CDK: 43772
JOELib: 63761
http://pmd.sourceforge.net/scoreboard.html

So, assuming that a good developer reads 1000 LOC/day and understands them and all the dependencies, he will need 30+44+64=138 days (4 1/2 months) to understand all the projects; then he can start with refactoring and testing, so ... hope you get paid for one year producing nothing :-)

So are LOC a good measure for productivity? No, but ... that's another problem, and outside the focus of the QSAR project.

> Interesting, too... OpenBabel is struggling with atom types in file conversion
> (i.e., I think they still are...)... which indicates only part of the
> problems...

I've discussed this topic with Geoff, but as always ... there are other things to do. We do have exactly the same chemistry 'kernels', but this was checked 'by hand', because we have partially hard-coded assignment algorithms, so it is still suboptimal.

> Jakarta is a much simpler working area... all the results are artificial...
> that is, they don't have to match with nature... so they don't really care on
> how things should be interpreted, only that they work...

I agree ... chemoinformatics is still strongly connected to science, because we still need standards, which are in progress ... CML, 'expert systems', interfaces, ... Unfortunately, as already criticized by Kubinyi (or at least cited by him), the contribution of the pharmaceutical industry to helping set a standard could be higher. So, refactoring does not help me to publish papers and does not help the pharmaceutical industry to reduce their data piles. Of course it can be helpful for the future, but financial pressure might be high for them and for us ...
so who cares about a good hypothetical standard in the future which facilitates the maintenance? So let's work with shell scripts, they are fast and have an included copy protection, but that's unrealistic :-)

As already said by Egon ... let's iterate ... step by step ... nothing is excluded ... but also nothing should be included too early ...

Kind regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action. (E. Hemingway)
Never mistake action for meaningful action. (Hugo Kubinyi,2004)