From: Peter Murray-R. <pm...@ca...> - 2004-04-17 13:05:50
At 09:52 17/04/2004 +0200, Joerg Wegner wrote:
>Dear Nina Nikolova,
>Dear All,
>
>please reply also to the JOELib mailing list and ... i've already
>published three papers about QSAR and our group has it's main focus on
>data mining and optimization algorithms, so i think i've some experience
>in this area, too.
>http://www-ra.informatik.uni-tuebingen.de/

It seems there is general agreement that an SF project in this area is valuable, and I'll make a few comments which I hope are helpful. Please ignore them if they aren't.

A. Current QSAR practice has severe problems. They include:
- almost all codes are closed. Many are not free.
- it is impossible to repeat any experiment. Therefore QSAR ceases to be scientific but relies on reputation, trust and power
- the objects used are badly designed, irreproducible and have variable interpretation
- data selection is arbitrary. There are few (no?) standard test sets. It is impossible to verify whether data have been modified consciously or unconsciously to increase apparent success
- algorithms are closed, even if the data are well defined.

B. The mainstream QSAR community is not taking effective steps to remedy the errors. Our current group believes that through an OpenSource approach we can catalyse a change in thinking and practice. We do this by creating a system and practice that demonstrates the increased **quality** available through OpenSource. IMO quality is the most important - more so than platform, language, ease of use, performance, etc. If it is easier and faster to create more garbage on every platform what have we achieved?

C. The OpenSource community has made some small, useful steps in this direction. They now wish to pool their efforts and produce a single point of contact for their own development and to show to the world. This does NOT necessarily mean a single program. IMO it is much more likely to mean an infrastructure on which a variety of operations can be carried out ("glueware"?).
They wish to create a project at SF which leads to:
- active constructive discussion
- agreed representation of objects
  * molecules, atoms, fragments, etc.
  * descriptors
  * properties
- creation, cataloguing and annotation of high-quality information objects:
  * dictionaries
  * properties (e.g. of atoms)
  * datasets
- creation, cataloguing and annotation of algorithms related to QSAR
  * chemical perception
  * statistics, optimisation, etc
- creation of software:
  * as toolkit components
  * as demonstrators of the *quality* of the system

That is as far as I have got...

I think it's important to be inclusive and I would therefore suggest that we review the current OpenSource efforts in this area. My knowledge extends to:
- CDK, etc.
- JOELib
- OpenBabel
- Weka
- Nina's work (does this have a label?)

In projects of this sort everyone has something to contribute and also something to give up. For example I did a lot of work on visual display of CML (Jumbo3) - and some of this functionality is not provided by other sources. Nevertheless I decided to give up JUMBO3 and use JCP and Jmol for display. JUMBO4.3 has now developed in a more structured form as a flexible XML DOM and Tools library which can be reconfigured easily and rapidly. It is component based rather than application based.

I suggest starting not with deciding what program to write but with what the components of a QSAR system are, and then deciding who wants to be involved, what we have got, and setting some realistic scope for what is achievable.

Best

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069