From: Egon W. <eg...@sc...> - 2004-04-17 15:54:20
On Saturday 17 April 2004 15:02, Peter Murray-Rust wrote:
> I suggest starting not with deciding what program to write but with what
> the components of a QSAR system are and then deciding what who wants to be
> involved, we have got and setting some realistic scope to what is
> achievable.

Agreed. This is why we need to set up a SF project where we can write these
things down. Here's a list:

- building a molecule database
  - read from file/internet
  - draw molecules yourself one by one, or insert them from SMILES
  - browsing the database with 2D and possibly 3D structures
  - associate activities/properties with those molecules
- preprocessing
  - get mathematical (or other) descriptions of the molecules in the database
  - selection of wanted descriptions
  - ability to use external programs for this
  - descriptor value preprocessing
  - statistical analysis of the database (outliers, diversity, etc)
- model building
  - choosing method, and method parameters
- model validation
  - visual validation -> plots
  - statistical validation

I requested a new SF project ('qsar') yesterday, after getting positive
reactions to my earlier proposal.

Joerg, I have not addressed you personally yet, because I vaguely remembered
you saying you would be on holiday (?), but I might very well be confused
here... I see JOELib as an important part of the new program: it has many
descriptors implemented, already uses CML2 for storing results, and has an
interface to Weka.

I also see an important role for CDK: 2D editing/display is a very important
feature here. And, I expect, some descriptors will be implemented in CDK later
this year, though these will likely not conflict with those in JOELib. The
reason why I propose CDK's core classes should be obvious.

Hopefully, the QSAR SF project will be approved early next week, and then I
will start adding requirements, analyses, etc. to the documentation, hopefully
together with the others interested. Then we will see how the available open
source parts fit together.

Egon
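
To make the component list above a bit more concrete, here is a minimal sketch of how the four stages (database, preprocessing, model building, validation) might chain together. Everything in it is an assumption for illustration only: the `Molecule` class, the toy descriptor functions, and the activity values are all made up, and none of it reflects the actual JOELib, CDK or Weka APIs. A real system would delegate descriptor calculation and modelling to those toolkits.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class Molecule:
    """One database entry: a structure plus its associated activity."""
    smiles: str
    activity: float
    descriptors: dict = field(default_factory=dict)


# Toy stand-ins for real descriptors (hypothetical names, not a toolkit API).
def carbon_count(smiles):
    # Crude: counts 'C'/'c' characters, so it would miscount e.g. "Cl".
    return float(sum(1 for ch in smiles if ch in "Cc"))


def has_atoms(smiles):
    return 1.0 if smiles else 0.0  # constant for any non-empty input


DESCRIPTORS = {"carbon_count": carbon_count, "has_atoms": has_atoms}


def calculate_descriptors(database):
    """Preprocessing: attach every descriptor value to every molecule."""
    for mol in database:
        mol.descriptors = {name: fn(mol.smiles) for name, fn in DESCRIPTORS.items()}


def select_descriptors(database):
    """Preprocessing: keep only descriptors that vary across the database."""
    return [
        name for name in DESCRIPTORS
        if pvariance([mol.descriptors[name] for mol in database]) > 0.0
    ]


def build_model(database, descriptor):
    """Model building: one-descriptor least-squares line, activity = a*x + b."""
    xs = [mol.descriptors[descriptor] for mol in database]
    ys = [mol.activity for mol in database]
    x_mean, y_mean = mean(xs), mean(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return slope, y_mean - slope * x_mean


def r_squared(database, descriptor, model):
    """Statistical validation: coefficient of determination on the fitted data."""
    slope, intercept = model
    ys = [mol.activity for mol in database]
    predicted = [slope * mol.descriptors[descriptor] + intercept for mol in database]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean(ys)) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up activities for four alkanes, purely for illustration.
database = [
    Molecule("C", 1.0), Molecule("CC", 2.1),
    Molecule("CCC", 2.9), Molecule("CCCC", 4.0),
]
calculate_descriptors(database)
selected = select_descriptors(database)   # drops the constant descriptor
model = build_model(database, selected[0])
score = r_squared(database, selected[0], model)
```

The point of the sketch is only the separation of stages: each function takes the database and returns something the next stage consumes, so any single stage could be swapped for an external program without touching the others.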