From: Egon W. <eg...@sc...> - 2004-04-17 15:54:20
On Saturday 17 April 2004 15:02, Peter Murray-Rust wrote:
> I suggest starting not with deciding what program to write but with what
> the components of a QSAR system are, then deciding who wants to be
> involved and what we have already got, and setting a realistic scope for
> what is achievable.
Agreed. This is why we need to set up a SourceForge (SF) project where we can
write these things down.
Here's a list:
- building a molecule database
  - read from file/internet
  - draw molecules yourself one by one, or insert them from SMILES
  - browse the database with 2D and possibly 3D structures
  - associate activities/properties with those molecules
- preprocessing
  - get mathematical (or other) descriptions (descriptors) of the molecules
    in the database
  - select the wanted descriptors
  - ability to use external programs for this
  - descriptor value preprocessing
  - statistical analysis of the database (outliers, diversity, etc.)
- model building
  - choosing a method and its parameters
- model validation
  - visual validation -> plots
  - statistical validation
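The component list above is essentially a pipeline. As a purely illustrative sketch (not code from JOELib, CDK or any planned project), the stages could be chained like this in Python. The naive SMILES-based descriptors, the invented activity values, the single-descriptor least-squares model and the leave-one-out Q^2 are all assumptions chosen only to keep it short.

```python
"""Toy, hypothetical sketch of the QSAR stages listed above.

Assumptions (not from any existing toolkit): descriptors are naive
character counts on SMILES strings, the model is a one-descriptor
least-squares line, and validation is leave-one-out Q^2.
"""
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Molecule:
    smiles: str
    activity: float                                  # associated activity/property
    descriptors: dict = field(default_factory=dict)  # descriptor name -> value


def compute_descriptors(mol: Molecule) -> None:
    """Descriptor step: naive counts (would miscount e.g. the 'C' in 'Cl')."""
    s = mol.smiles
    mol.descriptors = {
        "n_carbon": sum(ch in "Cc" for ch in s),
        "n_oxygen": sum(ch in "Oo" for ch in s),
        # each pair of single-digit ring-closure labels closes one ring
        "n_ring_closures": sum(ch.isdigit() for ch in s) // 2,
    }


def autoscale(mols: list) -> None:
    """Preprocessing: drop constant descriptors, scale the rest to mean 0, sd 1."""
    for name in list(mols[0].descriptors):
        values = [m.descriptors[name] for m in mols]
        mu, sd = mean(values), pstdev(values)
        for m in mols:
            if sd == 0:
                del m.descriptors[name]
            else:
                m.descriptors[name] = (m.descriptors[name] - mu) / sd


def pearson(xs, ys):
    """Pearson correlation coefficient, used here for descriptor selection."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Model building: ordinary least squares for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def loo_q2(xs, ys):
    """Statistical validation: leave-one-out cross-validated Q^2."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    return 1 - press / sum((y - my) ** 2 for y in ys)


def run_pipeline(data):
    mols = [Molecule(s, act) for s, act in data]  # build the "database"
    for m in mols:
        compute_descriptors(m)                    # get descriptors
    autoscale(mols)                               # preprocess descriptor values
    ys = [m.activity for m in mols]
    best = max(mols[0].descriptors,               # select the best-correlated one
               key=lambda n: abs(pearson([m.descriptors[n] for m in mols], ys)))
    xs = [m.descriptors[best] for m in mols]
    a, b = fit_line(xs, ys)                       # build the model
    return best, (a, b), loo_q2(xs, ys)           # validate it


if __name__ == "__main__":
    # Invented activity values for a homologous series (illustration only)
    toy = [("CO", 1.5), ("CCO", 2.0), ("CCCO", 2.5), ("CCCCO", 3.0), ("CCCCCO", 3.5)]
    best, (a, b), q2 = run_pipeline(toy)
    print(f"selected {best}: y = {a:.2f} + {b:.2f}*x, LOO Q^2 = {q2:.2f}")
```

In a real system the descriptor step would be delegated to a toolkit or to external programs, and model building would offer more than one straight line; these stand-ins only show where each stage plugs in.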
I requested a new SF project ('qsar') yesterday, after getting positive
reactions to my earlier proposal.
Joerg, I have not contacted you personally yet, because I vaguely remembered
you saying you would be on holiday (?), but I may well be confused here...
I see JOELib as an important part of the new program: it has many descriptors
implemented, already uses CML2 for storing results, and has an interface to
Weka.
I also see an important role for CDK: 2D editing and display are very
important features here. Moreover, I expect some descriptors to be implemented
in CDK later this year, though these will likely not conflict with those in
JOELib.
The reason I propose building on CDK's core classes should be obvious.
Hopefully the QSAR SF project will be approved early next week; I will then
start adding requirements, analyses, etc. to the documentation, ideally
together with the others who are interested. Then we will see how the
available open-source (OS) parts fit together.
Egon