On Tuesday 08 November 2005 20:52, Rajarshi Guha wrote:
> I've been having some issues with the descriptor package and I decided
> to throw out a possible solution that could be taken.
> 1. Currently some descriptors (such as AtomDegreeDescriptor) are
> atom-based descriptors rather than whole molecule descriptors. Thus
> given a molecule, they will return a number for a specific atom in the
> molecule rather than for the whole molecule.
> Though such descriptors are useful, I do not think that they properly
> represent the concept of 'molecular descriptors'. IMO, there should be a
> distinction between these two types of descriptors (if only for the
> purposes of automated descriptor calculation).
> 2. Currently, the evaluation of all descriptors depends on reading the
> contents of qsar-descriptors.set. This is nice, but it creates a problem
> if a user wants to supply additional descriptor classes via his own jar
> file. We could always set up the descriptor engine to read an optional
> description file. However a cleaner method that does not depend on
> having an extra file around would be nice (especially since the file
> simply lists the class names and does not provide extra meta-data, which
> is available from the descriptor class itself).
The only reason for this setup is to avoid having to hack the DescriptorEngine
each time a new Descriptor is written.
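For comparison, here is a minimal sketch of what a list-file approach involves. It assumes the file contains one fully qualified class name per line and that every descriptor has a public no-argument constructor. The class name DescriptorListReader is hypothetical, and this is not the actual DescriptorEngine code.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class DescriptorListReader {
    /** Reads one fully qualified class name per line; blank lines are ignored. */
    static List<String> readClassNames(BufferedReader in) {
        return in.lines()
                 .map(String::trim)
                 .filter(name -> !name.isEmpty())
                 .collect(Collectors.toList());
    }

    /** Instantiates each listed class through its public no-argument constructor. */
    static List<Object> instantiate(List<String> classNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : classNames) {
            try {
                instances.add(Class.forName(name).getConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                // A listed class that cannot be created is treated as a configuration error.
                throw new IllegalStateException("Cannot instantiate " + name, e);
            }
        }
        return instances;
    }
}
```

The engine itself never changes, but every new descriptor still means a new line in the list file. That extra file is what the proposal below tries to eliminate.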
> The above problems could be solved in a clean manner:
> Under the qsar package we could make a descriptors package,
What about moldescriptors/ and atomdescriptors/ then?
> which would
> contain only whole molecule descriptors. The current build script would
> remain the same, so that all the descriptors would be under
> org/openscience/cdk/qsar/descriptors in the cdk-qsar.jar file.
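The distinction between whole-molecule and atom-based descriptors could also be made explicit in the types, whichever package layout is chosen. The sketch below uses two hypothetical interfaces, MolecularDescriptor and AtomicDescriptor (not existing CDK API). The generic type M stands in for the CDK molecule class. Automated calculation would then keep only the molecular ones.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorKinds {
    /** Hypothetical: yields one value for the whole molecule. */
    interface MolecularDescriptor<M> {
        double calculate(M molecule);
    }

    /** Hypothetical: yields one value for a given atom of the molecule. */
    interface AtomicDescriptor<M> {
        double calculate(M molecule, int atomIndex);
    }

    /** Keeps only whole-molecule descriptors, e.g. for automated batch calculation. */
    static List<MolecularDescriptor<?>> molecularOnly(List<?> descriptors) {
        List<MolecularDescriptor<?>> result = new ArrayList<>();
        for (Object d : descriptors) {
            if (d instanceof MolecularDescriptor) {
                result.add((MolecularDescriptor<?>) d);
            }
        }
        return result;
    }
}
```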
> The reason why this would be nice is in the following situation: assume
> a user is writing an app with the CDK. It can be assumed that he will
> have the relevant jar files in the class path, so if the app requires
> descriptors, cdk-qsar.jar will be in the class path.
> The CDK code can then simply parse the java.class.path System property
I have not done this before, but it sounds like an excellent plan. We could use
that for the file IO too.
> to get the cdk-qsar.jar file, enumerate the entries, search for entries
> matching the descriptor package and consequently instantiate the
> descriptor classes.
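The scanning step described above might look roughly like the following standard-library sketch. The jar name cdk-qsar.jar and the package path come from this thread. Matching the jar by its exact file name and skipping inner classes (names containing '$') are assumptions. DescriptorScanner is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class DescriptorScanner {
    // Package path from the thread, in jar-entry form (slash separated).
    static final String DESCRIPTOR_PACKAGE = "org/openscience/cdk/qsar/descriptors/";

    /** Returns the class path entries whose file name is exactly jarName. */
    static List<String> findJars(String classPath, String jarName) {
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(File.pathSeparator)) {
            if (new File(entry).getName().equals(jarName)) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Maps a jar entry to a class name, or null if it is not a top-level class under packagePath. */
    static String toClassName(String entryName, String packagePath) {
        if (!entryName.startsWith(packagePath) || !entryName.endsWith(".class")
                || entryName.contains("$")) {   // '$' marks inner and anonymous classes
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Enumerates the jar's entries and collects the class names found under packagePath. */
    static List<String> listClasses(String jarPath, String packagePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String className = toClassName(entries.nextElement().getName(), packagePath);
                if (className != null) {
                    names.add(className);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String classPath = System.getProperty("java.class.path");
        for (String jar : findJars(classPath, "cdk-qsar.jar")) {
            for (String className : listClasses(jar, DESCRIPTOR_PACKAGE)) {
                System.out.println(className);
            }
        }
    }
}
```

Each name could then be passed to Class.forName and instantiated, keeping only classes that implement the descriptor interface. One caveat: java.class.path only reflects the application class path. Jars added by other class loaders (for example in an application server) will not show up there.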
> Furthermore, as long as the user writes descriptor classes that implement
> the Descriptor interface, it would be trivial for his code (or even a
> helper routine in the CDK) to load an external jar file (say, a
> descriptor plugin) and instantiate those descriptor classes.
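Loading a plugin jar needs only the standard java.net.URLClassLoader. The sketch below assumes the plugin's class names are already known, for instance by enumerating the jar's entries. It also assumes each descriptor has a public no-argument constructor. PluginLoader is a hypothetical name. The type parameter would be the CDK Descriptor interface in practice.

```java
import java.io.File;
import java.lang.reflect.Modifier;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class PluginLoader {
    /** Creates a loader for an external jar, delegating to the loader that loaded this class. */
    static URLClassLoader openJar(File jar) throws MalformedURLException {
        return new URLClassLoader(new URL[] { jar.toURI().toURL() },
                                  PluginLoader.class.getClassLoader());
    }

    /**
     * Instantiates every named class that is a concrete implementation of {@code type}
     * and has a public no-argument constructor; other classes are skipped.
     */
    static <T> List<T> instantiate(List<String> classNames, Class<T> type, ClassLoader loader) {
        List<T> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<?> cls = Class.forName(name, false, loader);
                if (!type.isAssignableFrom(cls) || cls.isInterface()
                        || Modifier.isAbstract(cls.getModifiers())) {
                    continue; // not a concrete implementation of the requested type
                }
                result.add(type.cast(cls.getConstructor().newInstance()));
            } catch (ReflectiveOperationException e) {
                // Class missing, no public no-arg constructor, or constructor threw: skip it.
            }
        }
        return result;
    }
}
```

Skipping unusable classes rather than failing means one broken class in a plugin does not prevent the remaining descriptors from loading.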
> Now, since each descriptor will make a DescriptorSpecification object
> available, the new approach can still access and make use of meta-data
> (if available) about descriptors as is currently possible (eg in
I'm actually in the process of further classifying the descriptors, and will
report on this after the 1st German conference on chemoinformatics, next
> The basic goal (wishful thinking?) is to make something comparable to
> Dragon - a simple GUI to evaluate all (or a subset) of descriptors for
> an SDF file. I currently have the GUI components available. I just need
> a clean and uniform way to evaluate all the descriptors at one go.
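Once every molecular descriptor exposes the same "molecule in, number out" shape, evaluating all of them at once is a simple nested loop. In the sketch below, each descriptor is modelled as a hypothetical ToDoubleFunction<M>, where M stands in for the CDK molecule class. Reading the SDF file is left out. The choice to record NaN when a descriptor throws is an assumption, not existing CDK behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class BatchEvaluator {
    /**
     * Evaluates every descriptor on every molecule. Each row holds one molecule's values,
     * in the iteration order of the descriptor map (use a LinkedHashMap for stable columns).
     * A descriptor that throws for a molecule yields NaN instead of aborting the batch.
     */
    static <M> List<double[]> evaluateAll(List<M> molecules,
                                          Map<String, ToDoubleFunction<M>> descriptors) {
        List<double[]> table = new ArrayList<>();
        for (M molecule : molecules) {
            double[] row = new double[descriptors.size()];
            int col = 0;
            for (ToDoubleFunction<M> descriptor : descriptors.values()) {
                double value;
                try {
                    value = descriptor.applyAsDouble(molecule);
                } catch (RuntimeException e) {
                    value = Double.NaN;
                }
                row[col++] = value;
            }
            table.add(row);
        }
        return table;
    }
}
```

The map keys would become the column headers in a Dragon-style GUI, and a subset of descriptors is just a smaller map.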
> Comments welcome :)
Sounds good; you've got my support. I won't be able to offer support other than
replying to emails, but this should improve in two weeks or so...