From: Egon W. <eg...@us...> - 2005-11-08 21:57:42
On Tuesday 08 November 2005 20:52, Rajarshi Guha wrote:
> I've been having some issues with the descriptor package and I decided
> to throw out a possible solution that could be taken.
>
> 1. Currently some descriptors (such as AtomDegreeDescriptor) are
> atom-based descriptors rather than whole molecule descriptors. Thus
> given a molecule, they will return a number for a specific atom in the
> molecule rather than for the whole molecule.
>
> Though such descriptors are useful, I do not think that they properly
> represent the concept of 'molecular descriptors'. IMO, there should be a
> distinction between these two types of descriptors (if only for the
> purposes of automated descriptor calculation)

Agreed.

> 2. Currently, the evaluation of all descriptors depends on reading the
> contents of qsar-descriptors.set. This is nice, but it creates a problem
> if a user wants to supply additional descriptor classes via his own jar
> file. We could always set up the descriptor engine to read an optional
> description file. However a cleaner method that does not depend on
> having an extra file around would be nice (especially since the file
> simply lists the class names and does not provide extra meta-data, which
> is available from the descriptor class itself)

The only reason for this setup is to avoid having to hack the
DescriptorEngine each time a new Descriptor is written.

> The above problems could be solved in a clean manner:
>
> Under the qsar package we could make a descriptors package,

What about moldescriptors/ and atomdescriptors/ then?

> which would
> contain only whole molecule descriptors. The current build script would
> remain the same, so that all the descriptors would be under
> org/openscience/cdk/qsar/descriptors in the cdk-qsar.jar file

Ack.

> The reason why this would be nice is in the following situation: assume
> a user is writing an app with the CDK. It can be assumed that he will
> have the relevant jar files in the class path, so if the app requires
> descriptors, cdk-qsar.jar will be in the class path.
>
> The CDK code can then simply parse the java.class.path System property

I have not done this before, but it sounds like an excellent plan. We
could use that for the file IO too.

> to get the cdk-qsar.jar file, enumerate the entries, search for entries
> matching the descriptor package and consequently instantiate the
> descriptor classes.
>
> Furthermore, as long as the user writes descriptor classes that implement
> the Descriptor interface, it would be trivial for his code (or even a
> helper routine in the CDK) to load an external jar file (say, a
> descriptor plugin) and instantiate those descriptor classes.
>
> Now, since each descriptor will make a DescriptorSpecification object
> available, the new approach can still access and make use of meta-data
> (if available) about descriptors as is currently possible (eg in
> DescriptorEngine.java)

Yes, important. I'm actually in the process of further classifying the
descriptors, and will report on this after the 1st German Conference on
Chemoinformatics, next week.

> The basic goal (wishful thinking?) is to make something comparable to
> Dragon - a simple GUI to evaluate all (or a subset) of descriptors for
> an SDF file. I currently have the GUI components available. I just need
> a clean and uniform way to evaluate all the descriptors at one go.
>
> Comments welcome :)

Sounds good; you've got my support. I won't be able to offer support
other than replying to emails, but this should improve in two weeks or
so...

Egon

-- 
eg...@us...
Blog: http://chem-bla-ics.blogspot.com/
GPG: 1024D/D6336BA6
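
The class path scanning idea discussed in the quoted text could be sketched roughly as follows. This is an illustration only, not code from the CDK: the class `DescriptorScanner` and its three methods are hypothetical names, the syntax is modern Java rather than what was available in 2005, and it assumes that each descriptor class has an accessible no-argument constructor. The interface to match is passed in by the caller, so the sketch does not depend on any CDK type.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Modifier;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper; not part of the CDK.
final class DescriptorScanner {

    /** Returns the jar files listed in the java.class.path system property. */
    static List<Path> jarsOnClassPath() {
        List<Path> jars = new ArrayList<>();
        String classPath = System.getProperty("java.class.path", "");
        for (String element : classPath.split(File.pathSeparator)) {
            if (element.endsWith(".jar")) {
                jars.add(Paths.get(element));
            }
        }
        return jars;
    }

    /**
     * Lists the top-level classes in a jar whose entry name starts with
     * packagePath, e.g. "org/openscience/cdk/qsar/descriptors/".
     */
    static List<String> classNamesInJar(Path jar, String packagePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Skip non-class resources and nested/anonymous classes (Foo$1.class).
                if (name.startsWith(packagePath) && name.endsWith(".class") && !name.contains("$")) {
                    String withoutSuffix = name.substring(0, name.length() - ".class".length());
                    names.add(withoutSuffix.replace('/', '.'));
                }
            }
        }
        return names;
    }

    /**
     * Instantiates every named class that is a concrete implementation of type.
     * Assumption: implementations expose an accessible no-argument constructor.
     */
    static <T> List<T> instantiate(List<String> classNames, Class<T> type, ClassLoader loader) {
        List<T> instances = new ArrayList<>();
        for (String className : classNames) {
            try {
                Class<?> candidate = Class.forName(className, true, loader);
                if (type.isAssignableFrom(candidate) && !Modifier.isAbstract(candidate.getModifiers())) {
                    instances.add(type.cast(candidate.getDeclaredConstructor().newInstance()));
                }
            } catch (ReflectiveOperationException | LinkageError e) {
                // Class could not be loaded or constructed; leave it out of the result.
            }
        }
        return instances;
    }
}
```

In the setting of this thread, a caller would pick the cdk-qsar.jar entry from `jarsOnClassPath()`, pass it to `classNamesInJar` with the `org/openscience/cdk/qsar/descriptors/` prefix, and hand the resulting names to `instantiate` together with the Descriptor interface. For an external descriptor plugin, the same two calls should work with a `java.net.URLClassLoader` created for the plugin jar instead of the system class loader.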