From: Rajarshi G. <rx...@ps...> - 2005-11-24 17:22:52
Sorry for the delayed reply.

On Sun, 2005-11-20 at 21:21 +0100, Egon Willighagen wrote:
> On Saturday 19 November 2005 22:11, Rajarshi Guha wrote:
> > > OK, which meta data fields do you currently use?
> >
> > From the XML dictionary I was using the following tags:
> >
> > <metadataList dictRef="qsar-descriptors-metadata:descriptorClassification">
> >   <metadata dictRef="qsar-descriptors-metadata:descriptorType"
> >             content="qsar-descriptors-metadata:molecularDescriptor"/>
> >   <metadata dictRef="qsar-descriptors-metadata:descriptorClass"
> >             content="qsar-descriptors-metadata:hybridDescriptor"/>
> > </metadataList>
>
> OK, these are now:
>
> <isClassifiedAs rdf:resource="#constitutionalDescriptor"/>
> <isClassifiedAs rdf:resource="#molecularDescriptor"/>
>
> I think both are just two sorts of classification.

Right

> > > OK, we need to extend this in some way, that is general enough... will
> > > add two fields for <description> and <definition> which seems general
> > > enough.
> >
> > That sounds good.
> >
> > > We don't have immediate need for <dc:*> stuff in CDK, right?
> >
> > Right, though in the interests of generality, would it not be possible
> > to simply stuff the <annotation></annotation> hierarchy into a field of
> > the Entry class?
>
> That would be possible... but would prefer to do all parsing as soon as
> possible...

OK

> > A String version as in the whole BibTeX entry as a single String? I
> > would rather have a BibTeXEntry object out of which I could pull the
> > various components.
>
> Will be a XOM node structure, OK?

That'd be fine

> > > > Alternatively, if an Entry can be a tree-like structure where each node
> > > > in the tree lets me get the attribute name and attribute value, that
> > > > would be fine. However as I noted above, I think this boils down to
> > > > using SAX (or XOM) directly.
> > >
> > > That's the other option... just dump the XOM nodes into the Entry...
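On the "parse as early as possible" point: the two <isClassifiedAs>
elements quoted above can be pulled out of a <Descriptor> entry with
nothing more than the JDK's own DOM parser. This is a minimal sketch, not
CDK code. The class name ClassificationReader is hypothetical, it uses
javax.xml.parsers rather than XOM, and it matches <isClassifiedAs> in any
namespace because I am not assuming the namespace URI used by
descriptor-algorithms.owl.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of CDK: returns the rdf:resource value of
// every <isClassifiedAs> element in one descriptor entry, in document order.
class ClassificationReader {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static List<String> readClassifications(String entryXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is required for the *NS lookups below.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(entryXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches <isClassifiedAs> whatever namespace the dictionary uses.
            NodeList nodes = doc.getElementsByTagNameNS("*", "isClassifiedAs");
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                classes.add(el.getAttributeNS(RDF_NS, "resource"));
            }
            return classes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed entry XML is reported as an unchecked exception.
            throw new IllegalArgumentException("Could not parse entry XML", e);
        }
    }
}
```

For the two quoted elements (wrapped in a <Descriptor> with the rdf
prefix declared) this returns [#constitutionalDescriptor,
#molecularDescriptor], i.e. the classification values as plain Strings
rather than raw nodes.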
> > >
> > > OK, I hacked my earlier API, and added get/setRawContent(). OK with that?
> >
> > I'll take a look

Which class has these methods? I just synced with CVS and Entry does not
have these methods.

> > Would it be a good idea if I made a QSARDictionary class that is a
> > subclass of the Dictionary. The idea would be that you would read in a
> > OWL dictionary using the methods above - that way the actual
> > unmarshalling is not specific to a dictionary.
>
> Yes. The DictionaryDatabase currently sets the dict name and input type (STTML
> ("xml") or OWL). We should add a third String array that indicates what type
> to read it in. Will think about this.
>
> > Then if I am in a QSAR application, I could then process the raw
> > dictionary to a QSARDictionary.
>
> Better: that will be done by the DescriptorDatabase itself. All you the user
> needs to know is what type the dict is; instanceof can be used for this.

Right. However, after thinking about it, it seems that we do not really need
a specific Dictionary for QSAR applications. Rather, it would be more logical
to have a specific type of Entry for QSAR applications, called, say,
QSAREntry, which would only contain entries of type <Descriptor></Descriptor>.

This seems more logical since we will have a variety of dictionaries, but
even if they are of different types they will all be dictionaries. The
differences will lie in what types of elements they can contain.

> > Since you have allowed access to the raw
> > content, I could then parse the raw content and fill up various fields,
> > without having to deal with the dictionary file itself.
> >
> > It seems like that this would allow you to keep the general dictionary
> > parsing code as well as a generic Dictionary and Entry class, and if
> > certain areas of the CDK (or certain apps) need a specific type of
> > dictionary a corresponding class (based of Dictionary) would handle the
> > specifics.
> >
> > Does this sound too convoluted?
>
> Not at all.
> I'll work out an API. Please suggest a QSAR API based on the above
> API, and whatever fields you additionally need.

So going by the above description of a QSAREntry class, the following
methods would be useful (based on the contents of descriptor-algorithms.owl):

get/setID()
get/setLabel()
get/setDefinition()
get/setDescription()

(The above 4 you've already mentioned.)

get/setClassification() - this should return a vector containing the value
of the rdf:resource attribute of every <isClassifiedAs> tag

get/setCitations() - this would return a vector containing the value of the
ref attribute of any <bibtex:cite> entries found in the
<Descriptor></Descriptor> entry.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
"355/113 -- Not the famous irrational number PI, but an incredible
simulation!"
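The QSAREntry methods proposed above might be sketched roughly as follows.
This is only an illustration under stated assumptions: Entry here is a
hypothetical stand-in holding the four existing fields, not the actual CDK
Entry class, and the "vectors" are returned as read-only java.util.List
views instead of java.util.Vector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a generic dictionary entry (NOT the real CDK
// Entry class): only the four fields already discussed in this thread.
class Entry {
    private String id;
    private String label;
    private String definition;
    private String description;

    public String getID() { return id; }
    public void setID(String id) { this.id = id; }
    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }
    public String getDefinition() { return definition; }
    public void setDefinition(String definition) { this.definition = definition; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
}

// Hypothetical QSAR-specific entry holding the data of one <Descriptor> element.
class QSAREntry extends Entry {
    // rdf:resource values of the <isClassifiedAs> elements, e.g. "#molecularDescriptor"
    private List<String> classification = new ArrayList<>();
    // ref attribute values of the <bibtex:cite> elements in the descriptor
    private List<String> citations = new ArrayList<>();

    public List<String> getClassification() {
        return Collections.unmodifiableList(classification);
    }

    public void setClassification(List<String> classification) {
        this.classification = new ArrayList<>(classification); // defensive copy
    }

    public List<String> getCitations() {
        return Collections.unmodifiableList(citations);
    }

    public void setCitations(List<String> citations) {
        this.citations = new ArrayList<>(citations); // defensive copy
    }
}
```

Code that is handed a generic Entry can then use an instanceof QSAREntry
check, in the same spirit as the instanceof suggestion for dictionaries, to
decide whether the classification and citation fields are available.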