From: Peter Murray-R. <pm...@ca...> - 2004-04-16 14:16:11
At 08:48 16/04/2004 -0400, Rajarshi Guha wrote:
>On Fri, 2004-04-16 at 06:08, Nina Nikolova wrote:
> > > > You haven't said anything about dictionaries. IMO this is an essential part
> > > > of the project. These dictionaries should certainly be available to anyone
> > > > and I would wish to see them included in commercial tools as well. The same
> > > > goes for test data sets.
> > >
> > > Ack.
> >
> > Apparently dictionary development should be (first) part of the project.
> > I am not quite sure how we start; is there already something available?
> >
>
>This makes sense (especially since you can't make models without
>descriptors :)
>
>If I understand properly, having various dictionaries allows us to
>basically group descriptors. In Dr. Murray-Rust's post he mentions
>dictionaries for JOELib, Randic, Dragon etc.

This is one strategy. IIRC Dragon has ca. 900 descriptors. They range from
molecular weight to various electrotopological indexes. The entries would be
something like:

    drag:mwt
    drag:etop23

and so on. It doesn't matter what the IDs are as long as they are unique.

Now MWt is common to other programs. However the concept might vary. One
program might sum all the average atomic masses and another might take the
largest peak in an HRMS (I don't think this is common in QSAR, but who knows).
So it makes sense to create, say,

    joe:molwt
    joe:nrot

etc. Note that the prefixes don't matter as long as they are mapped to a
namespace URI which is constant, e.g.

    <foo xmlns:drag="http://net.sf.qsar/dict/dragon">...
    <foo xmlns:joe="http://net.sf.qsar/dict/joelib">...

You will need to make sure that you either obtain permission from the original
authors to copy program manuals or that you extract the definitions from the
open literature. Of course it would be great if the QSAR authors wanted to
join, but we shouldn't expect this.

>How are we going to decide on namespaces? Should it be by descriptor
>type (topological, geometrical, informational etc) or by program
>(JOELib, Dragon etc) or something else?

Anything that makes sense and makes maintenance easy. In general I would have
one main curator per dictionary. When the same concept (e.g. MWt) occurs in two
dictionaries then we can create a communal dictionary which normalises the
concept.

>One thing that occurs to me is that it is possible for overlap of
>descriptors - that is, two namespaces might list the same descriptor.
>Wouldn't this be a problem?

We cannot assume that similar names in different programs are precisely the
same concept. For example, different programs may assign different atom types
within a molecule, e.g. which atoms are C.ar in pyrrole? So we should start off
by assuming these are different.

>For the case of references, does the CML schema (my terminology might be
>wrong) allow for a standard way to represent references (journal name,
>author, vol, year etc?)

No - we tend to use Dublin Core or other schemes. This is an area where there
are already many approaches and CML does not add another.

Peter

>-------------------------------------------------------------------
>Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
>GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
>-------------------------------------------------------------------
>Chemistry professors never die, they just fail to react.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
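
The point in the message that prefixes don't matter as long as they map to a constant namespace URI can be checked with a small sketch. It assumes Python's standard xml.etree.ElementTree parser. The two namespace URIs and the `foo` element come from the message; the short prefix `d`, the use of `mwt` as an element name, and the value 180.16 are placeholders for illustration only, not entries from any real dictionary.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, two different prefixes (the "d" prefix is hypothetical).
doc_a = ('<foo xmlns:drag="http://net.sf.qsar/dict/dragon">'
         '<drag:mwt>180.16</drag:mwt></foo>')
doc_b = ('<foo xmlns:d="http://net.sf.qsar/dict/dragon">'
         '<d:mwt>180.16</d:mwt></foo>')

# Same prefix as doc_a, but bound to a different namespace URI.
doc_c = ('<foo xmlns:drag="http://net.sf.qsar/dict/joelib">'
         '<drag:mwt>180.16</drag:mwt></foo>')


def qualified_tags(xml_text):
    """Return the child element names in expanded '{uri}local' form."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


tags_a = qualified_tags(doc_a)
tags_b = qualified_tags(doc_b)
tags_c = qualified_tags(doc_c)

# The parser discards the prefix and keeps only the URI plus local name,
# so doc_a and doc_b name the same entry while doc_c names a different one.
print(tags_a == tags_b)  # True
print(tags_a == tags_c)  # False
```

Comparing on the expanded name rather than the prefixed string is what keeps an entry defined in one program's dictionary distinct from a similarly named entry in another, which matches the advice to start by assuming the concepts differ.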