From: Peter Murray-R. <pm...@ca...> - 2004-03-17 10:17:59
At 10:46 16/03/2004 +0100, Joerg K. Wegner wrote:
>Hi Geoff,
>
>I've used Egon's code to establish CML2 support for JOELib. (Thanks again
>!!!). This includes full stereochemistry support, also for MDL SD, where the
>missing lines were added to the writer (see attachments)!
>pipe: original.sdf -> test.cml -> test.sdf
>
>As already discussed, this can be a step to 'reproducible'
>conversion/descriptorCalculation results.
>
>CML (meta-information) and SDF (comment) now both contain the ID for the
>used chemistry kernel (expert systems).
>This number is the hash code for all hard- and soft-coded expert system
>information.
>I've modified the JOEGlobalDataBase and all text file definitions. They
>now contain all CVS tags:
>
>VENDOR: http://joelib.sf.net
>RELEASE_VERSION: $Revision: 1.4 $
>RELEASE_DATE: $Date: 2004/03/15 13:33:42 $
>
>So this is not platform independent, BUT if we can find a way to assign
>the same vendor, version and date tag (independent standard organization
>or just a combined standard web page for JOELib/OpenBabel!!!) we can get
>the same hash codes !
>BTW, the hash code uses the standard hash code calculation for string in Java.

This looks very exciting - haven't worked through the details.

>So descriptors contain now also a reference to the used kernel, e.g.:
>
><cml:scalar dataType="xsd:integer" dictRef="joelib:kernel:715333816"
>title="Number_of_HBD_2">0</cml:scalar>

Yes - apart from the dictRef - this is great. At present it is required to
add the xsd:integer dataType if you wish to inform a generic processing
engine that this is an integer. Otherwise it defaults to an xsd:string.
JOELib itself knows the implied semantics and could omit it. The title is
optional - it is for humans - machines simply replicate it. So if you are
happy to refer to this by joelib:kernel:715333816 you have everything
except a valid dictRef: the string joelib:kernel:715333816 has to be an XML
QName, so it can't have two colons. The prefix has no semantic value and
maps to a URI.
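The QName constraint can be checked mechanically. The following is only a rough sketch in Python (not part of JOELib or any CML toolkit): the helper name `is_qname` is hypothetical, and the pattern is an ASCII-only approximation of the XML NCName rule, which in full also admits many non-ASCII letters.

```python
import re

# ASCII-only approximation of an XML NCName: starts with a letter or
# underscore, then letters, digits, '.', '-' or '_'. No colon allowed.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_qname(value: str) -> bool:
    """Return True if value looks like 'prefix:local' or a bare local name.

    Hypothetical helper for illustration; a QName has at most one colon,
    and each part must itself be an NCName.
    """
    parts = value.split(":")
    return len(parts) in (1, 2) and all(NCNAME.match(p) for p in parts)


if __name__ == "__main__":
    for candidate in ("joelib:kernel:715333816",  # two colons
                      "jk:715333816",             # local part starts with a digit
                      "jk:a715333816"):           # prefix plus letter-led local part
        print(candidate, is_qname(candidate))
```

Under this approximation a negative hash code prefixed with a letter, such as jk:a-349822152, also passes, because a hyphen is allowed anywhere after the first character.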
So the full spec could be:

<cml:cml xmlns:cml="http://www.xml-cml.org/schema/cml2/core"
         xmlns:jk="http://joelib.sf.net/joelib/kernel/dict">
  <cml:scalar dataType="xsd:integer" dictRef="jk:a715333816"
              title="Number_of_HBD_2">0</cml:scalar>
</cml:cml>

Note that a QName must have both components starting with a letter (or an
underscore), hence: a715333816

>Peter, is this now correct ?
>
>I will release a new JOELib version the next hours, then i'm going home,
>i'm sick.

Hope you get better soon. Best wishes

P.

MORE comments below:

>Kind regards, Joerg
>--
>Dipl. Chem. Joerg K. Wegner
>Center of Bioinformatics Tuebingen (ZBIT)
>Department of Computer Architecture
>Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
>Phone: (+49/0) 7071 29 78970
>Fax: (+49/0) 7071 29 5091
>E-Mail: mailto:we...@in...
>WWW: http://www-ra.informatik.uni-tuebingen.de
>--
>Never mistake motion for action.
> (E. Hemingway)
>
>Never mistake action for meaningful action.
> (Hugo Kubinyi,2004)
>
>
><?xml version="1.0" encoding="ISO-8859-1"?>
><!DOCTYPE molecule SYSTEM "cml.dtd" []>
><cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core"
>title="Gonan derivate with stereochemistry" id="-2114331329">

IDs must start with a letter

><cml:metadataList title="generated automatically from JOELib">
><cml:metadata name="dc:creator" content="Used JOELib chemistry kernel ID
>is 715333816 and the used CML writer is
>joelib.io.types.cml.MoleculeHuge(version 1.2)"></cml:metadata>
><cml:metadata name="dc:description" content="Conversion of legacy filetype
>to CML"></cml:metadata>
><cml:metadata name="dc:identifier" content="unknown"></cml:metadata>
><cml:metadata name="dc:content"></cml:metadata>
><cml:metadata name="dc:rights" content="unknown"></cml:metadata>
><cml:metadata name="dc:type" content="chemistry"></cml:metadata>
><cml:metadata name="dc:contributor" content="see http://joelib.sf.net for
>a full list of contributors"></cml:metadata>
><cml:metadata name="dc:date" content="16 Mar 2004 08:53:52
>GMT"></cml:metadata>
><cml:metadata name="cmlm:structure" content="yes"></cml:metadata>

Looks great

></cml:metadataList>
><cml:scalar dataType="xsd:string" dictRef="joelib:kernel"
>title="joelib:kernel:715333816:softCoded:joelib.data.JOEAromaticTyper"
>id="joelib:kernel:single:1345017519">joelib.data.JOEAromaticTyper
>http://joelib.sf.net joelib/data/plain/aromatic.txt 1.4
>2004-03-15_13-33-42</cml:scalar>

This is a compound data field. We are working on how dictionaries can
support this. The fields are, I think:

classname URI id datetime

XSD can support complexContent which describes and validates these. We are
looking at how the dictionaries can support this.

><cml:scalar dataType="xsd:string" dictRef="joelib:kernel"
>title="joelib:kernel:715333816:softCoded:joelib.data.JOEAtomTyper"
>id="joelib:kernel:single:331687181">joelib.data.JOEAtomTyper
>http://joelib.sf.net joelib/data/plain/atomtype.txt 1.8
>2004-03-15_13-33-42</cml:scalar>
><cml:scalar dataType="xsd:string" dictRef="joelib:kernel"
>title="joelib:kernel:715333816"
>id="joelib:kernel:715333816">joelib:kernel:single:1345017519,
>joelib:kernel:single:331687181, joelib:kernel:single:-349822152,
>joelib:kernel:single:1305056389, joelib:kernel:single:2066602532,
>joelib:kernel:single:1443728660, joelib:kernel:single:2056739992,
>joelib:kernel:single:-465953241, joelib:kernel:single:-752030822,
>joelib:kernel:single:-647783807, joelib:kernel:single:1562896312,
>joelib:kernel:single:1380529384, joelib:kernel:single:2132350161,
>joelib:kernel:single:-1630937171, joelib:kernel:single:862707252,
>joelib:kernel:single:-94117937</cml:scalar>

I would make this an array and use prefixes:

<cml:array dataType="xsd:QName" dictRef="joelib:kernel"
title="joelib:kernel:715333816" id="joelib:kernel:715333816">
jks:a1345017519 jks:a331687181 ...</cml:array>

But it looks messy. Do you actually want to process these as integers?
If so it can be redesigned.

>joelib:kernel:single:1345017519, joelib:kernel:single:331687181,
>joelib:kernel:single:-349822152, joelib:kernel:single:1305056389,
>joelib:kernel:single:2066602532, joelib:kernel:single:1443728660,
>joelib:kernel:single:2056739992, joelib:kernel:single:-465953241,
>joelib:kernel:single:-752030822, joelib:kernel:single:-647783807,
>joelib:kernel:single:1562896312, joelib:kernel:single:1380529384,
>joelib:kernel:single:2132350161, joelib:kernel:single:-1630937171,
>joelib:kernel:single:862707252, joelib:kernel:single:-94117937</cml:scalar>
><cml:atom id="-2114331329:a1">
><cml:string builtin="elementType">C</cml:string>
><cml:float builtin="x2">13.916500091552734</cml:float>
><cml:float builtin="y2">-7.770199775695801</cml:float>

Looks good

This is CML1 - I assume you can now use CML2

><cml:scalar units="units:electron" dataType="xsd:float"
>dictRef="joelib:partialCharge">-0.022420638985526313</cml:scalar>

Yes. This is great. I am hoping to get some hierarchy for units so they
don't need constant reiteration.
And if space matters you can use a CML2 array.

><cml:integer builtin="hydrogenCount">5</cml:integer>
></cml:atom>
><cml:atom id="-2114331329:a2">
><cml:string builtin="elementType">C</cml:string>
><cml:float builtin="x2">13.916500091552734</cml:float>
><cml:name convention="trivial">Gonan derivate with stereochemistry</cml:name>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="Fraction_of_rotatable_bonds">4.3478260869565216E-2</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="Geometrical_shape_coefficient">9.202824265150003</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="Molecular_weight">2.452599936723709E2</cml:scalar>
><cml:scalar dataType="xsd:string" title="RDF"><![CDATA[Gasteiger_Marsili
>50<1.5940199183282508E-16,3.5320351902369007E-12,1.0787560326377212E-8,4.562926679481138E-6,2.6872124085373483E-4,2.2069047752468125E-3,2.303245111884276E-3,-5.436356512771221E-4,-3.2478281395112424E-4,4.918651334382971E-5,1.536251324560617E-3,2.277435169742313E-3,1.300816511980075E-3,6.126140107135232E-4,-4.1824716100300705E-4,-1.368863814540861E-3,-8.208339192751625E-4,5.3376461771753E-4,1.534072562138197E-3,1.7908914501277423E-3,5.507175720486667E-6,-5.763211262243748E-4,1.4132473231521047E-4,8.686437677244833E-4,1.5090055350379477E-3,1.6546296948737362E-3,8.647678978194836E-4,2.500403657505191E-4,-8.328990776669339E-5,3.4797339389094526E-4,1.0482630671704465E-3,3.57933385516601E-4,-9.243630247315901E-5,-1.6374118268106688E-4,-6.744686206966648E-4,-7.3563670708480915E-6,1.0783852781926621E-3,1.1659437942465932E-3,5.3669116363927E-4,-1.1314694263345528E-4,-6.928623677582942E-4,-4.119554183357553E-4,2.042136965559031E-5,2.119917715225437E-4,2.94531451807724E-4,-6.505797814441567E-4,-2.9527826356305804E-4,6.187696472975766E-5,1.1298076477827559E-4,9.012712526200434E-5>]]></cml:scalar>

You could use array and xsd:float for this.
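To make the array suggestion concrete, here is a minimal sketch in Python using only the standard library. It is not JOELib's writer: the helper names are hypothetical, and the use of a `size` attribute with whitespace-separated values is an assumption about how a CML2 array would be laid out. The sample numbers are the first three RDF values from the quoted file.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the quoted CML file.
CML_NS = "http://www.xml-cml.org/schema/cml2/core"
ET.register_namespace("cml", CML_NS)


def floats_to_cml_array(values, title):
    """Build a cml:array element holding whitespace-separated floats.

    Hypothetical helper; the 'size' attribute and the whitespace
    delimiter are assumptions, not taken from the message.
    """
    elem = ET.Element(
        f"{{{CML_NS}}}array",
        {"dataType": "xsd:float", "size": str(len(values)), "title": title},
    )
    # repr() of a Python float round-trips exactly through float().
    elem.text = " ".join(repr(float(v)) for v in values)
    return elem


def cml_array_to_floats(elem):
    """Read the numbers back from an array element built as above."""
    return [float(token) for token in elem.text.split()]


rdf = [1.5940199183282508e-16, 3.5320351902369007e-12, 1.0787560326377212e-08]
xml_text = ET.tostring(floats_to_cml_array(rdf, "RDF"), encoding="unicode")
parsed = cml_array_to_floats(ET.fromstring(xml_text))

if __name__ == "__main__":
    print(xml_text)
```

Because the values are plain numbers separated by spaces, no `<` or `&` can occur in the element content, so the serializer has nothing to escape.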
CDATA isn't needed unless you expect < or & in your content

><cml:scalar dataType="xsd:integer" dictRef="joelib:kernel:715333816"
>title="Number_of_rotatable_bonds">1</cml:scalar>
><cml:scalar dataType="xsd:integer" dictRef="joelib:kernel:715333816"
>title="Number_of_HBD_2">0</cml:scalar>
><cml:scalar dataType="xsd:integer" dictRef="joelib:kernel:715333816"
>title="Number_of_HBD_1">0</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="MolarRefractivity">9.199900000000002E1</cml:scalar>
><cml:scalar dataType="xsd:integer" dictRef="joelib:kernel:715333816"
>title="Number_of_aromatic_bonds">0</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="Zagreb_group_index_2">1.8E2</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="Zagreb_group_index_1">1.47E2</cml:scalar>
><cml:scalar dataType="xsd:double" dictRef="joelib:kernel:715333816"
>title="PolarSurfaceArea">0.0</cml:scalar>

This needs structuring...

><cml:scalar dataType="xsd:string" title="Topological_atom_pair"><![CDATA[42
>1
>Atom_valence
>C
>1.0
>5.0
>C
>3.0
>2
>C
>2.0

I have deleted the rest for convenience but will be happy to help with the
design. Much of this can be made compact and semantically rich with array
and table.

It's somewhat inconvenient working in email - would be better to have
attachments. This is actually an excellent thing for a Wiki

Best

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069