From: Oliver S. <rev...@us...> - 2014-11-18 16:29:28
Hi Dan, Peter,

Ah, now it gets interesting...

Yes, we can use CML to describe a lot of different kinds of chemistry, and for each we might want or need to use different kinds of identifiers. But I think that could be done by using the <identifier> tag with appropriate conventions and/or dictionaries.

I'm trying to represent structures from QM calculations, therefore I need structural identifiers. For other applications we could define other conventions or dictionaries.

How about this:

    <cml xmlns="http://www.xml-cml.org/schema"
         convention="convention:molecular"
         xmlns:convention="http://www.xml-cml.org/convention/"
         xmlns:structID="http://www.xml-cml.org/dictionary/structualIdentifier/"
         xmlns:compndID="http://www.xml-cml.org/dictionary/compoundIdentifier/"
         xmlns:substanceID="http://www.xml-cml.org/dictionary/substanceIdentifier/">
      <molecule id="aspirin" spinMultiplicity="1" formalCharge="0">
        <identifier dictRef="structID:InChI">InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)</identifier>
        <identifier dictRef="structID:CanonicalSmiles">CC(=O)OC1=CC=CC=C1C(=O)O</identifier>
        <identifier dictRef="compndID:pubChemCompound">CID 2244</identifier>
        <identifier dictRef="substanceID:pubChemSubstance">SID 53788943</identifier>
        <!-- atomArray, bondArray, etc. -->
        <property dictRef="cml:molmass">
          <scalar dataType="xsd:double" units="unit:dalton" xmlns:unit="http://www.xml-cml.org/unit/si/">180.15742</scalar>
        </property>
        <formula concise="C 9 H 8 O 4"/>
      </molecule>
    </cml>

The dictionaries with more general applicability could be part of the CML standard and hosted at the CML website, and if necessary other, very specific dictionaries could even be located somewhere else (e.g. at NIH).

What do you think?

Best,
Oliver

On Mon, Nov 17, 2014 at 5:32 PM, Peter Murray-Rust <pm...@ca...> wrote:

> Absolutely right Dan,
>
> We never confuse
>
> substance with compound with structure.
>
> P.
>
> On Mon, Nov 17, 2014 at 3:46 PM, Zaharevitz, Daniel (NIH/NCI) [E] <Dan...@ni...> wrote:
>
>> It has been quite a while, but I did use the <identifier> tag for a lot of stuff. I would be interested in participating in reviving it. One reason I used it was that the tag name implied a distinction from structure. I think it is very important to keep a substance or sample identifier distinct from structure. Chemical structure is an empirical property of a substance, hence subject to change as more data become available. Making structure serve as a substance identifier is a major mistake, and thus I would argue that all the uses in the example listed are inappropriate uses of the identifier tag. Of course, as Peter says, any internally consistent use should be able to be changed to any other internally consistent use without too much difficulty, but I think using structure as an identifier can lead to major problems in maintaining internal consistency.
>>
>> DanZ
>>
>> --
>> /**********************/
>> Daniel Zaharevitz
>> Chief, ITB, DTP, DCTD
>> National Cancer Institute
>> Zah...@ma...
>> /**********************/
>>
>> From: Peter Murray-Rust <pm...@ca...>
>> Date: Monday, November 17, 2014 3:23 PM
>> To: Oliver Stueker <rev...@us...>
>> Cc: "cml...@li..." <cml...@li...>
>> Subject: Re: [cml/ccml-discuss] Identifiers in CML
>>
>> Wonderful.
>>
>> Egon wants to hand on the baton of CML so I'll put you in touch with others quite shortly. I'd say do whatever seems reasonable and not too complex...
>>
>> Happy to see extensions in principle - suggest what you want to do...
>>
>> On Mon, Nov 17, 2014 at 3:12 PM, Oliver Stueker <rev...@us...> wrote:
>> Thanks Peter,
>>
>> My goal is to add SMILES and InChI identifiers to the molecule elements in CML/CompChem documents.
>> The CompChem convention states that <molecule> elements should conform to the molecular convention.
>> In the molecular convention there is a list of allowed child elements, of which <property>, <label> and (a bit far-fetched) <name> come the closest to describing an identifier.
>>
>> I can easily just invent something, but I prefer to follow an already defined standard or get involved in defining an accepted standard.
>>
>> In the cmllite-validator-code repo I found this CML/CompChem file [1] which defines:
>>
>>     <cml>
>>       <module role="joblist">
>>         <identifier convention="chemid:EmpiricalFormula" value="CCl2O2"/>
>>         <identifier convention="chemid:InChI" value="InChI=1/CCl2O2/c2-1(4)5-3"/>
>>         <identifier convention="chemid:CanonicalSmiles" value="ClOC(=O)Cl"/>
>>         <identifier convention="chemid:IsomericSmiles" value="C(=O)(Cl)OCl"/>
>>         <module role="job" title="job1">
>>           <!-- ... -->
>>         </module>
>>       </module>
>>     </cml>
>>
>> However, on http://www.xml-cml.org/convention/ there is neither a chemid convention nor an <identifier> element in any convention.
>>
>> What do you think of adding <identifier> to the allowed elements of <molecule> in the molecular convention and starting a chemid dictionary?
>>
>> In fact I'm currently also working on expanding the CompChem dictionary.
>> I've (privately) forked the repo on http://bitbucket.org/cml/dictionary-compchem and will propose a pull request at a suitable time.
>> I could do the same for the molecular convention.
>>
>> Best,
>> Oliver
>>
>> [1] https://bitbucket.org/cml/cmllite-validator-code/src/2eaa18f959bb0324268bf75be8a904d4c9e07944/src/test/resources/org/xmlcml/www/convention/cmlcomp/valid/two-jobs.cml?at=default
>>
>> On Mon, Nov 17, 2014 at 3:32 PM, Peter Murray-Rust <pm...@ca...> wrote:
>> Copying Egon,
>>
>> We did have an <identifier> label which allowed this, but you can also use <label> and add a "class" attribute.
>>
>> The key approach of CML now is that communities should create conventions that work for them. As these become established, the conventions can become normalised. Trying to constrain too rigidly requires a lot of software and consistent discipline.
>>
>> If different communities end up with slightly different approaches it's not normally hard to convert or merge.
>>
>> P.
>>
>> On Mon, Nov 17, 2014 at 8:43 AM, Oliver Stueker <rev...@us...> wrote:
>> Dear CML Community,
>>
>> is there an (official) standard or best practice on how to include one or more identifiers (SMILES, InChI, etc.) in a CML document following the molecular convention?
>>
>> Maybe as <label> or <property> children of the <molecule>?
>>
>> I couldn't find anything in the convention or the CML dictionaries.
>>
>> Best,
>> Oliver
>>
>> --
>> Oliver Stueker, Dr. rer. nat.
>> Postdoctoral Fellow, Poirier Lab
>> Department of Chemistry, Memorial University
>> Room C3052 - phone: +1 (709) 864-8752
>>
>> --
>> Peter Murray-Rust
>> Reader in Molecular Informatics
>> Unilever Centre, Dep. Of Chemistry
>> University of Cambridge
>> CB2 1EW, UK
>> +44-1223-763069
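
To make the proposal at the top of this thread concrete: a consumer could tell structure, compound and substance identifiers apart purely by the dictRef prefix. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree. The dictionary namespace URIs and dictRef terms are the ones suggested in this thread, not published CML dictionaries, and the helper names identifiers_by_dictref and split_by_kind are hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Trimmed copy of the proposed document. The dictionary namespaces are the
# ones suggested in this thread (assumption), not published CML dictionaries.
DOC = """\
<cml xmlns="http://www.xml-cml.org/schema"
     xmlns:structID="http://www.xml-cml.org/dictionary/structualIdentifier/"
     xmlns:compndID="http://www.xml-cml.org/dictionary/compoundIdentifier/"
     xmlns:substanceID="http://www.xml-cml.org/dictionary/substanceIdentifier/">
  <molecule id="aspirin">
    <identifier dictRef="structID:InChI">InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)</identifier>
    <identifier dictRef="structID:CanonicalSmiles">CC(=O)OC1=CC=CC=C1C(=O)O</identifier>
    <identifier dictRef="compndID:pubChemCompound">CID 2244</identifier>
    <identifier dictRef="substanceID:pubChemSubstance">SID 53788943</identifier>
  </molecule>
</cml>
"""


def identifiers_by_dictref(xml_text):
    """Map each molecule id to a {dictRef: text} dict of its <identifier> children."""
    root = ET.fromstring(xml_text)
    result = {}
    for mol in root.iter(f"{{{CML_NS}}}molecule"):
        ids = {}
        for ident in mol.findall(f"{{{CML_NS}}}identifier"):
            ids[ident.get("dictRef")] = (ident.text or "").strip()
        result[mol.get("id")] = ids
    return result


def split_by_kind(ids):
    """Group identifiers by dictRef prefix (structure, compound, substance)."""
    grouped = {}
    for dict_ref, value in ids.items():
        prefix, _, term = dict_ref.partition(":")
        grouped.setdefault(prefix, {})[term] = value
    return grouped


aspirin = identifiers_by_dictref(DOC)["aspirin"]
grouped = split_by_kind(aspirin)
print(grouped["structID"]["CanonicalSmiles"])
print(sorted(grouped))
```

Grouping by prefix keeps Dan's distinction visible in code: a structural identifier such as an InChI can be revised as more data become available without touching the compound or substance identifiers stored next to it.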