|
From: Egon W. <ego...@gm...> - 2010-03-20 08:53:04
|
Hi Sam,

On Fri, Mar 12, 2010 at 4:09 PM, Sam Adams <se...@ca...> wrote:
> <atom elementType="O">
>   <atomType dictRef="cml:mol2">O.co2</atomType>
>   <atomType dictRef="cml:mmff94">O=CO</atomType>
> </atom>

I always understood that the dictRef was a deep link... not pointing to a particular dictionary, but to the matching entry in the dictionary... I would have expected something like:

<atom elementType="O">
  <atomType dictRef="mol2:Oco2">O.co2</atomType>
  <atomType dictRef="mmff94:Oco2">O=CO</atomType>
</atom>

> <atom>
>   <property dictRef="gfx:color">
>     <scalar type="xsd:string">#ff0000</scalar>
>   </property>
>   <property dictRef="cml:radius">
>     <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
>   </property>
> </atom>

Is there a convention defined for this?

The JChemPaint project has long been talking about atom-based properties and their serialization (though no one has actually found the time to implement it yet).

> As part of Chem4Word we are looking into some of these issues (e.g.
> colouring, radius, default bond angles, functional group
> representations) and are investigating the possibility of a chemical
> 'CSS' equivalent.

This applies to a lot of projects: JChemPaint (incl. variants), Jmol, PyMOL, and likely many others...

I'd very much like to see these things discussed on the BO mailing list, so that the Blue Obelisk can make a community convention for these properties...

Egon

--
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers |
|
From: Konstantin T. <an...@ya...> - 2010-03-21 11:11:54
|
Hi all,

I think this piece of XML clearly demonstrates excessive verbosity, low human readability and parsing inefficiency:

<atom>
  <property dictRef="gfx:color">
    <scalar type="xsd:string">#ff0000</scalar>
  </property>
  <property dictRef="cml:radius">
    <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
  </property>
</atom>

IMHO, this information could easily be stored as

<atom gfx:color="#ff0000" cml:radius="1.2" cml:radius:units="angstrom"/>

If I'm wrong, please explain why.

20.03.10, 09:52, "Egon Willighagen" <ego...@gm...>:
> Hi Sam,
>
> On Fri, Mar 12, 2010 at 4:09 PM, Sam Adams wrote:
> > <atom elementType="O">
> >   <atomType dictRef="cml:mol2">O.co2</atomType>
> >   <atomType dictRef="cml:mmff94">O=CO</atomType>
> > </atom>
>
> I always understood that the dictRef was a deep link... not pointing
> to a particular dictionary, but to the matching entry in the
> dictionary... I would have expected something like:
>
> <atom elementType="O">
>   <atomType dictRef="mol2:Oco2">O.co2</atomType>
>   <atomType dictRef="mmff94:Oco2">O=CO</atomType>
> </atom>
>
> > <atom>
> >   <property dictRef="gfx:color">
> >     <scalar type="xsd:string">#ff0000</scalar>
> >   </property>
> >   <property dictRef="cml:radius">
> >     <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
> >   </property>
> > </atom>
>
> Is there a convention defined for this?
>
> The JChemPaint project has long been talking about atom-based
> properties and their serialization (though no one has actually
> found the time to implement it yet).
>
> > As part of Chem4Word we are looking into some of these issues (e.g.
> > colouring, radius, default bond angles, functional group
> > representations) and are investigating the possibility of a chemical
> > 'CSS' equivalent.
>
> This applies to a lot of projects: JChemPaint (incl. variants), Jmol,
> PyMOL, and likely many others...
>
> I'd very much like to see these things discussed on the BO
> mailing list, so that the Blue Obelisk can make a community convention
> for these properties...
>
> Egon

--
Regards,
Konstantin |
|
From: Peter Murray-R. <pm...@ca...> - 2010-03-21 12:54:04
|
Sorry not to have replied earlier.

CML is designed for extensibility, primarily through other namespaces. It is possible to add foreign attributes and foreign elements (and I assume that gfx does not resolve to http://www.xml-cml.org/schema). So

<atom gfx:color="#ff0000"/>

is perfectly OK: you can use it to mean whatever you want. CML parsers are allowed to ignore it. Similarly:

<atom elementType="O">
  <gfx:color>#ff0000</gfx:color>
</atom>

is allowed. Note that:

<atom gfx:radius:units="angstrom"/>

is badly formed XML (a namespaced attribute name may contain only one colon). It could be:

<atom gfx:radius_units="angstrom"/>

Note that attributes in CML are NOT in the CML namespace. This is an XML feature, not a CML one. Writing:

<atom cml:elementType="Cl"/>

(where cml resolves to the CML namespace) is NOT recommended and is NOT the same as:

<atom elementType="Cl"/>

This is the syntax. The philosophy is that we can extend XML languages through community agreement and practice. If you wish to write:

<atom egon:radius="2.0"/>

you may. The only question is whether other people will understand that and write software for it. We tend to reserve "cmlx" for extensions in JUMBO and CMLLite. There is no definitive list of such extensions, but there are a number we use regularly.

In general we extend CML through dictionaries. In

<atom>
  <property dictRef="gfx:color">
    <scalar type="xsd:string">#ff0000</scalar>
  </property>
  <property dictRef="cml:radius">
    <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
  </property>
</atom>

there are agreed semantics that there should be namespaced dictionary entries for color and radius. They are further enhanced by the typing (string/float/units) in the XML. In principle it's better to have the typing in the dictionary, and we are moving that way. There are no *CML* semantics that say "gfx:color" should have a given type or form, or have a dictionary entry.

2010/3/21 Konstantin Tokarev <an...@ya...>
> Hi all,
>
> I think this peace of XML cleanly demonstrates excessiveness, low human
> readability and parsing inefficiency:
>
> <atom>
>   <property dictRef="gfx:color">
>     <scalar type="xsd:string">#ff0000</scalar>
>   </property>
>   <property dictRef="cml:radius">
>     <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
>   </property>
> </atom>

XML is not always very human-readable, but nor are most data formats. A Molfile is not very human-readable in places either. A Gaussian archive file is almost human-unreadable. A CDX file is completely human-unreadable. The important thing is that the semantics are unambiguous. I am afraid that's a necessary price to pay in the machine age.

> IMHO, this information could be easily stored as
>
> <atom gfx:color="#ff0000" cml:radius="1.2" cml:radius:units="angstrom"/>
>
> If I'm wrong, please, explain why.

It could also be stored as

red atom radius 1.2A

That's very human-readable, and rather difficult for a machine without special implicit conventions. It's always a tradeoff.

P.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069 |
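The namespace rule described in the message above (unprefixed attributes are in no namespace, so `elementType` and `cml:elementType` are different attributes) can be checked with a namespace-aware parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `gfx` namespace URI is a hypothetical placeholder, since the thread never states what `gfx` resolves to.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
GFX_NS = "http://example.org/gfx"  # hypothetical URI, not from the thread

doc = f'''<molecule xmlns="{CML_NS}" xmlns:cml="{CML_NS}" xmlns:gfx="{GFX_NS}">
  <atom id="a1" elementType="Cl" gfx:color="#ff0000"/>
  <atom id="a2" cml:elementType="Cl"/>
</molecule>'''

root = ET.fromstring(doc)
# Elements inherit the default namespace, so <atom> is in the CML namespace.
a1, a2 = root.findall(f"{{{CML_NS}}}atom")

# Unprefixed attributes are in no namespace: the key is the bare name.
print(a1.get("elementType"))

# A foreign attribute is keyed by its namespace URI; a CML-only reader
# can simply skip every key that starts with "{".
print(a1.get(f"{{{GFX_NS}}}color"))

# cml:elementType is a different attribute from elementType.
print(a2.get("elementType"))
print(a2.get(f"{{{CML_NS}}}elementType"))
```

A reader that looks up `elementType` finds it on `a1` but not on `a2`, which is why the prefixed form is not recommended.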
|
From: Konstantin T. <an...@ya...> - 2010-03-21 13:25:45
|
OK. A more general question: what is the benefit of dictionaries?

XML has its own "dictionaries": DTD, XML Schema. But you are actually creating a new language on top of XML, which complicates reading not only for humans but for programs too. Why not keep things simple?

--
Regards,
Konstantin |
|
From: Peter Murray-R. <pm...@ca...> - 2010-03-21 14:00:55
|
On Sun, Mar 21, 2010 at 1:25 PM, Konstantin Tokarev <an...@ya...> wrote:
> OK. More general question: what is profit of dictionaries?
>
> XML has it's own "dictionaries": dtd, xml schema. But you actually create
> new language on top of XML which complicates readability not only by humans,
> but by programs too. Why not to keep things simple?

Because it is not simple to represent science to a computer!

It's easy to write:

dipole="1.2"

"Everyone knows" that this is a float and that the units are Debye. But machines don't know. To them it's the same as:

version="1.2"

So at this stage we have to indicate the dataType and the units, or we have to guess. In CML we don't guess; we make it explicit.

And what does "dipole" mean? Does it mean the absolute magnitude of the dipole? Probably, but not certainly. What does:

aromatic="true"

mean? Unless you have an algorithm defining "aromatic", different people will use different definitions. And so on.

The dictionaries are isomorphic with RDF and ontologies. Indeed, it's possible to transform CML+dictRef into RDF+ontologies algorithmically. RDF and ontologies are verbose and not very human-readable, but they are the best the world has got.

P.

> --
> Regards,
> Konstantin

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069 |
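To illustrate the "we don't guess" point above, here is a minimal sketch of a reader that takes the type and units from the markup instead of assuming them. It uses the `<property>`/`<scalar>` fragment quoted earlier in the thread, with the attribute names exactly as written there; the `CASTS` table and the `read_properties` helper are hypothetical names introduced for this example, not part of any CML library.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

doc = f'''<atom xmlns="{CML_NS}">
  <property dictRef="gfx:color">
    <scalar type="xsd:string">#ff0000</scalar>
  </property>
  <property dictRef="cml:radius">
    <scalar type="xsd:float" units="units:angstrom">1.2</scalar>
  </property>
</atom>'''

# Hypothetical mapping from the declared type to a Python conversion.
CASTS = {"xsd:string": str, "xsd:float": float, "xsd:integer": int}

def read_properties(atom):
    """Return {dictRef: (typed value, units or None)} for one <atom>."""
    props = {}
    for prop in atom.findall(f"{{{CML_NS}}}property"):
        scalar = prop.find(f"{{{CML_NS}}}scalar")
        cast = CASTS[scalar.get("type")]  # fail loudly on an unknown type
        props[prop.get("dictRef")] = (cast(scalar.text), scalar.get("units"))
    return props

props = read_properties(ET.fromstring(doc))
print(props)
```

With a bare `radius="1.2"` attribute the same reader would have to hard-code both the float conversion and the unit, which is exactly the guessing the message argues against.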
|
From: Peter Murray-R. <pm...@ca...> - 2010-03-21 14:03:21
|
Thanks Andrew! I stand corrected and that's an excellent exposition. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069 |
|
From: Peter Murray-R. <pm...@ca...> - 2010-03-21 15:20:42
|
On Sat, Mar 20, 2010 at 8:52 AM, Egon Willighagen <ego...@gm...> wrote:
> Hi Sam,
>
> On Fri, Mar 12, 2010 at 4:09 PM, Sam Adams <se...@ca...> wrote:
> > <atom elementType="O">
> >   <atomType dictRef="cml:mol2">O.co2</atomType>
> >   <atomType dictRef="cml:mmff94">O=CO</atomType>
> > </atom>
>
> I always understood that the dictRef was a deep link... not pointing
> to a particular dictionary, but to the matching entry in the
> dictionary... I would have expected something like:
>
> <atom elementType="O">
>   <atomType dictRef="mol2:Oco2">O.co2</atomType>
>   <atomType dictRef="mmff94:Oco2">O=CO</atomType>
> </atom>

Essentially the dictRef value (mol2:bar) is equivalent to a URI:

http://www.foo.com/mol2#bar

The world is split on whether a URI is purely a name or whether it is also an address. I was inducted into the W3C philosophy that names and addresses are separate. Tim wishes to conflate them. I am now relaxed about this. So you can interpret

<atomType xmlns:mmff94="http://mmff94.org/dict" dictRef="mmff94:Oco2">O=CO</atomType>

either as a statement that

"there is a defined uniqueId in the mmff94 namespace (http://mmff94.org/dict) with value Oco2. There may or may not be an accessible dictionary entry, but there should be at least the concept of one"

or

"there is a dictionary at http://mmff94.org/dict with an entry http://mmff94.org/dict#Oco2, and this is of the form <cml:entry id="Oco2">...</cml:entry>"

I think the latter is most useful if we can manage it.

I absolutely agree that the BO should try to support and systematize this. All the Chem4Word material (code, schemas, dictionaries, etc.) will be Open Source/Data/Standard. It may not always be robust, but it's a best endeavour. Where there are existing BO dictionaries, Chem4Word will be informed by them.

P.

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069 |
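The "dictRef as URI" reading above can be sketched in a few lines of Python: collect the prefix declarations in scope, then expand `prefix:local` to `namespace#local`. The helper names `prefix_map` and `resolve_dict_ref` are hypothetical, and the `#` join simply follows the `http://mmff94.org/dict#Oco2` form given in the message; whether anything is actually retrievable at that address is a separate question, as the message notes.

```python
import io
import xml.etree.ElementTree as ET

doc = ('<atomType xmlns="http://www.xml-cml.org/schema" '
       'xmlns:mmff94="http://mmff94.org/dict" '
       'dictRef="mmff94:Oco2">O=CO</atomType>')

def prefix_map(xml_text):
    """Collect {prefix: namespace URI} from the declarations in a document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield (prefix, uri) pairs as declarations are seen.
    return dict(ns for _, ns in ET.iterparse(source, events=("start-ns",)))

def resolve_dict_ref(dict_ref, namespaces):
    """Expand a 'prefix:local' dictRef to '<namespace URI>#<local>'."""
    prefix, sep, local = dict_ref.partition(":")
    if not (sep and prefix and local):
        raise ValueError(f"dictRef must look like prefix:local, got {dict_ref!r}")
    return f"{namespaces[prefix]}#{local}"

namespaces = prefix_map(doc)
dict_ref = ET.fromstring(doc).get("dictRef")
uri = resolve_dict_ref(dict_ref, namespaces)
print(uri)
```

Under the first interpretation the resulting string is only an identifier to compare against; under the second, a tool could additionally try to fetch the dictionary and look for the entry with `id="Oco2"`.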