From: Kent L. <Ken...@ge...> - 2006-03-30 19:01:11
Philip,

Thank you very much for your thoughtful posting. I will decline your challenge and choose rather to engage in a dialog (actually an n-log). The assumptions you are challenging may not actually exist, and I think you will find that we have much more common ground than the somewhat detached cyber world reveals at first glance.

My view of a successful standard (in today's terms) is that it would:

- Be a substantial aid to the science and researchers it addresses
- Define its value and usage understandably
- Define enough structure via its format (XML Schema or another mechanism) to provide data interchange for a reasonable cross-section of use cases, as put forth by the communities seeking this interchange
- Provide a consistent and robust mechanism for ontological extension/annotation
- Provide an appropriately scoped, accurate and complete ontology for its domain
- Provide a process and community for evolution, interoperability and extension
- Be implementable and/or provide meaningful implementation(s)

Reaching a "unified standard" is about providing a cohesive and consumable offering (balancing the bespoke aspects in time), supported by community governance and involvement, that does its best to meet the needs of its domain while providing a way to move forward.

So yes, a standard is much more than XML Schema. Being, among other things, an old Smalltalker and OODBMS guy, I have never had a great affection for XML Schema. However, getting the rubber on the road requires an incremental approach, given the variety of timings, interests and pragmatic issues a standards body faces. So I look forward to the community working together towards the more semantically rich and machine-processable world we are seeking in the life sciences. As your posting implies, along with recent activities in a variety of standards groups, this goal is becoming more realistic every day.

Thank you once again.
Regards,
Kent Laursen
PSI-MS Working Group

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Philip Doggett
Sent: Wednesday, March 29, 2006 9:58 PM
To: psi...@li...
Subject: [Psidev-ms-dev] Value of common ontology vs common format

I would like to challenge the assumptions behind the need to put a great deal of work into developing a "common" XML format (or, indeed, any "standard" XML format at all) for the storage and interchange of mass spec data.

An XML format is a syntax into which we embed semantics. The semantics are obtained from an ontology. It is best to have one ontology that is shared by all who create XML syntaxes. It does not matter whether one or many syntaxes exist, as long as each embeds semantics from a common ontology. The reasons why it does not matter include:

- any syntax will be unable to satisfy the data and workflow requirements of all potential users, ergo many syntaxes will exist anyway
- the cost of converting any XML syntax to any other XML syntax is negligible, given embedded semantics from a common ontology

The element and attribute names used in an XML format carry some amount of semantic content. That semantic content, however, usually depends on a processing or workflow context shared by a project team, but no further. Developing an ontology, together with a method for including ontology links in the XML format, provides a powerful way of extending the semantic content to all potential consumers of the data, independent of processing and workflow contexts.

An XML format is often developed not only to contain data but also to make processing and workflows more efficient. This suggests that any XML format designed to be "common" across many or all projects will most often be suboptimal with respect to the processing and workflow efficiency of any one project.
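As a rough illustration of the claim that converting between syntaxes becomes cheap once both embed links to a common ontology, here is a minimal Python sketch. Everything in it is hypothetical: the element names, the `EX:` term accessions and both "formats" are invented for illustration and are not mzXML, mzData or real PSI ontology terms. The converter never matches on element names; it reads (term, value) pairs from format A and writes them into format B using B's own binding of attribute names to the same shared terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared ontology (NOT real PSI accessions).
ONTOLOGY = {
    "EX:0000001": "ms level",
    "EX:0000002": "retention time (seconds)",
    "EX:0000003": "precursor m/z",
}

# Hypothetical format A: generic elements, each carrying an ontology link.
FORMAT_A_DOC = """
<scanList>
  <scan id="17">
    <annotation term="EX:0000001" value="2"/>
    <annotation term="EX:0000002" value="1834.5"/>
    <annotation term="EX:0000003" value="445.12"/>
  </scan>
</scanList>
"""

# Hypothetical format B: project-specific attribute names, each bound to a term.
FORMAT_B_BINDINGS = {
    "EX:0000001": "msLevel",
    "EX:0000002": "rtSec",
    "EX:0000003": "precursorMz",
}


def read_format_a(xml_text):
    """Return a list of (scan id, {ontology term: value}) pairs."""
    root = ET.fromstring(xml_text)
    scans = []
    for scan in root.iter("scan"):
        terms = {a.get("term"): a.get("value") for a in scan.iter("annotation")}
        scans.append((scan.get("id"), terms))
    return scans


def write_format_b(scans, bindings):
    """Build a format-B document; terms B has no binding for are skipped."""
    root = ET.Element("Run")
    for scan_id, terms in scans:
        spectrum = ET.SubElement(root, "Spectrum", {"ref": scan_id})
        for term, value in terms.items():
            # Both sides must draw on the same vocabulary.
            if term not in ONTOLOGY:
                raise ValueError(f"unknown ontology term: {term}")
            attr = bindings.get(term)
            if attr is not None:
                spectrum.set(attr, value)
    return ET.tostring(root, encoding="unicode")


converted = write_format_b(read_format_a(FORMAT_A_DOC), FORMAT_B_BINDINGS)
print(converted)
```

The only format-specific knowledge is the reader for A and the term-to-attribute table for B; adding a third format would mean one more such table, not a new pairwise converter. A format is also free to ignore terms it has no use for, which fits the point that each project's syntax can stay optimised for its own workflow.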
When data is encoded in a binary (non-ASCII/non-Unicode) format, that format is usually tied to a particular processor architecture, e.g. big-endian vs little-endian, 32-bit vs 64-bit, etc. The cost of writing a converter to change data from binary format A to binary format B is high. It requires a programming language that provides bit-level manipulation (e.g. C/C++, Java), as well as precise documentation of the 'from' and 'to' formats. Any such conversion application will likely be fragile, unable to handle the slightest change in the definition of the 'from' or 'to' format. Given the high cost and high fragility, the need for strong data format standards is high.

I would like to suggest that with today's tools for manipulating XML documents, e.g. the open-source XSLT processors Xalan and Saxon, the cost of developing XML format converters is almost negligible and is decreasing. I also suggest that when the next generation of schema definition standards is released, support for schema-embedded ontology links will enable fully automatable XML format conversion: all that will be required is the 'from' schema, the 'to' schema and the common ontology.

If there is zero cost to convert your XML format to my XML format, and my XML format is optimised (and extended) for my processing and workflow requirements, then I don't care what your XML format is, except that it includes links to a common ontology. You don't care (and don't know) what my XML format is. What both of us depend upon is the ontology that imbues our data with a common semantics, which allows the *data* rather than the *format* to be exchanged and shared.

Rather than embarking on a project to merge mzXML and mzData into a single standard XML format, I suggest it is much more important and cost-effective to merge the ontologies into a single standard. I further suggest that the development and maintenance of this standard ontology become one of PSI's highest priorities.
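To make the binary-format point earlier in the message concrete (format tied to byte order and word size, converters needing exact documentation), the following Python sketch packs a hypothetical list of peak values as raw 32-bit little-endian floats and then decodes the same bytes under different assumed layouts. It is an illustration under those stated assumptions, not part of any PSI format; the `decode` helper is hypothetical. Only the reader that knows both the byte order and the precision recovers the values; the others silently produce wrong numbers or the wrong number of values, with no error raised.

```python
import struct

# Hypothetical peak data: alternating m/z and intensity values.
peaks = [445.12, 1520.0, 446.13, 380.5]

# The writer dumps raw 32-bit floats in little-endian byte order ("<").
raw = struct.pack("<4f", *peaks)


def decode(data, byte_order, precision_bits):
    """Decode packed IEEE-754 floats, given the layout the writer used.

    byte_order is a struct prefix: "<" little-endian, ">" big-endian.
    precision_bits is 32 (single) or 64 (double).
    """
    code = {32: "f", 64: "d"}[precision_bits]
    count = len(data) // (precision_bits // 8)
    return list(struct.unpack(f"{byte_order}{count}{code}", data))


correct = decode(raw, "<", 32)       # matches the writer: values recovered to 32-bit precision
wrong_order = decode(raw, ">", 32)   # byte order assumed wrong: meaningless values
wrong_width = decode(raw, "<", 64)   # precision assumed wrong: 2 values instead of 4

print(correct)
print(wrong_order)
print(wrong_width)
```

Note that neither mistaken read fails loudly; the bytes are always decodable as *something*, which is why a binary converter depends so heavily on precise documentation of the 'from' format.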
Philip Doggett
(The above comments are mine alone and do not necessarily reflect the views of Proteome Systems Limited.)

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev