Re: [Psidev-ms-dev] Value of common ontology vs common format

I think there are two things here (and I tremble at the thought of 
deploying my meagre wattage on this one but nonetheless...):

Firstly, ontology handling tools and ontologies in general are in a 
pretty awful state. To my knowledge, all our mzData-export implementers 
hard code their ontology values. In part this was due to the lack of a 
stable reference from us, but I also don't get the impression that live 
lookups of ontology references across networks (given both the vagaries 
of networks and the potential time delay) are something people are 
exactly eager to implement. I know caching helps, but still...
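
To make the caching point concrete, here is a minimal sketch (Python, standard library only) of a term lookup that checks an in-memory cache first, goes to the "network" only on a miss, and falls back to a hard-coded table when the network fails. The `CachedTermLookup` class, the `fake_service` stand-in and the `EX:` accessions are all hypothetical, not part of any PSI tool or ontology service; a real implementation would make a network request where `fake_service` just consults a dict.

```python
from typing import Callable, Dict, Optional


class CachedTermLookup:
    """Resolve ontology accessions to term names, caching remote answers.

    `remote_fetch` stands in for a live network lookup (hypothetical);
    `fallback` plays the role of the hard-coded values implementers use today.
    """

    def __init__(self, remote_fetch: Callable[[str], str],
                 fallback: Optional[Dict[str, str]] = None) -> None:
        self._remote_fetch = remote_fetch
        self._fallback = dict(fallback or {})
        self._cache: Dict[str, str] = {}
        self.remote_calls = 0  # how often we actually went to the "network"

    def name_for(self, accession: str) -> str:
        # 1. Cheap path: a term we have already resolved remotely.
        if accession in self._cache:
            return self._cache[accession]
        # 2. Slow, unreliable path: ask the remote service.
        try:
            self.remote_calls += 1
            name = self._remote_fetch(accession)
        except OSError:
            # 3. Network trouble: use the hard-coded table if it has the term.
            if accession in self._fallback:
                return self._fallback[accession]
            raise
        self._cache[accession] = name
        return name


def fake_service(accession: str) -> str:
    # Pretend remote service; a real one would perform an HTTP request.
    terms = {"EX:0001": "example term A"}
    if accession not in terms:
        raise OSError("lookup failed")
    return terms[accession]


lookup = CachedTermLookup(fake_service, fallback={"EX:0002": "example term B"})
print(lookup.name_for("EX:0001"))  # example term A (fetched remotely)
print(lookup.name_for("EX:0001"))  # example term A (served from the cache)
print(lookup.name_for("EX:0002"))  # example term B (hard-coded fallback)
print(lookup.remote_calls)         # 2
```

One design choice worth noting: fallback answers are deliberately not cached, so a later lookup retries the network once it is back; whether that is desirable depends on how stale the hard-coded values are allowed to get.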

And I speak as one stuck firmly into the FuGO thing, but that is just 
one effort of maybe thirty or so (a number that is admittedly growing), 
and many of us are on a steep, steep learning curve. Handling large 
bodies of descriptors requires strong engineering, and that is 
non-trivial. Even in the magic circle of ontology there isn't consensus. 
Now, a CV will get you a long way if a lookup (acting essentially, I 
suppose, as some sort of two-way API, if that's not nonsense) is all 
you're after, but the management problems with such entities grow 
non-linearly with size, to say the least.

The second thing is that the XML really _is_ part of the ontology, 
although not in the literal sense. (For example, the CPAS people are 
obliged to build a structured CV rather than a real 'purist' ontology, 
because every descriptor, whether from RDF or a table, has [IIRC] to go 
into the CV; convenience classes are the problem there, straddling 
concepts for instance and spawning multiple-inheritance problems.) But 
anyway, the point is that some of the ontology is already, in a sense, 
manifest as the XML. It's the bit of the ontology that is settled, made 
flesh, and it is that way because the lack of tools for using 
ontologies mandates it.

The practical benefit is obviously that we can then exploit those 
semantic labels to do some (but not all) of exactly the kinds of 
transformations and sophisticated data handling that you describe. As 
long as we can keep the number of competing formats down, there won't be 
many many-to-many XML maintenance nightmares; for now it's mostly into 
and out of RDBs, and for analysis tools to feed on. It's better than 
just having more general types, and tables for trees, like HTML (or, in 
a sense, FuGE in its raw form in a world without ontologies).

Basically I agree with the general sentiment in many ways, but until 
there are decent ways to do the ontology-based thing, and until the 
ontologies themselves are there (and stable), standardising (in effect) 
some of the ontology as schema elements gives us some of the benefits 
while we wait for the situation to change.

There's another issue too, which is keeping an eye on how easy all this 
standards stuff is to implement for the funding- and coder-poor. 
Hacking up an XML-based analysis pipeline on the cheap is within the 
reach of some 'small players', but adding the cost of handling a 
variety of XML formats (requiring, for example, a decent knowledge of 
XSL and the ability to write ontology-based tools that aren't even 
really there yet) will cut some of them out of the picture.

If much of that came across as stating the obvious, sorry. The obvious 
is kind of the level I'm at currently with a lot of this stuff, but I 
have logorrhea :\

Cheers, Chris.

Kent Laursen wrote:
> Philip,
> 
> Thank you very much for your thoughtful posting.  I will decline your
> challenge and choose rather to engage in a dialog (actually n-log).  The
> assumptions you are challenging may not actually exist, and I think you
> will find that we have much more common ground than the somewhat
> detached cyber world reveals at first glance.
> 
> My view of a successful standard (in today's terms) is that it would
> 
> - Be a substantial aid to the science and researchers it addresses
> 
> - Define its value and usage understandably
> 
> - Define enough structure via its format (XML Schema or other mechanism)
> to provide data interchange for a reasonable cross section of use cases
> as put forth by the communities seeking this interchange
> 
> - Provide a consistent and robust mechanism for ontological
> extension/annotation
> 
> - Provide an appropriately scoped, accurate and complete ontology for its
> domain
> 
> - Provide a process and community for evolution, interoperability and
> extension
> 
> - Be implementable and/or provide meaningful implementation(s)
> 
> Reaching a "unified standard" is about providing a cohesive and
> consumable offering (balancing the bespoke aspects in time) supported by
> community governance and involvement that does its best to meet the
> needs of its domain while providing a way to move forward.  
> 
> So yes, a standard is much more than XML Schema.  Being, among other
> things, an old Smalltalker and OODBMS guy, I have never had a great
> affection for XML Schema.  However, getting the rubber on the road takes
> an incremental approach given the variety of timings, interests and
> pragmatic issues a standards body faces.  So I look forward to the
> community working together towards the more semantically rich and
> machine processable world we are seeking in the life sciences.  As your
> posting implies, along with recent activities in a variety of standards
> groups, this goal is becoming more realistic every day.
> 
> Thank you once again.
> 
> Regards,
> 
> Kent Laursen
> PSI-MS Working Group
> 
> 
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...] On Behalf Of Philip
> Doggett
> Sent: Wednesday, March 29, 2006 9:58 PM
> To: psi...@li...
> Subject: [Psidev-ms-dev] Value of common ontology vs common format
> 
> I would like to challenge the assumptions behind the need to put a great
> deal of work into developing a "common" XML format - or, indeed, any
> "standard" XML format at all - for the storage and interchange of mass
> spec data.
> 
> An XML format is a syntax into which we embed semantics.  The semantics
> is obtained from an ontology.  It is best to have one ontology that is
> shared by all who create XML syntaxes.  It does not matter whether one
> or many syntaxes exist as long as each embeds semantics from a common
> ontology.
> 
> The reasons why it does not matter include:
> - any syntax will be unable to satisfy the data and workflow
> requirements of all potential users, ergo many syntaxes will exist
> anyway
> - the cost of converting any XML syntax to any other XML syntax is
> negligible, given embedded semantics from a common ontology
> 
> 
> The element and attribute names used in an XML format include some
> amount of semantic content.  That semantic content, however, is usually
> dependent upon a processing or workflow context shared by a project team
> but no farther.  The development of an ontology and a method to include
> ontology links in the XML format provide a powerful method of extending
> the semantic content across all potential consumers of the data,
> independent of processing and workflow contexts.
> 
> An XML format is often developed not only to contain data but also to
> make processing and workflows more efficient.  This suggests that any
> XML format designed to be "common" across many or all projects will most
> often be suboptimal with respect to efficiency of processing and
> workflow of any one project.
> 
> When data is encoded in a binary (non-ASCII/non-Unicode) format, that
> format is usually tied to a particular processor, eg big-endian vs
> little-endian, 32-bit vs 64-bit, etc.  The cost of writing a data
> converter to change data from binary format A to binary format B is
> high.  A programming language that provides bit-level manipulation - eg
> C/C++, Java - is required, as is precise documentation of the 'from' and
> 'to' formats.  Any conversion application will likely be fragile, unable
> to handle the slightest change in the definition of the 'from' or 'to'
> format.  Given the high cost and high fragility, the need for strong
> data format standards is high.
> 
> I would like to suggest that with today's tools for manipulating XML
> documents - eg, Xalan, Saxon (both open source) - the cost of developing
> XML format converters is almost negligible and is decreasing.  I also
> suggest that when the next generation of schema definition standards is
> released, support for schema-embedded ontology links will enable fully
> automatable XML format conversion - all that will be required is the
> 'from' schema, the 'to' schema and the common ontology.
> 
> If there is zero cost to convert your XML format to my XML format and my
> XML format is optimised (and extended) for my processing and workflow
> requirements, then I don't care what your XML format is except that it
> includes links to a common ontology.  You don't care (and don't know)
> what my XML format is.  What both of us depend upon is the ontology that
> imbues our data with a common semantics that allows the *data* rather
> than the *format* to be exchanged and shared.
> 
> Rather than embarking on a project to merge mzXML and mzData into a
> single standard XML format, I suggest it is much more important and
> cost-effective to merge the ontologies into a single standard.  I further
> suggest that the development and maintenance of this standard ontology
> become one of PSI's highest priorities.
> 
> 
> Philip Doggett
> 
> (The above comments are mine alone and do not necessarily reflect the
> views of Proteome Systems Limited.)
>  
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting
> language that extends applications into web and mobile media. Attend the
> live webcast and join the prime developer group breaking into this new
> coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
> 

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Chris Taylor (ch...@eb...)
  HUPO PSI: GPS -- psidev.sf.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~