From: Chris T. <chr...@eb...> - 2007-10-18 19:19:13
But with namespaces and all, surely what is in physical reality two schemata can be operated as one anyway? So does it really make a huge difference? I'm actually asking rather than being rhetorical... Despite all the arguments, I perceive 'proper' standards as being completely static apart from through (infrequent) versions (XML Schema itself, for example). Maybe I have a biased notion of standards, but should we not be making a core thing that is static and keeping the volatile stuff in the second one?

And I do still see a tie to one CV as bundling for no reason -- it's a short-term gain (a year or so, which means that just at the point that we have good implementations, it'll be change-o time). I dunno. As I said, I'm mostly just throwing in opinions I've heard elsewhere. On balance it really comes down to pragmatism versus the kind/strength of assurance (to third parties). I'm gonna pull my head in now anyway :)

Cheers, Chris.

Brian Pratt wrote:

> Hey All,

> It's true that in practice most day-to-day consumers of mzML files will not bother with validation. The value of the detailed validation capability of a fully realized XSD is largely seen during the *development* of the readers and writers, not in their day-to-day operation. (Of course it's also seen in their day-to-day operation, because they work properly, having been written properly.)

> Ideally we would test every conceivable combination of writer and reader, but since we can't expect to do that (we can't start until everybody finishes, and imagine the back and forth!), we instead have to make it possible for the writers to readily check their work in syntactic and semantic detail, and for the readers to not have to make a lot of guesses about what they're likely to see. The fully realized XSD helps on both counts -- ready validation for the writers, and a clear spec for the readers. It also gives the possibility of automatically generated code as a jumping-off point for the programmers of both readers and writers, which can reduce defect rates.

> Matt asks if I envision one schema or two. We need to go out of the gate with one schema that expresses everything we know we want to say today (including any intelligence in the current mapping file, plus more detail). The anticipated need for vendors to extend the schema independent of the official schema release cycle (our "stability" goal) is then handled by schemas the vendors create, which inherit from and extend the standard schema. The proposed idea of a second schema from the get-go just to layer on the CV mappings is unwarranted complexity. These belong in the core XSD as (optional) attributes of the various elements; when that one-time OBI event comes, we'll just update the core XSD to add attributes that indicate relationships from elements to the new CV as well. It's far enough away not to threaten the appearance of stability in the spec, and in any case won't break backward compatibility.

> The important point about hard-coding rules vs. expressing relationships and constraints in the XSD is one of economies of scale. It was asked whether hard coding was any more work than getting the schema right: the answer is yes, as it has to be done repeatedly, once per validating reader implementation (not everyone uses Java, or is even allowed to use open source code in their product). Why make everyone reinvent the wheel, and probably get it wrong, when we have a nice, standard, language-independent means of expressing those constraints?
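As a rough illustration of the vendor-extension route Brian describes above -- vendor schemas that inherit from and extend the standard schema -- a sketch might look like the following. Every namespace, file name, and type name here is invented for the example; none of it comes from an actual mzML schema:

    <!-- Hypothetical vendor schema: imports an (equally hypothetical) core
         mzML schema and derives a vendor-specific spectrum type from it
         without touching the core file. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:mz="http://example.org/mzML-core"
               targetNamespace="http://example.org/acme-extensions"
               elementFormDefault="qualified">

      <xs:import namespace="http://example.org/mzML-core"
                 schemaLocation="mzML-core.xsd"/>

      <!-- Add one vendor-specific attribute on top of the standard type. -->
      <xs:complexType name="AcmeSpectrumType">
        <xs:complexContent>
          <xs:extension base="mz:SpectrumType">
            <xs:attribute name="acmeDetectorGain" type="xs:double" use="optional"/>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>

    </xs:schema>

An instance document could then opt in on individual elements with xsi:type (e.g. xsi:type="acme:AcmeSpectrumType"), leaving the released core schema itself untouched -- which is the stability property being argued for here.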
> It just comes down to KISS: Keep It Simple, Stupid! (not calling names here, that's just the acronym as I learned it). We're here to deal with MS raw data transfer, not to design new data format description languages. More than once on this list I've seen snarky asides about coders who aren't up to muscling through these proposed convolutions, but a truly competent coder is professionally lazy (managers prefer "elegant"). Moreover, a standards effort is supposed to consolidate the efforts of the community so that its individuals can get on with their real work -- we shouldn't be blithely proposing things that create more individual work than they absolutely need to.

> - Brian

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Chris Taylor
> Sent: Thursday, October 18, 2007 9:37 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML 0.99.0 comments

> Hiya.

> Matthew Chambers wrote:

>> I'm glad we're getting good participation and discussion of this issue now! Chris, your characterization is a reasonable one for the two-schema approach I described.

>> To respond to your qualification of the current state of affairs, I'll quote something you said the other day:

>>> Clearly we need the basic (and rilly rilly easy to do) syntactic validation provided by a fairly rich XML schema.

>> This is not clear to me. I do not see a clear advantage to validating syntax and not validating semantics. In my experience, reading a file with invalid semantics is as likely to result in a parser error as reading a file with invalid syntax (although I admit that implementing error handling for semantic errors tends to be more intuitive).

> The only thing I'd say here is that there is a minimum-effort option available for implementers who cannot or choose not to validate content -- i.e. the 'core' schema is there to allow syntactic validation only, and the extended schema you suggested would then allow the Brians and yourselves of this world to do more. Seems a neat solution. That said, I don't contest your assertion that the more thorough the validation, the more likely one is to catch the subtle errors as well as the gross ones.

>>> But supporting the kinds of functionality discussed (which would mean the CV rapidly becoming a 'proper' ontology, which we don't have the person-hours to do right, btw) is really just a nice-to-have at the moment. True semantic validation is just about feasible but _isn't_ practical imho.

>> I think you misunderstood the functionality I was suggesting be added to the CV. I was not suggesting significant logic changes in the CV, only a simple instance_of relationship added to every controlled value to link it to its parent category: "LTQ" is a controlled value, and it should be an 'instance_of' an "instrument model", which is a controlled category. In my view, the distinction between controlled values and categories in the CV is crucial, and it doesn't come close to making the CV any more of a 'proper' ontology (i.e. one that machines can use to gain knowledge about the domain without human intervention). It would, however, mean that a machine could auto-generate a schema from the CV, which is what I was aiming for. :) I don't really agree with the idea that the PSI MS CV should be a filler which gets replaced by the OBI CV whenever it comes about, but if that's the consensus view then that would be reason enough to give up the idea of using the CV to auto-generate the schema.
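To make the auto-generation idea concrete: if every controlled value carried an instance_of link to its category, a generator could emit one enumerated type per category. Below is a rough sketch of what such machine-written output might look like; the type name and the instrument names are placeholders rather than actual PSI-MS CV content:

    <!-- Hypothetical machine-generated fragment: one enumeration value for
         each CV term recorded as an instance_of "instrument model". -->
    <xs:simpleType name="InstrumentModelName">
      <xs:restriction base="xs:string">
        <xs:enumeration value="LTQ"/>
        <xs:enumeration value="LTQ FT"/>
        <xs:enumeration value="Q-TOF micro"/>
        <!-- ...and so on, regenerated whenever the CV changes... -->
      </xs:restriction>
    </xs:simpleType>

A semantic schema could then use this type for the name attribute of a cvParam in the instrument context, so a term from the wrong category would fail ordinary XSD validation with no hand-written rules involved.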
> Thing here is that I heard several people assert (not on here) that defining terminating endpoints is storing up trouble, and instances are therefore hostages to fortune; you'll just end up making a new class and deprecating the instance. Obviously there are clear endpoints (is there only one variant of an LTQ, btw? is an LTQ-FT a child or a sibling?) but there are also going to be mistakes made -- rope to hang ourselves with (an overly dramatic phrase, but nonetheless).

> Then there is the case where people _want_ to use a more generic parent (not sure how many there are in the CV, tbh, as it is quite flat iirc, but still, there are many ontologies in the world where the nodes are used as much as the leaves). A (simple-ish) example off the top of my head (not necessarily directly applicable, just for the principle) would be where someone has a machine not yet described and just wants to say something about it.

>>> Certainly for all but the most dedicated coders it is a pipe dream. All that can realistically be hoped for at the moment is correct usage (i.e. checking in an application of some sort that the term is appropriate given its usage), for which this wattage of CV is just fine. This is what the MIers have done -- a Java app uses hard-coded rules to check usage (and in that simple scenario the intelligent use of class-superclass stuff can bring benefits).

>> It seems here you DO suggest validating semantics, but instead of doing it with the CV/schema it must be implemented manually by hard-coding the rules into a user application. Right now, there is no way (short of parsing the ms-mapping file and adopting that format) to get that kind of validation without the hard-coding you mention. Brian and I both think that a proper specification should include a way to get this kind of validation without hard-coding the rules, even if applications choose not to use it.

> I think that in the absence of an ontology to afford this sort of functionality (and with one expected), hard coding is not an awful solution (the workload for your suggestion wouldn't be orders of magnitude different, would it, bearing in mind this is a temporary state of affairs and so not subject to years of maintenance?). The MI group certainly went this route straight off the bat...

> At the risk of becoming dull, I'd restate that this is why I like the separable schemata you suggested, as we get the best of both worlds, no?
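For what it's worth, one concrete way the separable-schemata arrangement could be wired together is XSD's redefine mechanism: the semantic layer shares the core schema's target namespace and tightens individual definitions in place. The file name, namespace, type name, and accession pattern below are all invented for the sketch, not taken from any actual mzML draft:

    <!-- Hypothetical semantic layer: redefines a deliberately loose core
         definition so that validation also checks CV-derived constraints. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:mz="http://example.org/mzML-core"
               targetNamespace="http://example.org/mzML-core"
               elementFormDefault="qualified">

      <xs:redefine schemaLocation="mzML-core.xsd">
        <!-- In the core schema this is plain xs:string; here it is narrowed
             (it could equally be an enumeration generated from the CV). -->
        <xs:simpleType name="CVAccessionType">
          <xs:restriction base="mz:CVAccessionType">
            <xs:pattern value="MS:[0-9]{7}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:redefine>

    </xs:schema>

Validating against mzML-core.xsd alone would give the stable, syntax-only check; validating against this second file would layer the CV-driven restrictions on top, and only the second file would need to track CV updates.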
>>> But what they're not doing is something like (for MS now) "I have a Voyager, so why on earth do I have ion trap data -- sound the klaxon"; this can only come from something of the sophistication of OBI (or a _LOT_ of bespoke coding), which is in a flavour of OWL (a cruise liner to OBO's dinghy).

>> It's true, AFAIK, that validating (for example) the value of the "mass analyzer" category based on the value provided for the "instrument model" category is not possible with the current CV/schema. It is not even possible after the extensions proposed by Brian or me. Such functionality would require a much more interconnected CV (and the XSD schema would be so confusing to maintain that it would almost certainly have to be auto-generated from the CV). I don't think anybody particularly expects this functionality either, so we needn't worry about it. :)

> Well, I'm kind of hoping we will ultimately be able to get this from OBI, which is being built in a very thorough and extensible (in terms of the richness of relations between classes) manner.

> Cheers, Chris.

>> -Matt

>> Chris Taylor wrote:

>>> Hiya.

>>> So your solution can, if I understand correctly, be characterised as formalising the mapping-file info in an XSD that happens (for obvious reasons) to inherit from the main schema? If so, then as long as everyone likes it, I see that as a nice, neat, robust solution.

>>> Funnily enough, I was chatting to a fellow PSIer yesterday about the mapping file(s) (this is cross-WG policy stuff, you see) and enquired as to the current nature of the thing. I think if there is a clamour to formalise the map then hopefully there will be a response. To qualify the current state of affairs though, this was not meant to be a formal part of the standard -- more something akin to documentation (it didn't exist at all at one point -- bridging the gap was something done in the CV, which is not a great method for a number of reasons).

>>> Cheers, Chris.

>>> Matthew Chambers wrote:

>>>> If the consensus is that the CV should be left simple like it is now, then I must agree with Brian. The current schema is incapable of doing real validation, and the ms-mapping file is worse than a fleshed-out CV or XSD (it's more confusing, it takes longer to maintain, and it's non-standard).

>>>> I still want Brian to clarify whether he wants a one-schema spec or a two-schema spec. I support the latter approach, where one schema is a stable, syntactical version and the other inherits from the first one and defines all the semantic restrictions as well. It would be up to implementors which schema to use for validation, and of course only the syntactical schema would be "stable", because the semantic restrictions in the second schema would change to match the CV whenever it was updated.

>>>> -Matt

--
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~