From: Chris T. <chr...@eb...> - 2007-10-18 19:19:13
But with namespaces and all, surely what is in physical reality two schemata can be operated as one anyway? So does it really make a huge difference? I'm actually asking rather than being rhetorical... Despite all the arguments, I perceive 'proper' standards as being completely static apart from through (infrequent) versions (XML Schema itself, for example). Maybe I have a biased notion of standards, but should we not be making a core thing that is static and keeping the volatile stuff in the second one?

And I do still see a tie to one CV as bundling for no reason -- it's a short-term gain (a year or so, which means that just at the point that we have good implementations, it'll be change-o time). I dunno. As I said, I'm mostly just throwing in opinions I've heard elsewhere. On balance it really comes down to pragmatism versus the kind/strength of assurance (to third parties). I'm gonna pull my head in now anyway :)

Cheers, Chris.

Brian Pratt wrote:

> Hey All,

> It's true that in practice most day-to-day consumers of mzML files will not bother with validation. The value of the detailed validation capability of a fully realized XSD is largely seen during the *development* of the readers and writers, not in their day-to-day operation. (Of course it's also seen in their day-to-day operation, because they work properly, having been written properly.)

> Ideally we would test every conceivable combination of writer and reader, but since we can't expect to do that (we can't start until everybody finishes, and imagine the back and forth!), we instead have to make it possible for the writers to readily check their work in syntactic and semantic detail, and for the readers to not have to make a lot of guesses about what they're likely to see. The fully realized XSD helps on both counts -- ready validation for the writers, and a clear spec for the readers. It also gives the possibility of automatically generated code as a jumping-off point for the programmers of both readers and writers, which can reduce defect rates.

> Matt asks if I envision one schema or two. We need to go out of the gate with one schema that expresses everything we know we want to say today (including any intelligence in the current mapping file, plus more detail). The anticipated need for vendors to extend the schema independent of the official schema release cycle (our "stability" goal) is then handled by schemas the vendors create, which inherit from and extend the standard schema. The proposed idea of a second schema from the get-go just to layer on the CV mappings is unwarranted complexity. These belong in the core XSD as (optional) attributes of the various elements; when that one-time OBI event comes, we'll just update the core XSD to add attributes that indicate relationships from elements to the new CV as well. It's far enough away not to threaten the appearance of stability in the spec, and in any case won't break backward compatibility.

> The important point about hard-coding rules vs. expressing relationships and constraints in the XSD is one of economies of scale. It was asked whether hard coding was any more work than getting the schema right: the answer is yes, as it has to be done repeatedly, once per validating reader implementation (not everyone uses Java, or is even allowed to use open source code in their product). Why make everyone reinvent the wheel, and probably get it wrong, when we have a nice, standard, language-independent means of expressing those constraints?
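As a rough illustration of the vendor-extension route Brian describes above -- vendor schemas that inherit from and extend the standard schema -- a sketch might look like the following. Every namespace, file name, and type name here is invented for the example; none of it comes from an actual mzML schema:

    <!-- Hypothetical vendor schema: imports an (equally hypothetical) core
         mzML schema and derives a vendor-specific spectrum type from it
         without touching the core file. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:mz="http://example.org/mzML-core"
               targetNamespace="http://example.org/acme-extensions"
               elementFormDefault="qualified">

      <xs:import namespace="http://example.org/mzML-core"
                 schemaLocation="mzML-core.xsd"/>

      <!-- Add one vendor-specific attribute on top of the standard type. -->
      <xs:complexType name="AcmeSpectrumType">
        <xs:complexContent>
          <xs:extension base="mz:SpectrumType">
            <xs:attribute name="acmeDetectorGain" type="xs:double" use="optional"/>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>

    </xs:schema>

An instance document could then opt in on individual elements with xsi:type (e.g. xsi:type="acme:AcmeSpectrumType"), leaving the released core schema itself untouched -- which is the stability property being argued for here.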
> It just comes down to KISS: Keep It Simple, Stupid! (not calling names here, that's just the acronym as I learned it). We're here to deal with MS raw data transfer, not to design new data format description languages. More than once on this list I've seen snarky asides about coders who aren't up to muscling through these proposed convolutions, but a truly competent coder is professionally lazy (managers prefer "elegant"). Moreover, a standards effort is supposed to consolidate the efforts of the community so that its individuals can get on with their real work -- we shouldn't be blithely proposing things that create more individual work than they absolutely need to.

> - Brian

> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Chris Taylor
> Sent: Thursday, October 18, 2007 9:37 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML 0.99.0 comments

> Hiya.

> Matthew Chambers wrote:

>> I'm glad we're getting good participation and discussion of this issue now! Chris, your characterization is a reasonable one for the two-schema approach I described.

>> To respond to your qualification of the current state of affairs, I'll quote something you said the other day:

>>> Clearly we need the basic (and rilly rilly easy to do) syntactic validation provided by a fairly rich XML schema.

>> This is not clear to me. I do not see a clear advantage to validating syntax and not validating semantics. In my experience, reading a file with invalid semantics is as likely to result in a parser error as reading a file with invalid syntax (although I admit that implementing error handling for semantic errors tends to be more intuitive).

> The only thing I'd say here is that there is a minimum-effort option available for implementers who cannot or choose not to validate content -- i.e. the 'core' schema is there to allow syntactic validation only, and the extended schema you suggested would then allow the Brians and yourselves of this world to do more. Seems a neat solution. That said, I don't contest your assertion that the more thorough the validation, the more likely one is to catch the subtle errors as well as the gross ones.

>>> But supporting the kinds of functionality discussed (which would mean the CV rapidly becoming a 'proper' ontology, which we don't have the person-hours to do right, btw) is really just a nice-to-have at the moment. True semantic validation is just about feasible but _isn't_ practical imho.

>> I think you misunderstood the functionality I was suggesting be added to the CV. I was not suggesting significant logic changes in the CV, only a simple instance_of relationship added to every controlled value to link it to its parent category: "LTQ" is a controlled value, and it should be an 'instance_of' an "instrument model", which is a controlled category. In my view, the distinction between controlled values and categories in the CV is crucial, and it doesn't come close to making the CV any more of a 'proper' ontology (i.e. one that machines can use to gain knowledge about the domain without human intervention). It would, however, mean that a machine could auto-generate a schema from the CV, which is what I was aiming for. :) I don't really agree with the idea that the PSI MS CV should be a filler which gets replaced by the OBI CV whenever it comes about, but if that's the consensus view then that would be reason enough to give up the idea of using the CV to auto-generate the schema.
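To make the auto-generation idea concrete: if every controlled value carried an instance_of link to its category, a generator could emit one enumerated type per category. Below is a rough sketch of what such machine-written output might look like; the type name and the instrument names are placeholders rather than actual PSI-MS CV content:

    <!-- Hypothetical machine-generated fragment: one enumeration value for
         each CV term recorded as an instance_of "instrument model". -->
    <xs:simpleType name="InstrumentModelName">
      <xs:restriction base="xs:string">
        <xs:enumeration value="LTQ"/>
        <xs:enumeration value="LTQ FT"/>
        <xs:enumeration value="Q-TOF micro"/>
        <!-- ...and so on, regenerated whenever the CV changes... -->
      </xs:restriction>
    </xs:simpleType>

A semantic schema could then use this type for the name attribute of a cvParam in the instrument context, so a term from the wrong category would fail ordinary XSD validation with no hand-written rules involved.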
> Thing here is that I heard several people assert (not on here) that defining terminating endpoints is storing up trouble, and instances are therefore hostages to fortune; you'll just end up making a new class and deprecating the instance. Obviously there are clear endpoints (is there only one variant of an LTQ, btw? is an LTQ-FT a child or a sibling?) but there are also going to be mistakes made -- rope to hang ourselves with (an overly dramatic phrase, but nonetheless).

> Then there is the case where people _want_ to use a more generic parent (not sure how many there are in the CV, tbh, as it is quite flat iirc, but still, there are many ontologies in the world where the nodes are used as much as the leaves). A (simple-ish) example off the top of my head (not necessarily directly applicable, just for the principle) would be where someone has a machine not yet described and just wants to say something about it.

>>> Certainly for all but the most dedicated coders it is a pipe dream. All that can realistically be hoped for at the moment is correct usage (i.e. checking in an application of some sort that the term is appropriate given its usage), for which this wattage of CV is just fine. This is what the MIers have done -- a Java app uses hard-coded rules to check usage (and in that simple scenario the intelligent use of class-superclass stuff can bring benefits).

>> It seems here you DO suggest validating semantics, but instead of doing it with the CV/schema it must be implemented manually by hard-coding the rules into a user application. Right now, there is no way (short of parsing the ms-mapping file and adopting that format) to get that kind of validation without the hard-coding you mention. Brian and I both think that a proper specification should include a way to get this kind of validation without hard-coding the rules, even if applications choose not to use it.

> I think that in the absence of an ontology to afford this sort of functionality (and with one expected), hard coding is not an awful solution (the workload for your suggestion wouldn't be orders of magnitude different, would it, bearing in mind this is a temporary state of affairs and so not subject to years of maintenance?). The MI group certainly went this route straight off the bat...

> At the risk of becoming dull, I'd restate that this is why I like the separable schemata you suggested, as we get the best of both worlds, no?
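For what it's worth, one concrete way the separable-schemata arrangement could be wired together is XSD's redefine mechanism: the semantic layer shares the core schema's target namespace and tightens individual definitions in place. The file name, namespace, type name, and accession pattern below are all invented for the sketch, not taken from any actual mzML draft:

    <!-- Hypothetical semantic layer: redefines a deliberately loose core
         definition so that validation also checks CV-derived constraints. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:mz="http://example.org/mzML-core"
               targetNamespace="http://example.org/mzML-core"
               elementFormDefault="qualified">

      <xs:redefine schemaLocation="mzML-core.xsd">
        <!-- In the core schema this is plain xs:string; here it is narrowed
             (it could equally be an enumeration generated from the CV). -->
        <xs:simpleType name="CVAccessionType">
          <xs:restriction base="mz:CVAccessionType">
            <xs:pattern value="MS:[0-9]{7}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:redefine>

    </xs:schema>

Validating against mzML-core.xsd alone would give the stable, syntax-only check; validating against this second file would layer the CV-driven restrictions on top, and only the second file would need to track CV updates.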
>>> But what they're not doing is something like (for MS now) "I have a Voyager, so why on earth do I have ion trap data -- sound the klaxon"; this can only come from something of the sophistication of OBI (or a _LOT_ of bespoke coding), which is in a flavour of OWL (a cruise liner to OBO's dinghy).

>> It's true, AFAIK, that validating (for example) the value of the "mass analyzer" category based on the value provided for the "instrument model" category is not possible with the current CV/schema. It is not even possible after the extensions proposed by Brian or me. Such functionality would require a much more interconnected CV (and the XSD schema would be so confusing to maintain that it would almost certainly have to be auto-generated from the CV). I don't think anybody particularly expects this functionality either, so we needn't worry about it. :)

> Well, I'm kind of hoping we will ultimately be able to get this from OBI, which is being built in a very thorough and extensible (in terms of the richness of relations between classes) manner.

> Cheers, Chris.

>> -Matt

>> Chris Taylor wrote:

>>> Hiya.

>>> So your solution can, if I understand correctly, be characterised as formalising the mapping-file info in an XSD that happens (for obvious reasons) to inherit from the main schema? If so, then as long as everyone likes it, I see that as a nice, neat, robust solution.

>>> Funnily enough, I was chatting to a fellow PSIer yesterday about the mapping file(s) (this is cross-WG policy stuff, you see) and enquired as to the current nature of the thing. I think if there is a clamour to formalise the map then hopefully there will be a response. To qualify the current state of affairs though, this was not meant to be a formal part of the standard -- more something akin to documentation (it didn't exist at all at one point -- bridging the gap was something done in the CV, which is not a great method for a number of reasons).

>>> Cheers, Chris.

>>> Matthew Chambers wrote:

>>>> If the consensus is that the CV should be left simple like it is now, then I must agree with Brian. The current schema is incapable of doing real validation, and the ms-mapping file is worse than a fleshed-out CV or XSD (it's more confusing, it takes longer to maintain, and it's non-standard).

>>>> I still want Brian to clarify whether he wants a one-schema spec or a two-schema spec. I support the latter approach, where one schema is a stable, syntactical version and the other inherits from the first one and defines all the semantic restrictions as well. It would be up to implementors which schema to use for validation, and of course only the syntactical schema would be "stable", because the semantic restrictions in the second schema would change to match the CV whenever it was updated.

>>>> -Matt

--
~~~~~~~~~~~~~~~~~~~~~~~~
chr...@eb...
http://mibbi.sf.net/
~~~~~~~~~~~~~~~~~~~~~~~~