From: Matthew C. <mat...@va...> - 2007-10-03 15:34:18
Hi all,
Time to reopen this can of worms! I like the specification document.
It's clearly written. Unfortunately there is no clear way that I know
of to capture the semantically valid cvParam relationships in a flat
written document, but that can be done externally and it doesn't bother
me. I have one comment before discussing cvParams though: where is the
rationale for having "referenceable" paramGroups? I'm not disagreeing
with the idea, I think it's good, but it does need a rationale because
it's not typical XML practice. For example, why not use the xlink
standard to do the referencing? Also, do we guarantee the order of the
elements so that "referenceableParamGroupList" is always known to come
before the first "run" element (which if I read correctly is the first
element to make use of "paramGroupRef"s)?
As for attributes vs. cvParams, I have a compromise to propose between
methods A, B and C. I earlier proposed an extension to the structure of
the CV which would be intended to force format writers to use certain
well-defined values instead of whatever kind of capitalization and
spacing they wish. That proposal still stands and I'd like to hear
feedback on it.
But I think we should agree on some basic requirements and then evaluate
proposals from there (this was probably done in one of your meetings or
teleconferences, but I don't recall such a requirements list being
posted on this mailing list). According to the specification document,
there is a requirement to have a long-term, unchanging specification,
mainly due to vendor interests, it seems. Because the field of MS keeps
changing, that in turn requires a companion CV. I happen to
agree with the idea of having a long-term, unchanging specification with
a CV, even though I don't intend to use the CV very much, if at all.
From a previous post by Eric Deutsch in this thread:
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model"
value="LCQ Deca"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model"
value="LCQ DECA"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model"
value="LTQ FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model"
value="LTQ-FT"/>
<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model"
value="LTQFT"/>
OK, so because of this legitimate concern we have another requirement:
the spec must allow defining a restricted value set for categories like
"instrument model." I do not see a reason for a requirement that the
spec must use accession numbers to enumerate those values. Consider, for
example, that we have not specified whether the cvLabel parameter is
case sensitive or not. Suppose a naughty writer starts using lowercase
instead of uppercase for the cvLabel, or for the cvLabel prefix on the
accession number. Even worse, suppose the case of the accession
number's prefix and the case of the cvLabel don't match. The best we can
do is specify case sensitivity for these fields or require a particular
case in particular contexts. We can't prevent people from writing
broken instances of the specification.
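To make the case-sensitivity point concrete, here is a minimal sketch of the kind of check a validator could run. It assumes the spec settles on exact, case-sensitive matching between cvLabel and the accession prefix, which has not been decided; the function name prefix_matches_label is made up for illustration.

```python
import xml.etree.ElementTree as ET


def prefix_matches_label(cv_param_xml: str) -> bool:
    """Return True when the accession prefix equals cvLabel exactly.

    Assumption: matching is case-sensitive. The spec has not said so yet.
    """
    elem = ET.fromstring(cv_param_xml)
    label = elem.get("cvLabel", "")
    accession = elem.get("accession", "")
    # Split "MS:1000031" into the prefix "MS" and the numeric part.
    prefix, sep, _ = accession.partition(":")
    return bool(sep) and prefix == label


good = ('<cvParam cvLabel="MS" accession="MS:1000031" '
        'name="instrument model" value="LCQ Deca"/>')
lower_label = ('<cvParam cvLabel="ms" accession="MS:1000031" '
               'name="instrument model" value="LCQ Deca"/>')
lower_prefix = ('<cvParam cvLabel="MS" accession="ms:1000031" '
                'name="instrument model" value="LCQ Deca"/>')

for snippet in (good, lower_label, lower_prefix):
    print(prefix_matches_label(snippet))
```

A check like this can only flag a broken instance after the fact; it does nothing to stop a writer from producing one.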
Based on the above requirement, one concern that I have (and I think
many others do too, because frankly I get a strong impression that many
people who want to use this spec don't care about being CV aware) is
that a writer should be able to write a cvParam with a value that is not
in the allowed value set of the CV without leaving readers with no clue
what that value indicates. In other words, regardless of
whether a reader is CV aware or not, a (machine OR human) reader should
be able to glean the purpose of an unknown value in a cvParam via some
kind of category specification (e.g. "instrument model", or by the
category's accession number). If this is accepted as a requirement, it
practically eliminates method A as an option because it provides no
indication of what category the unknown cvParam's value belongs to.
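As a rough sketch of that requirement, a reader could report the category of a cvParam even when the value is outside the restricted set. The ALLOWED_VALUES table below is a made-up stand-in for whatever value set the CV would define, and describe is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical restricted value set, keyed by the category's accession.
# A real reader would load this from the CV rather than hard-code it.
ALLOWED_VALUES = {"MS:1000031": {"LCQ Deca", "LTQ FT"}}


def describe(cv_param_xml: str):
    """Return (category name, value, whether the value is in the allowed set)."""
    elem = ET.fromstring(cv_param_xml)
    category = elem.get("name")
    value = elem.get("value")
    allowed = ALLOWED_VALUES.get(elem.get("accession"), set())
    return category, value, value in allowed


unknown = ('<cvParam cvLabel="MS" accession="MS:1000031" '
           'name="instrument model" value="LTQ-FT"/>')
category, value, is_known = describe(unknown)
print(f"{category}: {value} (in allowed set: {is_known})")
```

Even though "LTQ-FT" is not in the assumed value set, the reader can still tell that it names an instrument model, because the category travels with the cvParam.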
There are perhaps other requirements for the cvParam, but I'll let
others fill them in. My new proposed compromise is to split values into
a valueAccession and a valueName, just like the optional unitAccession
and unitName. The two value attributes would not be optional like the
unit attributes, though. A special CV accession number could be
allocated to indicate an "unrestricted" value, in which case the reader
would use the valueName as the value. Alternatively, the reader could
read the accession attribute (which in this compromise would always
indicate a category's accession number) and choose based on that whether
to look up the valueAccession in the CV or to use the valueName
verbatim. So the SRM spectrum example would become:
<cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type"
valueAccession="MS:1000583" valueName="SRM spectrum"/>
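Here is a minimal sketch of how a reader might resolve a value under this compromise. The "unrestricted" accession MS:0000000 is a placeholder I made up (no such accession has been allocated), CV_TERMS stands in for a parsed CV, and resolve_value is a hypothetical name.

```python
import xml.etree.ElementTree as ET

UNRESTRICTED = "MS:0000000"  # placeholder; no real accession is allocated yet
CV_TERMS = {"MS:1000583": "SRM spectrum"}  # stand-in for a parsed CV


def resolve_value(cv_param_xml: str) -> str:
    """Pick the value a reader should use for a cvParam in the proposed form."""
    elem = ET.fromstring(cv_param_xml)
    value_accession = elem.get("valueAccession")
    value_name = elem.get("valueName")
    if value_accession is None or value_name is None:
        # Both attributes are mandatory in this proposal.
        raise ValueError("valueAccession and valueName are both required")
    if value_accession == UNRESTRICTED:
        # Free-form value: use valueName verbatim.
        return value_name
    # Restricted value: prefer the CV's term name; a reader that does not
    # know the accession falls back to valueName.
    return CV_TERMS.get(value_accession, value_name)


srm = ('<cvParam cvLabel="MS" accession="MS:1000035" name="spectrum type" '
       'valueAccession="MS:1000583" valueName="SRM spectrum"/>')
free = ('<cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" '
        'valueAccession="MS:0000000" valueName="LTQ-FT"/>')

print(resolve_value(srm))
print(resolve_value(free))
```

A reader that is not CV aware can skip the lookup entirely and always use valueName, which is the point of carrying both attributes.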
I like ketchup on my worms, how bout you?
-Matt Chambers
Vanderbilt MSRC
For reference, AFAIK this is the last post in this thread:
Joshua Tasman wrote:
> Hi all,
>
> Actually, I agree that we'd be better served if more structure was
> applied at the xml schema level, but since design decisions have
> already been made and it seems we're past the point of changing them,
> I think we should stick to a consistent flavor.
>
> I'd propose finding most instances in the schema where attributes and
> values are defined by the xml schema and replacing them with cvParams.
> If we're reliant on the OBO, let's completely get away from any
> parsing of human-readable elements. In the OBO, we already have
> inconsistent capitalization for source file types: "mzData File" vs
> "wiff file". Let's simplify things and rely on the nice clean accession.
>
> From a look through the instance document, some examples:
>
> I'd like to see sourceFileType as a sub cvParam with a specific
> accession reference, vs attribute:
> <sourceFile id="1" sourceFileName="tiny1.RAW"
> sourceFileLocation="file://F:/data/Exp01" sourceFileType="Xcalibur RAW
> file">
>
> contactInfo could use value'd cvParams for name, institution, etc, or
> any other added features like email, phone, etc.
>
> fileChecksum's type should be a cv accession, instead of:
> <fileChecksum type="Sha1">
>
> In spectrum, spectrumType should be a cvParam, not an attribute:
> <spectrum id="S19" scanNumber="19" spectrumType="MSn" msLevel="1">
>
> In binaryDataArray, attributes compressionType and dataType should be
> cvParams:
> <binaryDataArray dataType="64-bit float" compressionType="none"
> arrayLength="43" encodedLength="5000" dataProcessingRef="Xcalibur
> Processing">
>
>
> Josh
>