From: Matt C. <mat...@va...> - 2008-09-18 16:44:42
I prefer nativeIDs without the labels. Labels work better and can be
verbose in the arbitrary string 'id'; nativeID is provided primarily for
machine readability and guaranteed formatting so to me it just makes
more sense to "KISS" (keep it small and simple). :) Since the two types
of ids co-exist, human interpretation of the nativeID is not an issue.

This is good discussion though, we just need more of it - even if it's a
simple assent to the proposal (or the alternatives). :)

Thanks,
-Matt

Darren Kessner wrote:
> I think Fredrik has good points, and I like his idea of using short
> labels.
>
> An alternative to consider is 3-4 letter abbreviations (using Matt's
> examples):
>
> Thermo:
> "con0 scan1"
> "scan2"
>
> Waters:
> "fun1 proc0 scan1"
>
> WIFF:
> "sam0 per1 cyc1 exp2"
>
> Darren
>
> On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote:
>
>> Hi Matt,
>>
>> I agree that the Native ID is a very important feature of the format
>> and that it needs to be settled. Your solution is elegant, I can see
>> two disadvantages though:
>> 1) It is not straightforward to interpret the nativeID by visual
>> inspection, since you need to look in the CV to find out what order
>> the numbers are in.
>> 2) If the number in one axis is unknown or irrelevant for the setup,
>> it could be a problem to have it as required. One could imagine just
>> specifying an empty field instead of a number in that situation
>> though.
>>
>> An alternative is to have reserved characters in the native id:
>> S = scan
>> F = function
>> C = controller
>> P = process
>> Cy (or maybe Y) = Cycle
>> E = Experiment
>> Pe = Period
>> Other reserved letters can be added as needed.
>>
>> Then one can specify these as required for the instrumental setup.
>> Scan 1 would be "S1"
>> Function 1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", the latter if
>> comma separation is wanted.
>> If a certain order of the axes is wanted this can be imposed by regex.
>> A problem with this solution could be if an axis needs to contain
>> letters instead of numbers, but it is doable, at least with comma
>> separation.
>>
>> A combination of the CV approach and initiating letters could maybe
>> also be an alternative:
>>
>> [Term]
>> id: MS:x
>> name: Waters RAW spectrum identifier
>> def: "F:function number=xsd:positiveInteger (optional),P:process number=xsd:nonNegativeInteger (optional),S:scan number=xsd:positiveInteger"
>>
>> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1"
>>
>> It would be good to have some input on what is required to report
>> for the rest of the vendor instruments too, but I think the
>> nativeID format should be settled soon.
>>
>> Fredrik
>>
>> Matthew Chambers wrote:
>>
>>> It's been 4 months since we released the format and we still can't
>>> point implementors to documentation specifying what nativeIDs must
>>> look like. Can we please comment on my proposal or get other
>>> proposals to discuss? I am not averse to initially leaving out the
>>> terms that I couldn't come up with well-defined formats for (Bruker,
>>> PKL, ABI Oracle, Shimadzu).
>>>
>>> -Matt
>>>
>>> -------- Original Message --------
>>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>>> Date: Tue, 22 Jul 2008 21:28:34 -0500
>>> From: Matt Chambers <mat...@va...>
>>> Reply-To: Mass spectrometry standard development <psi...@li...>
>>> To: Mass spectrometry standard development <psi...@li...>
>>> References: <488...@va...> <5BE...@he...>
>>>
>>> Hi Eric,
>>>
>>> Of course, sorry I should have realized that the axis name concept
>>> would confuse matters. The axis names are just there so that a
>>> machine reading the format specification can associate each comma
>>> delimited section (what I'm calling an "axis") with a logical name.
>>>
>>> Thermo:
>>> 0,1 (controller 0, scan 1)
>>> 0,2
>>> 0,3
>>> 1,1 (controller 1, scan 1)
>>>
>>> Waters:
>>> 1,0,1 (function 1, process 0, scan 1)
>>> 1,0,2
>>> 1,0,3
>>> 2,0,1 (function 2, process 0, scan 1)
>>> 2,0,2
>>> 2,0,3
>>>
>>> WIFF:
>>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2)
>>> 0,1,1,3
>>> 0,1,2,2
>>> 0,1,2,3
>>> 0,1,2,4
>>> 0,1,3,2
>>> 0,1,3,3
>>> 0,1,3,2
>>> 0,1,4,2
>>> 1,1,1,2
>>> 1,1,1,3
>>>
>>> When a machine reads the WIFF definition, it will know that the
>>> fields mean (in order) "sample #", "period #", "cycle #",
>>> "experiment #". The detailed meaning of those names won't be covered
>>> by the format definition, but it's conceivable that we define those
>>> names in detail as separate CV terms. Remember the main idea for
>>> nativeID is to map a spectrum back to a source file in a way that is
>>> more intuitive than a simple index, so being able to use them to look
>>> up the spectrum via a native interface is important.
>>>
>>> I think we can safely require that the nativeIDs always use all the
>>> fields even if for an entire run all of a particular axis has the
>>> same value. For example, in Thermo data the controller number is
>>> almost always going to be the number corresponding with the MS
>>> controller (although the actual number is not guaranteed to be 0).
>>> For backwards compatibility with tools which expect Thermo ids to be
>>> scan numbers with an implicit assumption about the controller, it is
>>> very reasonable to require those tools to simply parse the id.
>>> Parsing a comma-delimited pair is far easier than all the other crap
>>> one must do to get proper mzML support. ;) In particular for you Eric
>>> and other TPP users, the RAMP adapter that pwiz uses will pass only
>>> the scan number (and make sure the spectrum is a mass spectrum).
>>>
>>> -Matt
>>>
>>> Eric Deutsch wrote:
>>>
>>>> Hi Matt, thanks, this looks well thought out, although I'm not
>>>> sure I fully understand the syntax you're proposing. Can you provide
>>>> one or two examples of each type?
>>>>
>>>> Thanks!
>>>> Eric
>>>>
>>>>> -----Original Message-----
>>>>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
>>>>> Sent: Tuesday, July 22, 2008 3:15 PM
>>>>> To: Mass spectrometry standard development
>>>>> Subject: [Psidev-ms-dev] Nailing down NativeID
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I think it's overdue that we get this part of mzML formally
>>>>> specified - at least for the vendors and generic formats. I am
>>>>> proposing a draft of nativeID formats, the place to put the formats
>>>>> in the specification documents, and to have mzML instance documents
>>>>> explicitly reference the format they are using. This explicit
>>>>> reference should be required for semantic validation, but I'd also
>>>>> recommend that mzML readers that don't find or ignore the nativeID
>>>>> format term specified simply treat the nativeID as a free string
>>>>> (rendering it pretty useless, but at least there would be a defined
>>>>> way to handle it). The terms would be placed in the fileContent
>>>>> element to define the format for all nativeIDs in the file.
>>>>>
>>>>> I propose that the nativeID formats become CV terms, and that the
>>>>> term definitions define the formats unambiguously in a
>>>>> machine-readable way that a semantic validator can use to validate
>>>>> the nativeIDs. I will list my format drafts in OBO format.
>>>>> Each specific native format definition is a comma-delimited list of
>>>>> key-value pairs, where the key is the axis name (e.g. "scan number")
>>>>> and the value specifies the format of the axis in one of two ways:
>>>>> 1) a Perl-style regular expression that can provide semantic/logical
>>>>> choices for strings (e.g. "controller type" can be either "MS" or
>>>>> "PDA" or "UV" etc.)
>>>>> 2) an XSD type that can specify unrestricted strings or a numeric
>>>>> type (possibly with semantic restrictions)
>>>>>
>>>>> I didn't actually need to use a regex for any of the formats below,
>>>>> but I can see their usefulness. For example, they would be needed if
>>>>> I'm wrong about Xcalibur and it makes more sense for Thermo spectra
>>>>> to use controller names instead of controller numbers.
>>>>>
>>>>> Obviously the syntax of the format definitions is flexible if
>>>>> people have better ideas (ideally one that could combine the power
>>>>> of regex and XSD; "infinite cosmic power, itty bitty living
>>>>> space!").
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: native spectrum identifier
>>>>> def: "References a spectrum in a native (non-mzML) spectrum source according to a strict format. The format is dependent on the type of the spectra source." [PSI:MS]
>>>>> is_a: MS:1000524 ! data file content
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: native chromatogram identifier
>>>>> def: "References a chromatogram in a native (non-mzML) chromatogram source according to a strict format. The format is dependent on the type of the chromatogram source." [PSI:MS]
>>>>> is_a: MS:1000524 ! data file content
>>>>> ! note: I don't have any instances of native chromatogram
>>>>> identifiers, but I can conceive of the possibilities!
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: Thermo RAW spectrum identifier
>>>>> def: "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note to Jim: apparently, Xcalibur can handle multiple controllers
>>>>> of the same type, so is a choice between strings still appropriate?
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: Waters RAW spectrum identifier
>>>>> def: "function number=xsd:positiveInteger,process number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: is process number ever non-zero?
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: WIFF spectrum identifier
>>>>> def: "sample number=xsd:nonNegativeInteger,period number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment number=xsd:positiveInteger" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: ABI Oracle database spectrum identifier
>>>>> def: "" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: need expertise here; alternatively, we could lump these
>>>>> spectra in with DTA/PKL nativeIDs (see below) when they are
>>>>> extracted to T2Ds
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: Bruker spectrum identifier
>>>>> def: "" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID
>>>>> spectrum is natively a single file, so that seems to make nativeID
>>>>> irrelevant and sourceFile[Ref] critical
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: Shimadzu spectrum identifier
>>>>> def: "" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: need expertise here
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: MGF spectrum identifier
>>>>> def: "index=xsd:nonNegativeInteger" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: TITLE attributes are optional, so the index into the file
>>>>> is the only reliable source (TITLE can be used for the string id if
>>>>> present)
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: mzData/mzXML/MS2 spectrum identifier
>>>>> def: "scan number=xsd:positiveInteger" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>>
>>>>> [Term]
>>>>> id: MS:x
>>>>> name: PKL/DTA spectrum identifier
>>>>> def: "" [PSI:MS]
>>>>> is_a: MS:x ! native spectrum identifier
>>>>> ! note: like Bruker, a PKL or DTA could be standalone so AFAIK the
>>>>> only way to reliably reference it is via sourceFileRef
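
For anyone weighing the label-free proposal, here is a minimal sketch of how a reader might use one of the draft format definitions to check and split a comma-delimited nativeID. It is only an illustration under stated assumptions: the function names (`parse_format`, `parse_native_id`) and the `XSD_CHECKS` table are hypothetical, only the two XSD integer types that appear in the drafts are handled, the integer check is simplified to plain ASCII digits, and the regex alternative mentioned in the proposal is not covered (a regex containing a comma would break the simple split used here).

```python
# Sketch only: names and structure are assumptions, not part of any mzML tool.

def _int_at_least(minimum):
    """Build a check for a plain decimal integer >= minimum (simplified XSD)."""
    def check(text):
        return text.isascii() and text.isdigit() and int(text) >= minimum
    return check

# Only the XSD types used in the draft definitions are supported here.
XSD_CHECKS = {
    "xsd:nonNegativeInteger": _int_at_least(0),
    "xsd:positiveInteger": _int_at_least(1),
}

def parse_format(definition):
    """Split 'axis name=type,axis name=type' into [(axis name, type), ...]."""
    axes = []
    for part in definition.split(","):
        name, _, xsd_type = part.partition("=")
        axes.append((name.strip(), xsd_type.strip()))
    return axes

def parse_native_id(native_id, definition):
    """Validate a nativeID against a definition; return {axis name: int}."""
    axes = parse_format(definition)
    fields = native_id.split(",")
    if len(fields) != len(axes):
        raise ValueError(f"expected {len(axes)} fields, got {len(fields)}")
    result = {}
    for (name, xsd_type), field in zip(axes, fields):
        check = XSD_CHECKS.get(xsd_type)
        if check is None:
            raise ValueError(f"unsupported type {xsd_type!r} for {name!r}")
        if not check(field):
            raise ValueError(f"{field!r} is not a valid {xsd_type} for {name!r}")
        result[name] = int(field)
    return result

# Definitions copied from the draft Waters and Thermo terms in this thread.
WATERS_DEF = ("function number=xsd:positiveInteger,"
              "process number=xsd:nonNegativeInteger,"
              "scan number=xsd:positiveInteger")
THERMO_DEF = "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger"

print(parse_native_id("1,0,1", WATERS_DEF))
print(parse_native_id("0,2", THERMO_DEF))
```

A tool that only cares about the Thermo scan number could take the `"scan number"` entry from the returned mapping and ignore the controller field, which is the kind of simple parsing suggested above for backwards compatibility.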