From: Darren K. <Dar...@cs...> - 2008-09-18 15:35:41
I think Fredrik has good points, and I like his idea of using short labels. An alternative to consider is 3-4 letter abbreviations (using Matt's examples):

Thermo: "con0 scan1" "scan2"
Waters: "fun1 proc0 scan1"
WIFF: "sam0 per1 cyc1 exp2"

Darren

On Sep 18, 2008, at 12:18 PM, Fredrik Levander wrote:

> Hi Matt,
>
> I agree that the Native ID is a very important feature of the format and
> that it needs to be settled. Your solution is elegant; I can see two
> disadvantages though:
> 1) It is not straightforward to interpret the nativeID by visual
> inspection, since you need to look in the CV to find out what order the
> numbers are in.
> 2) If the number in one axis is unknown or irrelevant for the setup, it
> could be a problem to have it as required. One could imagine just
> specifying an empty field instead of a number in that situation though.
>
> An alternative is to have reserved characters in the native id:
> S = scan
> F = function
> C = controller
> P = process
> Cy (or maybe Y) = Cycle
> E = Experiment
> Pe = Period
> Other reserved letters can be added as needed.
>
> Then one can specify these as required for the instrumental setup.
> Scan 1 would be "S1"
> Function 1, Scan 1 would be "F1S1" or "S1F1" or "S1,F1", the latter if
> comma separation is wanted.
> If a certain order of the axes is wanted this can be imposed by regex.
> A problem with this solution could be if an axis needs to contain
> letters instead of numbers, but it is doable, at least with comma
> separation.
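To make the reserved-letter idea concrete, here is a minimal Python sketch of how a reader might parse such labelled IDs. It is only an illustration, not part of any agreed format: the function name `parse_labelled_id`, the choice to allow an optional comma between fields, and the rejection of repeated axes are all assumptions. The prefix table is taken from the list in Fredrik's message.

```python
import re

# Reserved prefixes as listed in the proposal. The two-letter prefixes come
# first in the regex alternation so "Pe" and "Cy" are tried before "P" and "C".
AXES = {
    "Pe": "period",
    "Cy": "cycle",
    "S": "scan",
    "F": "function",
    "C": "controller",
    "P": "process",
    "E": "experiment",
}

TOKEN = r"(?:Pe|Cy|S|F|C|P|E)\d+"
# Assumption: fields may be written back to back ("F1S1") or comma separated.
WHOLE_ID = re.compile(rf"{TOKEN}(?:,?{TOKEN})*")
FIELD = re.compile(r"(Pe|Cy|S|F|C|P|E)(\d+)")


def parse_labelled_id(native_id):
    """Return {axis name: number} for IDs such as "F1S1" or "S1,F1"."""
    if not WHOLE_ID.fullmatch(native_id):
        raise ValueError(f"not a labelled nativeID: {native_id!r}")
    axes = {}
    for prefix, number in FIELD.findall(native_id):
        name = AXES[prefix]
        if name in axes:  # assumption: each axis may appear only once
            raise ValueError(f"axis {name!r} repeated in {native_id!r}")
        axes[name] = int(number)
    return axes
```

Because the result is keyed by axis name, "F1S1" and "S1,F1" parse to the same mapping, which is the order independence the proposal is after. A required order could still be imposed with a stricter regex.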
>
> A combination of the CV approach and initiating letters could maybe also
> be an alternative:
>
> [Term]
> id: MS:x
> name: Waters RAW spectrum identifier
> def: "F:function number=xsd:positiveInteger (optional),P:process number=xsd:nonNegativeInteger (optional),S:scan number=xsd:positiveInteger"
>
> Valid nativeIDs are: "F1,S1" and "F1,P1,S1", but not "F1"
>
> It would be good to have some input on what is required to report
> for the rest of the vendor instruments too, but I think the
> nativeID format should be settled soon.
>
> Fredrik
>
> Matthew Chambers wrote:
>> It's been 4 months since we released the format and we still can't point
>> implementors to documentation specifying what nativeIDs must look like.
>> Can we please comment on my proposal or get other proposals to discuss?
>> I am not averse to initially leaving out the terms that I couldn't come
>> up with well-defined formats for (Bruker, PKL, ABI Oracle, Shimadzu).
>>
>> -Matt
>>
>>
>> -------- Original Message --------
>> Subject: Re: [Psidev-ms-dev] Nailing down NativeID
>> Date: Tue, 22 Jul 2008 21:28:34 -0500
>> From: Matt Chambers <mat...@va...>
>> Reply-To: Mass spectrometry standard development <psi...@li...>
>> To: Mass spectrometry standard development <psi...@li...>
>> References: <488...@va...> <5BE...@he...>
>>
>> Hi Eric,
>>
>> Of course, sorry I should have realized that the axis name concept would
>> confuse matters. The axis names are just there so that a machine reading
>> the format specification can associate each comma delimited section
>> (what I'm calling an "axis") with a logical name.
>>
>> Thermo:
>> 0,1 (controller 0, scan 1)
>> 0,2
>> 0,3
>> 1,1 (controller 1, scan 1)
>>
>> Waters:
>> 1,0,1 (function 1, process 0, scan 1)
>> 1,0,2
>> 1,0,3
>> 2,0,1 (function 2, process 0, scan 1)
>> 2,0,2
>> 2,0,3
>>
>> WIFF:
>> 0,1,1,2 (sample 0, period 1, cycle 1, experiment 2)
>> 0,1,1,3
>> 0,1,2,2
>> 0,1,2,3
>> 0,1,2,4
>> 0,1,3,2
>> 0,1,3,3
>> 0,1,3,2
>> 0,1,4,2
>> 1,1,1,2
>> 1,1,1,3
>>
>> When a machine reads the WIFF definition, it will know that the fields
>> mean (in order) "sample #", "period #", "cycle #", "experiment #". The
>> detailed meaning of those names won't be covered by the format
>> definition, but it's conceivable that we define those names in detail as
>> separate CV terms. Remember the main idea for nativeID is to map a
>> spectrum back to a source file in a way that is more intuitive than a
>> simple index, so being able to use them to look up the spectrum via a
>> native interface is important.
>>
>> I think we can safely require that the nativeIDs always use all the
>> fields even if for an entire run all of a particular axis has the same
>> value. For example, in Thermo data the controller number is almost
>> always going to be the number corresponding with the MS controller
>> (although the actual number is not guaranteed to be 0). For backwards
>> compatibility with tools which expect Thermo ids to be scan numbers with
>> an implicit assumption about the controller, it is very reasonable to
>> require those tools to simply parse the id. Parsing a comma-delimited
>> pair is far easier than all the other crap one must do to get proper
>> mzML support. ;) In particular for you Eric and other TPP users, the
>> RAMP adapter that pwiz uses will pass only the scan number (and make
>> sure the spectrum is a mass spectrum).
>>
>> -Matt
>>
>>
>> Eric Deutsch wrote:
>>
>>> Hi Matt, thanks, this looks well thought out, although I'm not sure I
>>> fully understand the syntax you're proposing.
>>> Can you provide one or two examples of each type?
>>>
>>> Thanks!
>>> Eric
>>>
>>>> -----Original Message-----
>>>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
>>>> Sent: Tuesday, July 22, 2008 3:15 PM
>>>> To: Mass spectrometry standard development
>>>> Subject: [Psidev-ms-dev] Nailing down NativeID
>>>>
>>>> Hi all,
>>>>
>>>> I think it's overdue that we get this part of mzML formally specified -
>>>> at least for the vendors and generic formats. I am proposing a draft of
>>>> nativeID formats, the place to put the formats in the specification
>>>> documents, and to have mzML instance documents explicitly reference the
>>>> format they are using. This explicit reference should be required for
>>>> semantic validation, but I'd also recommend that mzML readers that don't
>>>> find or ignore the nativeID format term specified simply treat the
>>>> nativeID as a free string (rendering it pretty useless, but at least
>>>> there would be a defined way to handle it). The terms would be placed in
>>>> the fileContent element to define the format for all nativeIDs in the
>>>> file.
>>>>
>>>> I propose that the nativeID formats become CV terms, and that the term
>>>> definitions define the formats unambiguously in a machine-readable way
>>>> that a semantic validator can use to validate the nativeIDs. I will
>>>> list my format drafts in OBO format. Each specific native format
>>>> definition is a comma-delimited list of key-value pairs, where the key
>>>> is the axis name (e.g. "scan number") and the value specifies the format
>>>> of the axis in one of two ways:
>>>> 1) a Perl-style regular expression that can provide semantic/logical
>>>> choices for strings (e.g. "controller type" can be either "MS" or "PDA"
>>>> or "UV" etc.)
>>>> 2) an XSD type that can specify unrestricted strings or a numeric type
>>>> (possibly with semantic restrictions)
>>>>
>>>> I didn't actually need to use a regex for any of the formats below, but
>>>> I can see their usefulness. For example, they would be needed if I'm
>>>> wrong about Xcalibur and it makes more sense for Thermo spectra to use
>>>> controller names instead of controller numbers.
>>>>
>>>> Obviously the syntax of the format definitions is flexible if people
>>>> have better ideas (ideally one that could combine the power of regex and
>>>> XSD; "infinite cosmic power, itty bitty living space!").
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: native spectrum identifier
>>>> def: "References a spectrum in a native (non-mzML) spectrum source according to a strict format. The format is dependent on the type of the spectra source." [PSI:MS]
>>>> is_a: MS:1000524 ! data file content
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: native chromatogram identifier
>>>> def: "References a chromatogram in a native (non-mzML) chromatogram source according to a strict format. The format is dependent on the type of the chromatogram source." [PSI:MS]
>>>> is_a: MS:1000524 ! data file content
>>>> ! note: I don't have any instances of native chromatogram identifiers,
>>>> but I can conceive of the possibilities!
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: Thermo RAW spectrum identifier
>>>> def: "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note to Jim: apparently, Xcalibur can handle multiple controllers of
>>>> the same type, so is a choice between strings still appropriate?
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: Waters RAW spectrum identifier
>>>> def: "function number=xsd:positiveInteger,process number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: is process number ever non-zero?
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: WIFF spectrum identifier
>>>> def: "sample number=xsd:nonNegativeInteger,period number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment number=xsd:positiveInteger" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: ABI Oracle database spectrum identifier
>>>> def: "" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: need expertise here; alternatively, we could lump these spectra
>>>> in with DTA/PKL nativeIDs (see below) when they are extracted to T2Ds
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: Bruker spectrum identifier
>>>> def: "" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID spectrum is
>>>> natively a single file, so that seems to make nativeID irrelevant and
>>>> sourceFile[Ref] critical
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: Shimadzu spectrum identifier
>>>> def: "" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: need expertise here
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: MGF spectrum identifier
>>>> def: "index=xsd:nonNegativeInteger" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: TITLE attributes are optional, so the index into the file is the
>>>> only reliable source (TITLE can be used for the string id if present)
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: mzData/mzXML/MS2 spectrum identifier
>>>> def: "scan number=xsd:positiveInteger" [PSI:MS]
>>>> is_a: MS:x !
native spectrum identifier
>>>>
>>>> [Term]
>>>> id: MS:x
>>>> name: PKL/DTA spectrum identifier
>>>> def: "" [PSI:MS]
>>>> is_a: MS:x ! native spectrum identifier
>>>> ! note: like Bruker, a PKL or DTA could be standalone so AFAIK the only
>>>> way to reliably reference it is via sourceFileRef
>>
>> _______________________________________________
>> Psidev-ms-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
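For comparison, the comma-delimited scheme in Matt's quoted proposal could be checked by a reader roughly as follows. This is a hedged Python sketch, not an existing mzML or ProteoWizard API: `parse_format` and `parse_native_id` are hypothetical names, and only the two XSD integer types that appear in the draft definitions are handled (the Perl-style regex option is left out).

```python
# Smallest value allowed by each XSD type used in the draft definitions.
XSD_MINIMUM = {
    "xsd:positiveInteger": 1,
    "xsd:nonNegativeInteger": 0,
}


def parse_format(definition):
    """Split "axis=type,axis=type" into an ordered list of (axis, type)."""
    axes = []
    for field in definition.split(","):
        name, _, xsd_type = field.partition("=")
        if xsd_type not in XSD_MINIMUM:
            raise ValueError(f"unsupported axis format: {field!r}")
        axes.append((name.strip(), xsd_type))
    return axes


def parse_native_id(native_id, axes):
    """Validate a nativeID such as "1,0,1" and return {axis name: value}."""
    values = native_id.split(",")
    if len(values) != len(axes):
        raise ValueError(f"expected {len(axes)} fields in {native_id!r}")
    result = {}
    for (name, xsd_type), text in zip(axes, values):
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"{name} is not an integer: {text!r}")
        if int(text) < XSD_MINIMUM[xsd_type]:
            raise ValueError(f"{name} violates {xsd_type}: {text!r}")
        result[name] = int(text)
    return result


# Definition string copied from the draft Waters RAW term in the thread.
WATERS = ("function number=xsd:positiveInteger,"
          "process number=xsd:nonNegativeInteger,"
          "scan number=xsd:positiveInteger")

waters_axes = parse_format(WATERS)
waters_id = parse_native_id("1,0,1", waters_axes)
```

Here `waters_id` maps "function number" to 1, "process number" to 0 and "scan number" to 1. A tool that only wants the scan number can read that one key, which is the kind of simple parsing the thread expects from legacy readers.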