From: Eric D. <ede...@sy...> - 2008-07-23 02:02:29
Hi Matt, thanks, this looks well thought out, although I'm not sure I fully understand the syntax you're proposing. Can you provide one or two examples of each type?

Thanks!
Eric

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-
> bo...@li...] On Behalf Of Matthew Chambers
> Sent: Tuesday, July 22, 2008 3:15 PM
> To: Mass spectrometry standard development
> Subject: [Psidev-ms-dev] Nailing down NativeID
>
> Hi all,
>
> I think it's overdue that we get this part of mzML formally specified -
> at least for the vendors and generic formats. I am proposing a draft of
> nativeID formats, the place to put the formats in the specification
> documents, and to have mzML instance documents explicitly reference the
> format they are using. This explicit reference should be required for
> semantic validation, but I'd also recommend that mzML readers that don't
> find or ignore the nativeID format term specified simply treat the
> nativeID as a free string (rendering it pretty useless, but at least
> there would be a defined way to handle it). The terms would be placed in
> the fileContent element to define the format for all nativeIDs in the
> file.
>
> I propose that the nativeID formats become CV terms, and that the term
> definitions define the formats unambiguously in a machine-readable way
> that a semantic validator can use to validate the nativeIDs. I will
> list my format drafts in OBO format. Each specific native format
> definition is a comma-delimited list of key-value pairs, where the key
> is the axis name (e.g. "scan number") and the value specifies the format
> of the axis in one of two ways:
> 1) a Perl-style regular expression that can provide semantic/logical
> choices for strings (e.g. "controller type" can be either "MS" or "PDA"
> or "UV" etc.)
> 2) an XSD type that can specify unrestricted strings or a numeric type
> (possibly with semantic restrictions)
>
> I didn't actually need to use a regex for any of the formats below, but
> I can see their usefulness. For example, they would be needed if I'm
> wrong about Xcalibur and it makes more sense for Thermo spectra to use
> controller names instead of controller numbers.
>
> Obviously the syntax of the format definitions is flexible if people
> have better ideas (ideally one that could combine the power of regex and
> XSD; "infinite cosmic power, itty bitty living space!").
>
> [Term]
> id: MS:x
> name: native spectrum identifier
> def: "References a spectrum in a native (non-mzML) spectrum source
> according to a strict format. The format is dependent on the type of the
> spectra source." [PSI:MS]
> is_a: MS:1000524 ! data file content
>
> [Term]
> id: MS:x
> name: native chromatogram identifier
> def: "References a chromatogram in a native (non-mzML) chromatogram
> source according to a strict format. The format is dependent on the type
> of the chromatogram source." [PSI:MS]
> is_a: MS:1000524 ! data file content
> ! note: I don't have any instances of native chromatogram identifiers,
> but I can conceive of the possibilities!
>
> [Term]
> id: MS:x
> name: Thermo RAW spectrum identifier
> def: "controller type=xsd:nonNegativeInteger,scan
> number=xsd:positiveInteger" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note to Jim: apparently, Xcalibur can handle multiple controllers of
> the same type, so is a choice between strings still appropriate?
>
> [Term]
> id: MS:x
> name: Waters RAW spectrum identifier
> def: "function number=xsd:positiveInteger,process
> number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: is process number ever non-zero?
>
> [Term]
> id: MS:x
> name: WIFF spectrum identifier
> def: "sample number=xsd:nonNegativeInteger,period
> number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment
> number=xsd:positiveInteger" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
>
> [Term]
> id: MS:x
> name: ABI Oracle database spectrum identifier
> def: "" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: need expertise here; alternatively, we could lump these spectra
> in with DTA/PKL nativeIDs (see below) when they are extracted to T2Ds
>
> [Term]
> id: MS:x
> name: Bruker spectrum identifier
> def: "" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID spectrum is
> natively a single file, so that seems to make nativeID irrelevant and
> sourceFile[Ref] critical
>
> [Term]
> id: MS:x
> name: Shimadzu spectrum identifier
> def: "" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: need expertise here
>
> [Term]
> id: MS:x
> name: MGF spectrum identifier
> def: "index=xsd:nonNegativeInteger" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: TITLE attributes are optional, so the index into the file is the
> only reliable source (TITLE can be used for the string id if present)
>
> [Term]
> id: MS:x
> name: mzData/mzXML/MS2 spectrum identifier
> def: "scan number=xsd:positiveInteger" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
>
> [Term]
> id: MS:x
> name: PKL/DTA spectrum identifier
> def: "" [PSI:MS]
> is_a: MS:x ! native spectrum identifier
> ! note: like Bruker, a PKL or DTA could be standalone so AFAIK the only
> way to reliably reference it is via sourceFileRef
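
A minimal sketch of how a semantic validator might consume one of the draft format definitions, offered only as an illustration of the "comma-delimited list of key-value pairs" idea. The thread does not say what a nativeID instance looks like, so the `axis=value,axis=value` instance syntax below is an assumption, as are the function names `parse_format` and `validate_native_id` and the simplified XSD checks. The `MS|PDA|UV` variant is hypothetical and borrows the controller type choices mentioned in the proposal; Python's `re` module stands in for Perl-style regular expressions.

```python
import re

# Simplified checks for the XSD types that appear in the draft terms.
# Assumption: plain ASCII digits only, no sign or surrounding whitespace.
XSD_CHECKS = {
    "xsd:nonNegativeInteger": lambda s: re.fullmatch(r"[0-9]+", s) is not None,
    "xsd:positiveInteger": lambda s: re.fullmatch(r"[0-9]+", s) is not None and int(s) > 0,
    "xsd:string": lambda s: True,
}


def parse_format(definition):
    """Split 'axis=spec,axis=spec' into an ordered list of (axis, spec) pairs."""
    pairs = []
    for item in definition.split(","):
        axis, _, spec = item.partition("=")
        pairs.append((axis.strip(), spec.strip()))
    return pairs


def validate_native_id(native_id, definition):
    """Check a nativeID against a format definition.

    Assumed instance syntax: the same axes in the same order,
    written as 'axis=value,axis=value'.
    """
    axes = parse_format(definition)
    values = parse_format(native_id)
    if [axis for axis, _ in axes] != [axis for axis, _ in values]:
        return False
    for (_, spec), (_, value) in zip(axes, values):
        check = XSD_CHECKS.get(spec)
        if check is not None:
            if not check(value):
                return False
        elif re.fullmatch(spec, value) is None:
            # Anything that is not a known XSD type is treated as a regex.
            return False
    return True


# Definition copied from the draft "Thermo RAW spectrum identifier" term.
THERMO = "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger"
# Hypothetical regex-based variant using controller names instead of numbers.
THERMO_NAMED = "controller type=MS|PDA|UV,scan number=xsd:positiveInteger"

print(validate_native_id("controller type=0,scan number=15", THERMO))
print(validate_native_id("controller type=0,scan number=0", THERMO))
print(validate_native_id("controller type=PDA,scan number=3", THERMO_NAMED))
```

Splitting on commas and `=` keeps the sketch short, but it would break for a regular expression that itself contains either character, so a real definition syntax would need an escaping rule or a different delimiter.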