[Psidev-ms-dev] Nailing down NativeID

Hi all,

I think it's overdue that we get this part of mzML formally specified, 
at least for the vendor and generic formats. I am proposing three 
things: a draft of the nativeID formats, a place to put the formats in 
the specification documents, and that mzML instance documents explicitly 
reference the format they are using. This explicit reference should be 
required for semantic validation, but I'd also recommend that mzML 
readers that don't find the nativeID format term, or that ignore it, 
simply treat the nativeID as a free string (rendering it pretty useless, 
but at least there would be a defined way to handle it). The terms would 
be placed in the fileContent element to define the format for all 
nativeIDs in the file.
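
To make the reader-side fallback concrete, here is a minimal sketch of how 
a reader might look for a nativeID format term in fileContent. The 
accessions "MS:x1" and "MS:x2" are hypothetical placeholders (the draft 
terms have no accessions yet), the names KNOWN_FORMATS and 
native_id_format are made up for illustration, and the XML snippet is 
simplified (no namespace).

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder accessions; the draft terms are all still "MS:x".
KNOWN_FORMATS = {
    "MS:x1": "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger",
    "MS:x2": "scan number=xsd:positiveInteger",
}


def native_id_format(file_content_xml, known_formats=KNOWN_FORMATS):
    """Return the declared nativeID format definition, or None.

    None means no known format term was found, so the caller should
    treat every nativeID in the file as a free string.
    """
    file_content = ET.fromstring(file_content_xml)
    for cv_param in file_content.iter("cvParam"):
        definition = known_formats.get(cv_param.get("accession"))
        if definition is not None:
            return definition
    return None


declared = (
    '<fileContent>'
    '<cvParam cvRef="MS" accession="MS:x1" name="Thermo RAW spectrum identifier"/>'
    '</fileContent>'
)
undeclared = '<fileContent><cvParam cvRef="MS" accession="MS:0000000" name="other"/></fileContent>'

print(native_id_format(declared))    # the Thermo draft definition
print(native_id_format(undeclared))  # None: fall back to free-string handling
```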

I propose that the nativeID formats become CV terms, and that the term 
definitions define the formats unambiguously in a machine-readable way 
that a semantic validator can use to validate the nativeIDs.  I will 
list my format drafts in OBO format. Each specific native format 
definition is a comma-delimited list of key-value pairs, where the key 
is the axis name (e.g. "scan number") and the value specifies the format 
of the axis in one of two ways:
1) a Perl-style regular expression that can provide semantic/logical 
choices for strings (e.g. "controller type" can be either "MS" or "PDA" 
or "UV" etc.)
2) an XSD type that can specify unrestricted strings or a numeric type 
(possibly with semantic restrictions)

I didn't actually need to use a regex for any of the formats below, but 
I can see how one would be useful. For example, a regex would be needed 
if I'm wrong about Xcalibur and it makes more sense for Thermo spectra 
to use controller names instead of controller numbers.

Obviously the syntax of the format definitions is flexible if people 
have better ideas (ideally one that could combine the power of regex and 
XSD; "infinite cosmic power, itty bitty living space!").
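
As a rough illustration of how a semantic validator might consume such a 
definition, here is a minimal sketch. It rests on several assumptions that 
are not part of the proposal: that a nativeID instance mirrors the 
definition syntax (e.g. "controller type=0,scan number=1"), that the XSD 
types can be approximated by the simplified regexes in XSD_PATTERNS, and 
that regex values contain no "," or "=". Python's re syntax stands in for 
Perl-style regexes, and the function names are made up for illustration.

```python
import re

# Simplified stand-ins for the XSD lexical rules (assumption, not the full XSD spec).
XSD_PATTERNS = {
    "xsd:nonNegativeInteger": r"\d+",
    "xsd:positiveInteger": r"0*[1-9]\d*",
    "xsd:string": r".*",
}


def parse_format(definition):
    """Split 'axis=spec,axis=spec' into an ordered list of (axis, compiled regex)."""
    axes = []
    for pair in definition.split(","):
        axis, spec = pair.split("=", 1)
        spec = spec.strip()
        # A spec that is not a known XSD type is treated as a regex.
        axes.append((axis.strip(), re.compile(XSD_PATTERNS.get(spec, spec))))
    return axes


def validate_native_id(native_id, axes):
    """Check that native_id lists every axis, in order, with a matching value."""
    parts = native_id.split(",")
    if len(parts) != len(axes):
        return False
    for part, (axis, pattern) in zip(parts, axes):
        key, sep, value = part.partition("=")
        if not sep or key.strip() != axis:
            return False
        if not pattern.fullmatch(value.strip()):
            return False
    return True


thermo = parse_format(
    "controller type=xsd:nonNegativeInteger,scan number=xsd:positiveInteger"
)
print(validate_native_id("controller type=0,scan number=1", thermo))  # True
print(validate_native_id("controller type=0,scan number=0", thermo))  # False

# Regex variant: controller names instead of controller numbers.
named = parse_format("controller type=MS|PDA|UV,scan number=xsd:positiveInteger")
print(validate_native_id("controller type=PDA,scan number=12", named))  # True
```

Mapping both kinds of value onto a compiled regex keeps the validator to a 
single code path, at the cost of only approximating the XSD types.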

[Term]
id: MS:x
name: native spectrum identifier
def: "References a spectrum in a native (non-mzML) spectrum source 
according to a strict format. The format is dependent on the type of the 
spectrum source." [PSI:MS]
is_a: MS:1000524 ! data file content

[Term]
id: MS:x
name: native chromatogram identifier
def: "References a chromatogram in a native (non-mzML) chromatogram 
source according to a strict format. The format is dependent on the type 
of the chromatogram source." [PSI:MS]
is_a: MS:1000524 ! data file content
! note: I don't have any instances of native chromatogram identifiers, 
but I can conceive of the possibilities!

[Term]
id: MS:x
name: Thermo RAW spectrum identifier
def: "controller type=xsd:nonNegativeInteger,scan 
number=xsd:positiveInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note to Jim: apparently, Xcalibur can handle multiple controllers of 
the same type, so is a choice between strings still appropriate?

[Term]
id: MS:x
name: Waters RAW spectrum identifier
def: "function number=xsd:positiveInteger,process 
number=xsd:nonNegativeInteger,scan number=xsd:positiveInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: is process number ever non-zero?

[Term]
id: MS:x
name: WIFF spectrum identifier
def: "sample number=xsd:nonNegativeInteger,period 
number=xsd:positiveInteger,cycle number=xsd:positiveInteger,experiment 
number=xsd:positiveInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier

[Term]
id: MS:x
name: ABI Oracle database spectrum identifier
def: "" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: need expertise here; alternatively, we could lump these spectra 
in with DTA/PKL nativeIDs (see below) when they are extracted to T2Ds

[Term]
id: MS:x
name: Bruker spectrum identifier
def: "" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: need expertise here. AFAIK, each Bruker YEP/BAF/FID spectrum is 
natively a single file, so that seems to make nativeID irrelevant and 
sourceFile[Ref] critical

[Term]
id: MS:x
name: Shimadzu spectrum identifier
def: "" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: need expertise here

[Term]
id: MS:x
name: MGF spectrum identifier
def: "index=xsd:nonNegativeInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: TITLE attributes are optional, so the index into the file is the 
only reliable source (TITLE can be used for the string id if present)
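
A minimal sketch of deriving the MGF index, with the optional TITLE carried 
along as the string id when it is present. It assumes the index is 
zero-based (suggested by xsd:nonNegativeInteger, but not stated above) and 
counts BEGIN IONS / END IONS blocks in file order; the function name is 
made up for illustration.

```python
def mgf_spectrum_ids(lines):
    """Yield (index, title) per BEGIN IONS ... END IONS block.

    index is the zero-based position of the block in the file (assumption);
    title is the TITLE= value, or None when the block has no TITLE.
    """
    index = -1
    title = None
    in_block = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            index += 1
            title = None
            in_block = True
        elif in_block and line.startswith("TITLE="):
            title = line[len("TITLE="):]
        elif in_block and line == "END IONS":
            yield index, title
            in_block = False


sample = """BEGIN IONS
TITLE=first
PEPMASS=500.1
100.0 1.0
END IONS
BEGIN IONS
PEPMASS=600.2
200.0 2.0
END IONS
""".splitlines()

for index, title in mgf_spectrum_ids(sample):
    print(index, title)
```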

[Term]
id: MS:x
name: mzData/mzXML/MS2 spectrum identifier
def: "scan number=xsd:positiveInteger" [PSI:MS]
is_a: MS:x ! native spectrum identifier

[Term]
id: MS:x
name: PKL/DTA spectrum identifier
def: "" [PSI:MS]
is_a: MS:x ! native spectrum identifier
! note: like Bruker, a PKL or DTA could be standalone so AFAIK the only 
way to reliably reference it is via sourceFileRef