From: Eric D. <ede...@sy...> - 2008-01-24 03:40:00
Hi Darren et al.,

Thanks for this discussion. A few points from me:

1) It was decided (although not set in stone) that we would like to have a unique id per spectrum per file, for two reasons:

   a) If at some point in the future we have multiple runs per file (not supported in this first release), it will continue to be true that a spectrum id must be unique within a file. We decided that external references, from analysisXML for example, should be to a unique id rather than a scan number.

   b) We also left the door open to using LSIDs for the id. It can be any unique string, so anyone who wants to use LSIDs for this can do so.

2) Also, according to the docs, the precursor scan references are by id rather than by scan number (although this appears to be incorrectly represented in tiny1.mzML: FIXME). I think the spectrumRef should be to an id, and thus the id needs to be in the index for things to work nicely. I see that Matt disagrees that spectrumRef should be to an id.

3) It is true that the current examples only have scanNumber in the index and not id. This should also be fixed before release.

4) The spec document indeed currently says that scanNumbers in the file must be in ascending order, but not necessarily sequential. The comment could perhaps be written a little more clearly.

5) I also do not see the need for the index attribute. I think it should be left out, but if there is still a clear need, we could add it.

6) Regarding the length attribute in offset, I am neutral on this. It makes things a little harder for the writers, though I can see that it would make things easier for random access readers. Darren says he's not interested in it. Anyone else out there want to lobby for it?

7) Regarding going fully attribute: the intent was to preserve the format of the mzXML index as closely as possible to reduce coding work, but a better syntax could be entertained.
I wouldn't object to:

   <offset scanNumber="19" id="S19" byteOffset="3512"/>

8) Regarding enforcing some of these index rules, we should add them to the validator so that it enforces them.

Comments on these items?

Thanks,
Eric

> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...]
> On Behalf Of Kessner, Darren E.
> Sent: Wednesday, January 23, 2008 11:19 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] mzML indexing
>
> Right -- that's why I included the alternative, though I could have been
> less terse:
>
> "The alternative is to require that the <index> entries are written in the
> same order as the <spectrumList> entries."
>
> I don't know if there is a way to enforce this...
>
> Darren
>
> ________________________________
>
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...]
> On Behalf Of Brian Pratt
> Sent: Wednesday, January 23, 2008 11:08 AM
> To: 'Mass spectrometry standard development'
> Subject: Re: [Psidev-ms-dev] mzML indexing
>
> Hi Darren,
>
> I wonder about this possibility:
>
> <index name="spectrum">
>   <offset index="0" scanNumber="19" id="S19">3512</offset>
>   <offset index="2" scanNumber="23" id="S23">16217</offset>
>   <offset index="4" scanNumber="25" id="S25">17258</offset>
>   ...
> </index>
>
> If the response is "well, that's not legal, the index values must increase
> in increments of 1 starting from 0" then I don't see why it's needed in
> the first place - I'd expect that the index would just get snarfed up into
> an array and you'd access the nth element to get info on the nth scan
> appearing in the file. And if it is legal then I don't see what it's for...
>
> Brian
>
> ________________________________
>
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...]
> On Behalf Of Kessner, Darren E.
> Sent: Wednesday, January 23, 2008 10:33 AM
> To: Mass spectrometry standard development
> Subject: [Psidev-ms-dev] mzML indexing
>
> Hi all,
>
> There are three ways to refer to a <spectrum> element -- by zero-based
> index into the <spectrumList>, by 'scanNumber', and by 'id'. However, the
> <index> currently only contains scanNumber. I would like to encode the
> zero-based index and the id as well in the <index> as follows:
>
> <index name="spectrum">
>   <offset index="0" scanNumber="19" id="S19">3512</offset>
>   <offset index="1" scanNumber="20" id="S20">16217</offset>
>   ...
> </index>
>
> Including the zero-based index is important to enable random access to the
> mzML file when you don't know what scan numbers are contained in the file.
> The alternative is to require that the <index> entries are written in the
> same order as the <spectrumList> entries.
>
> Including the 'id' in the <index> entries is necessary for efficiently
> dereferencing a "spectrumRef" (e.g. in <precursor> element). Without
> this, a dereference requires reading through the <spectrumList> to find
> the right 'id'. This info could be read once and cached, but this still
> defeats the purpose of indexing.
>
> Darren
>
> Darren Kessner
> Scientific Programmer
> Dar...@cs...
> 310-423-9538
>
> Spielberg Family Center for Applied Proteomics
> Cedars-Sinai Medical Center
> http://www.sfcap.cshs.org/
>
> IMPORTANT WARNING: This message is intended for the use of the person or
> entity to which it is addressed and may contain information that is
> privileged and confidential, the disclosure of which is governed by
> applicable law.
> If the reader of this message is not the intended
> recipient, or the employee or agent responsible for delivering it to the
> intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this information is STRICTLY PROHIBITED.
>
> If you have received this message in error, please notify us immediately
> by calling (310) 423-6428 and destroy the related message. Thank You for
> your cooperation.
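
For readers following the thread, here is a minimal sketch of how a reader might consume the attribute-only <offset> form floated in point 7. It is an illustration only, not part of the mzML spec or the validator. The byteOffset attribute name comes from the suggestion above, the second entry's values are borrowed from the quoted example, and load_index and its three lookups are hypothetical names. The positional lookup assumes the <index> entries are written in the same order as the <spectrumList> entries, which is the alternative Darren describes.

```python
import xml.etree.ElementTree as ET

# Hypothetical index fragment using the attribute-only syntax from point 7.
# The second entry reuses values from the quoted examples in the thread.
INDEX_XML = """
<index name="spectrum">
  <offset scanNumber="19" id="S19" byteOffset="3512"/>
  <offset scanNumber="20" id="S20" byteOffset="16217"/>
</index>
"""


def load_index(xml_text):
    """Return (by_position, by_id, by_scan) lookups of byte offsets."""
    by_position, by_id, by_scan = [], {}, {}
    last_scan = None
    for entry in ET.fromstring(xml_text).iter("offset"):
        scan = int(entry.get("scanNumber"))
        spectrum_id = entry.get("id")
        byte_offset = int(entry.get("byteOffset"))
        # Point 4: scan numbers must ascend but need not be sequential.
        if last_scan is not None and scan <= last_scan:
            raise ValueError("scanNumber values must be ascending")
        # Point 1: a spectrum id must be unique within the file.
        if spectrum_id in by_id:
            raise ValueError("duplicate spectrum id: " + spectrum_id)
        last_scan = scan
        # Assumes entries are written in <spectrumList> order, so the nth
        # entry describes the nth spectrum and no index attribute is needed.
        by_position.append(byte_offset)
        by_id[spectrum_id] = byte_offset
        by_scan[scan] = byte_offset
    return by_position, by_id, by_scan


by_position, by_id, by_scan = load_index(INDEX_XML)
print(by_id["S20"], by_scan[19], by_position[0])  # prints: 16217 3512 3512
```

With the id present in each entry, dereferencing a spectrumRef becomes a single dictionary lookup followed by a seek to the byte offset, rather than a scan through the <spectrumList>. The two checks are the kind of index rules point 8 suggests the validator could enforce.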