From: Eric D. <ede...@sy...> - 2008-01-31 19:28:51
Hi everyone, thank you for the discussion on this thread. I summarize what I
read the general consensus opinion to be (acknowledged not to be unanimous)
as follows. If you think I have not understood the consensus properly, please
let me know.

> From: Eric Deutsch
>
> Hi Darren et al. for this discussion. A few points from me:
>
> 1) It was decided (although not set in stone) that we would like to have a
> unique id per spectrum per file for two reasons:

Yes.

> a) At some point in the future if we have multiple runs per file (not
> supported in this first release) it will continue to be true that a
> spectrum id must be unique within a file. We decided that external
> references from analysisXML, for example, should be to a unique id rather
> than a scan number.
>
> b) We also left the door open to use LSIDs for the id. It can be any
> unique string, and thus if someone wants to use LSIDs for this, the door
> is open.
>
> 2) Also, according to the docs, the precursor scan references are by id
> rather than by scan number (although this appears to be incorrectly
> represented in tiny1.mzML: FIXME). I think the spectrumRef should be to an
> id and thus it needs to be in the index for things to work nicely. I see
> that Matt disagrees that spectrumRef should be to an id.

Yes, use id instead of scanNumber

> 3) It is true that the current examples only have scanNumber in the index
> and not id. This should also be fixed before release.

Yes, fix.

> 4) The spec document indeed currently says that scanNumbers in the file
> must be in ascending order, but not necessarily sequential. The comment
> could perhaps be a little more clearly written.

Confirmed. Improve documentation.

> 5) I also do not see the need for the index attribute. I think it should
> be left out, but if there is still a clear need, we could add.

No index attribute.

> 6) Regarding the length attribute in offset, I am neutral on this. This
> makes it a little harder for the writers.
> I can see that it would be easier for random access readers. Darren says
> he's not interested in it. Anyone else out there want to lobby for it?

No length attribute. This can be easily inferred from the data. There will be
no other polluting elements intermingled with <spectrum>.

> 7) Regarding going fully attribute: the intent was to preserve the format
> of the mzXML index as closely as possible to reduce coding work, but a
> better syntax could be entertained. I wouldn't object to:
>
> <offset scanNumber="19" id="S19" byteOffset="3512"/>

No obvious support for this, so retain current syntax:

<offset scanNumber="19" id="S19">3512</offset>

> 8) Regarding enforcing some of these index rules, we should add them to
> the validator so that validator will do that.

Yes.

> Comments on these items?
>
> Thanks,
> Eric
>
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
> > Sent: Wednesday, January 23, 2008 11:19 AM
> > To: Mass spectrometry standard development
> > Subject: Re: [Psidev-ms-dev] mzML indexing
> >
> > Right -- that's why I included the alternative, though I could have been
> > less terse:
> >
> > "The alternative is to require that the <index> entries are written in the
> > same order as the <spectrumList> entries."
> >
> > I don't know if there is a way to enforce this...
> >
> > Darren
> >
> > ________________________________
> >
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...]
> > On Behalf Of Brian Pratt
> > Sent: Wednesday, January 23, 2008 11:08 AM
> > To: 'Mass spectrometry standard development'
> > Subject: Re: [Psidev-ms-dev] mzML indexing
> >
> > Hi Darren,
> >
> > I wonder about this possibility:
> >
> > <index name="spectrum">
> > <offset index="0" scanNumber="19" id="S19">3512</offset>
> > <offset index="2" scanNumber="23" id="S23">16217</offset>
> > <offset index="4" scanNumber="25" id="S25">17258</offset>
> > ...
> > </index>
> >
> > If the response is "well, that's not legal, the index values must increase
> > in increments of 1 starting from 0" then I don't see why it's needed in
> > the first place - I'd expect that the index would just get snarfed up into
> > an array and you'd access the nth element to get info on the nth scan
> > appearing in the file. And if it is legal then I don't see what it's for...
> >
> > Brian
> >
> > ________________________________
> >
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
> > Sent: Wednesday, January 23, 2008 10:33 AM
> > To: Mass spectrometry standard development
> > Subject: [Psidev-ms-dev] mzML indexing
> >
> > Hi all,
> >
> > There are three ways to refer to a <spectrum> element -- by zero-based
> > index into the <spectrumList>, by 'scanNumber', and by 'id'. However, the
> > <index> currently only contains scanNumber. I would like to encode the
> > zero-based index and the id as well in the <index> as follows:
> >
> > <index name="spectrum">
> > <offset index="0" scanNumber="19" id="S19">3512</offset>
> > <offset index="1" scanNumber="20" id="S20">16217</offset>
> > ...
> > </index>
> >
> > Including the zero-based index is important to enable random access to the
> > mzML file when you don't know what scan numbers are contained in the file.
> > The alternative is to require that the <index> entries are written in the
> > same order as the <spectrumList> entries.
> >
> > Including the 'id' in the <index> entries is necessary for efficiently
> > dereferencing a "spectrumRef" (e.g. in <precursor> element). Without
> > this, a dereference requires reading through the <spectrumList> to find
> > the right 'id'. This info could be read once and cached, but this still
> > defeats the purpose of indexing.
> >
> > Darren
> >
> > Darren Kessner
> > Scientific Programmer
> > Dar...@cs...
> > 310-423-9538
> >
> > Spielberg Family Center for Applied Proteomics
> > Cedars-Sinai Medical Center
> > http://www.sfcap.cshs.org/
> >
> > IMPORTANT WARNING: This message is intended for the use of the person or
> > entity to which it is addressed and may contain information that is
> > privileged and confidential, the disclosure of which is governed by
> > applicable law. If the reader of this message is not the intended
> > recipient, or the employee or agent responsible for delivering it to the
> > intended recipient, you are hereby notified that any dissemination,
> > distribution or copying of this information is STRICTLY PROHIBITED.
> >
> > If you have received this message in error, please notify us immediately
> > by calling (310) 423-6428 and destroy the related message. Thank You for
> > your cooperation.
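
For illustration only, the index rules summarized at the top of this message (a unique id per spectrum, scanNumbers ascending but not necessarily sequential, and no length attribute because the length can be inferred from successive offsets) might be sketched in Python roughly as follows. This is a sketch under assumptions, not part of the mzML specification or of the validator: the names parse_index, check_index and spectrum_lengths and the end_of_list_offset parameter are hypothetical, the sample values are the ones used in the thread, and parsing relies on the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Index fragment in the retained syntax; values are taken from the thread.
INDEX_XML = """
<index name="spectrum">
  <offset scanNumber="19" id="S19">3512</offset>
  <offset scanNumber="23" id="S23">16217</offset>
  <offset scanNumber="25" id="S25">17258</offset>
</index>
"""


def parse_index(xml_text):
    """Return (id, scanNumber, byte offset) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("id"), int(el.get("scanNumber")), int(el.text))
        for el in root.findall("offset")
    ]


def check_index(entries):
    """Hypothetical validator checks: unique ids, strictly ascending scanNumbers."""
    errors = []
    ids = [entry[0] for entry in entries]
    if len(set(ids)) != len(ids):
        errors.append("duplicate spectrum id")
    scans = [entry[1] for entry in entries]
    # Gaps are allowed (19, 23, 25); only a repeat or a decrease is an error.
    if any(later <= earlier for earlier, later in zip(scans, scans[1:])):
        errors.append("scanNumbers not in ascending order")
    return errors


def spectrum_lengths(entries, end_of_list_offset):
    """Infer each spectrum's byte length from the offset of the next entry.

    Assumes entries are in file order and that only <spectrum> elements
    (plus whitespace) sit between consecutive offsets. end_of_list_offset
    is a hypothetical value marking where the last spectrum ends.
    """
    offsets = [entry[2] for entry in entries] + [end_of_list_offset]
    return {
        entry[0]: nxt - cur
        for entry, cur, nxt in zip(entries, offsets, offsets[1:])
    }


entries = parse_index(INDEX_XML)
# Map from spectrum id to byte offset, e.g. for resolving a spectrumRef.
by_id = {spec_id: offset for spec_id, _scan, offset in entries}
```

With by_id in hand, a reader could seek directly to by_id["S23"] to dereference a spectrumRef instead of reading through the <spectrumList>, which is the use case Darren describes.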