From: Mike C. <tu...@gm...> - 2008-01-24 16:14:13
For what it's worth, I'd be in favor of removing *all* redundant information from the format (with the possible exception of a checksum). This would include the index, derivable counts, and anything else that can be determined by inspection.

The general argument for doing this would be that it would eliminate a whole class of design decisions of the form "What do I do if thing A and thing B, which by definition are supposed to be consistent, are not?" It's easy to say "just reject the file", but in reality, we won't be able to do that. That leaves us with writing code to try to correct the inconsistencies, for all of the different ways that they occur across different producers and different versions thereof, and arguably that will be more complex than the code that, for example, does things like build indices in the first place.

Mike

On Jan 24, 2008 9:47 AM, Brian Pratt <bri...@in...> wrote:
> >> 6) Regarding the length attribute in offset, I am neutral on this. This makes it a little harder for the writers. I can see that it would be easier for random access readers. Darren says he's not interested in it.
> >> Anyone else out there want to lobby for it?
>
> Not me. I never felt the need for it in mzXML - it's derivable anyway as length[n]=offset[n+1]-offset[n], or EOF-offset[n] when n==nmax. Keep it simple.
>
> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> Sent: Wednesday, January 23, 2008 7:40 PM
> To: Mass spectrometry standard development
> Cc: Eric Deutsch
> Subject: Re: [Psidev-ms-dev] mzML indexing
>
> Hi Darren et al. for this discussion. A few points from me:
>
> 1) It was decided (although not set in stone) that we would like to have a unique id per spectrum per file for two reasons:
>
> a) At some point in the future if we have multiple runs per file (not supported in this first release) it will continue to be true that a spectrum id must be unique within a file.
> We decided that external references from analysisXML, for example, should be to a unique id rather than a scan number.
>
> b) We also left the door open to use LSIDs for the id. It can be any unique string, and thus if someone wants to use LSIDs for this, the door is open.
>
> 2) Also, according to the docs, the precursor scan references are by id rather than by scan number (although this appears to be incorrectly represented in tiny1.mzML: FIXME). I think the spectrumRef should be to an id and thus it needs to be in the index for things to work nicely. I see that Matt disagrees that spectrumRef should be to an id.
>
> 3) It is true that the current examples only have scanNumber in the index and not id. This should also be fixed before release.
>
> 4) The spec document indeed currently says that scanNumbers in the file must be in ascending order, but not necessarily sequential. The comment could perhaps be a little more clearly written.
>
> 5) I also do not see the need for the index attribute. I think it should be left out, but if there is still a clear need, we could add.
>
> 6) Regarding the length attribute in offset, I am neutral on this. This makes it a little harder for the writers. I can see that it would be easier for random access readers. Darren says he's not interested in it. Anyone else out there want to lobby for it?
>
> 7) Regarding going fully attribute: the intent was to preserve the format of the mzXML index as closely as possible to reduce coding work, but a better syntax could be entertained. I wouldn't object to:
>
>   <offset scanNumber="19" id="S19" byteOffset="3512"/>
>
> 8) Regarding enforcing some of these index rules, we should add them to the validator so that validator will do that.
>
> Comments on these items?
>
> Thanks,
> Eric
>
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
> > Sent: Wednesday, January 23, 2008 11:19 AM
> > To: Mass spectrometry standard development
> > Subject: Re: [Psidev-ms-dev] mzML indexing
> >
> > Right -- that's why I included the alternative, though I could have been less terse:
> >
> > "The alternative is to require that the <index> entries are written in the same order as the <spectrumList> entries."
> >
> > I don't know if there is a way to enforce this...
> >
> > Darren
> >
> > ________________________________
> >
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Brian Pratt
> > Sent: Wednesday, January 23, 2008 11:08 AM
> > To: 'Mass spectrometry standard development'
> > Subject: Re: [Psidev-ms-dev] mzML indexing
> >
> > Hi Darren,
> >
> > I wonder about this possibility:
> >
> > <index name="spectrum">
> >   <offset index="0" scanNumber="19" id="S19">3512</offset>
> >   <offset index="2" scanNumber="23" id="S23">16217</offset>
> >   <offset index="4" scanNumber="25" id="S25">17258</offset>
> >   ...
> > </index>
> >
> > If the response is "well, that's not legal, the index values must increase in increments of 1 starting from 0" then I don't see why it's needed in the first place - I'd expect that the index would just get snarfed up into an array and you'd access the nth element to get info on the nth scan appearing in the file. And if it is legal then I don't see what it's for...
> >
> > Brian
> >
> > ________________________________
> >
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
> > Sent: Wednesday, January 23, 2008 10:33 AM
> > To: Mass spectrometry standard development
> > Subject: [Psidev-ms-dev] mzML indexing
> >
> > Hi all,
> >
> > There are three ways to refer to a <spectrum> element -- by zero-based index into the <spectrumList>, by 'scanNumber', and by 'id'. However, the <index> currently only contains scanNumber. I would like to encode the zero-based index and the id as well in the <index> as follows:
> >
> > <index name="spectrum">
> >   <offset index="0" scanNumber="19" id="S19">3512</offset>
> >   <offset index="1" scanNumber="20" id="S20">16217</offset>
> >   ...
> > </index>
> >
> > Including the zero-based index is important to enable random access to the mzML file when you don't know what scan numbers are contained in the file. The alternative is to require that the <index> entries are written in the same order as the <spectrumList> entries.
> >
> > Including the 'id' in the <index> entries is necessary for efficiently dereferencing a "spectrumRef" (e.g. in <precursor> element). Without this, a dereference requires reading through the <spectrumList> to find the right 'id'. This info could be read once and cached, but this still defeats the purpose of indexing.
> >
> > Darren
> >
> > Darren Kessner
> > Scientific Programmer
> > Dar...@cs...
> > 310-423-9538
> >
> > Spielberg Family Center for Applied Proteomics
> > Cedars-Sinai Medical Center
> > http://www.sfcap.cshs.org/
> >
> > IMPORTANT WARNING: This message is intended for the use of the person or entity to which it is addressed and may contain information that is privileged and confidential, the disclosure of which is governed by applicable law.
> > If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this information is STRICTLY PROHIBITED.
> >
> > If you have received this message in error, please notify us immediately by calling (310) 423-6428 and destroy the related message. Thank You for your cooperation.
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
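
For anyone following the thread, two of the points above can be sketched in a few lines of Python: Brian's observation that a length attribute is derivable from consecutive offsets, and Darren's request to look a spectrum up by position, scanNumber, or id. This is only an illustration under stated assumptions, not code from any mzML reader. It assumes the attribute-only entry syntax Eric mentions in item 7, and that entries are written in file order. The names parse_index, derive_lengths, and eof_offset are hypothetical, and the end position of 20000 is a made-up value; the byte offsets are the ones used in the thread's examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical index fragment in the attribute-only style from item 7.
# The byte offsets are the example values quoted in the thread.
INDEX_XML = """
<index name="spectrum">
  <offset scanNumber="19" id="S19" byteOffset="3512"/>
  <offset scanNumber="23" id="S23" byteOffset="16217"/>
  <offset scanNumber="25" id="S25" byteOffset="17258"/>
</index>
"""


def parse_index(xml_text):
    """Return the index entries as dicts, in the order they appear."""
    root = ET.fromstring(xml_text)
    return [
        {
            "scanNumber": int(el.get("scanNumber")),
            "id": el.get("id"),
            "byteOffset": int(el.get("byteOffset")),
        }
        for el in root.findall("offset")
    ]


def derive_lengths(offsets, eof_offset):
    """length[n] = offset[n+1] - offset[n]; the last entry runs to eof_offset.

    Assumes the offsets are already in ascending (file) order.
    """
    ends = offsets[1:] + [eof_offset]
    return [end - start for start, end in zip(offsets, ends)]


entries = parse_index(INDEX_XML)
offsets = [e["byteOffset"] for e in entries]

# eof_offset is an assumed value for illustration only. In a real file it
# would be wherever the last spectrum ends.
lengths = derive_lengths(offsets, eof_offset=20000)

# Three ways to reach an entry: list position, scanNumber, and id.
by_position = entries
by_scan = {e["scanNumber"]: e for e in entries}
by_id = {e["id"]: e for e in entries}

for entry, length in zip(entries, lengths):
    print(entry["id"], entry["byteOffset"], length)
```

If entries are required to appear in the same order as the spectra, the list position already serves as the zero-based index, which is the alternative Darren describes. The id dictionary is the cached lookup that a spectrumRef dereference would use.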