From: Brian P. <bri...@in...> - 2008-01-24 17:05:31
Assuming you're doing SAX-style (streaming) parsing, grabbing a little extra text after the stuff you want won't offend the parser as long as you stop parsing at the tag that balances the one opening the fragment, which is what you'd do anyway. If you try to use a DOM-style parser on a fragment, even an exactly balanced one, I suspect there would be trouble since it lacks the anticipated root structure. But nobody uses DOM parsers with mzXML/mzData/mzML anyway; streaming just makes more sense with files this size. So, no worries! I think. Certainly it hasn't been a problem in the mzXML world.

Brian

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Thursday, January 24, 2008 8:04 AM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] mzML indexing

Brian, you're assuming that no junk (comments, user params, whatever) is allowed between the spectrum elements. Otherwise you'd get a fragment of XML that isn't very parser-friendly (it would be multi-rooted). Is the no-junk assumption safe?

As for the spectrumRef, unless we plan to support referencing a spectrum outside the current run as a precursor, the scan number makes more sense than the id. If we DO plan to support that, I would support referencing the id instead, but it would definitely need to go into the index at that point.

-Matt

Brian Pratt wrote:
>>> 6) Regarding the length attribute in offset, I am neutral on this.
>>> This makes it a little harder for the writers. I can see that it
>>> would be easier for random access readers. Darren says he's not
>>> interested in it.
>>> Anyone else out there want to lobby for it?
>>>
>
> Not me. I never felt the need for it in mzXML - it's derivable anyway as
> length[n] = offset[n+1] - offset[n], or EOF - offset[n] when n == nmax.
> Keep it simple.
>
> -----Original Message-----
> From: psi...@li...
> [mailto:psi...@li...]
> On Behalf Of Eric Deutsch
> Sent: Wednesday, January 23, 2008 7:40 PM
> To: Mass spectrometry standard development
> Cc: Eric Deutsch
> Subject: Re: [Psidev-ms-dev] mzML indexing
>
>
> Hi Darren et al. A few points from me on this discussion:
>
> 1) It was decided (although not set in stone) that we would like to have
> a unique id per spectrum per file for two reasons:
>
> a) At some point in the future, if we have multiple runs per file (not
> supported in this first release), it will continue to be true that a
> spectrum id must be unique within a file. We decided that external
> references from analysisXML, for example, should be to a unique id
> rather than a scan number.
>
> b) We also left the door open to use LSIDs for the id. It can be any
> unique string, so anyone who wants to use LSIDs for this is free to.
>
> 2) Also, according to the docs, the precursor scan references are by id
> rather than by scan number (although this appears to be incorrectly
> represented in tiny1.mzML: FIXME). I think the spectrumRef should be to
> an id, and thus the id needs to be in the index for things to work
> nicely. I see that Matt disagrees that spectrumRef should be to an id.
>
> 3) It is true that the current examples only have scanNumber in the
> index and not id. This should also be fixed before release.
>
> 4) The spec document indeed currently says that scanNumbers in the file
> must be in ascending order, but not necessarily sequential. The comment
> could perhaps be a little more clearly written.
>
> 5) I also do not see the need for the index attribute. I think it should
> be left out, but if there is still a clear need, we could add it.
>
> 6) Regarding the length attribute in offset, I am neutral on this. This
> makes it a little harder for the writers. I can see that it would be
> easier for random access readers. Darren says he's not interested in it.
> Anyone else out there want to lobby for it?
>
> 7) Regarding going fully attribute-based: the intent was to preserve the
> format of the mzXML index as closely as possible to reduce coding work,
> but a better syntax could be entertained. I wouldn't object to:
>
> <offset scanNumber="19" id="S19" byteOffset="3512"/>
>
> 8) Regarding enforcing some of these index rules, we should add them to
> the validator so that it checks them.
>
> Comments on these items?
>
> Thanks,
> Eric
>
>
>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
>> Sent: Wednesday, January 23, 2008 11:19 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] mzML indexing
>>
>> Right -- that's why I included the alternative, though I could have
>> been less terse:
>>
>> "The alternative is to require that the <index> entries are written in
>> the same order as the <spectrumList> entries."
>>
>> I don't know if there is a way to enforce this...
>>
>> Darren
>>
>> ________________________________
>>
>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Brian Pratt
>> Sent: Wednesday, January 23, 2008 11:08 AM
>> To: 'Mass spectrometry standard development'
>> Subject: Re: [Psidev-ms-dev] mzML indexing
>>
>> Hi Darren,
>>
>> I wonder about this possibility:
>>
>> <index name="spectrum">
>>   <offset index="0" scanNumber="19" id="S19">3512</offset>
>>   <offset index="2" scanNumber="23" id="S23">16217</offset>
>>   <offset index="4" scanNumber="25" id="S25">17258</offset>
>>   ...
>> </index>
>>
>> If the response is "well, that's not legal, the index values must
>> increase in increments of 1 starting from 0", then I don't see why it's
>> needed in the first place - I'd expect that the index would just get
>> snarfed up into an array and you'd access the nth element to get info
>> on the nth scan appearing in the file.
>> And if it is legal, then I don't see what it's for...
>>
>> Brian
>>
>> ________________________________
>>
>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
>> Sent: Wednesday, January 23, 2008 10:33 AM
>> To: Mass spectrometry standard development
>> Subject: [Psidev-ms-dev] mzML indexing
>>
>> Hi all,
>>
>> There are three ways to refer to a <spectrum> element -- by zero-based
>> index into the <spectrumList>, by 'scanNumber', and by 'id'. However,
>> the <index> currently only contains scanNumber. I would like to encode
>> the zero-based index and the id as well in the <index> as follows:
>>
>> <index name="spectrum">
>>   <offset index="0" scanNumber="19" id="S19">3512</offset>
>>   <offset index="1" scanNumber="20" id="S20">16217</offset>
>>   ...
>> </index>
>>
>> Including the zero-based index is important to enable random access to
>> the mzML file when you don't know what scan numbers are contained in the
>> file. The alternative is to require that the <index> entries are written
>> in the same order as the <spectrumList> entries.
>>
>> Including the 'id' in the <index> entries is necessary for efficiently
>> dereferencing a "spectrumRef" (e.g., in a <precursor> element). Without
>> this, a dereference requires reading through the <spectrumList> to find
>> the right 'id'. This info could be read once and cached, but that still
>> defeats the purpose of indexing.
>>
>> Darren
>>
>> Darren Kessner
>> Scientific Programmer
>> Dar...@cs...
>> 310-423-9538
>>
>> Spielberg Family Center for Applied Proteomics
>> Cedars-Sinai Medical Center
>> http://www.sfcap.cshs.org/
>>
>> IMPORTANT WARNING: This message is intended for the use of the person
>> or entity to which it is addressed and may contain information that is
>> privileged and confidential, the disclosure of which is governed by
>> applicable law. If the reader of this message is not the intended
>> recipient, or the employee or agent responsible for delivering it to
>> the intended recipient, you are hereby notified that any dissemination,
>> distribution or copying of this information is STRICTLY PROHIBITED.
>>
>> If you have received this message in error, please notify us immediately
>> by calling (310) 423-6428 and destroy the related message. Thank You for
>> your cooperation.

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
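For readers who want to try the random-access pattern discussed in this thread, here is a minimal Python sketch. It makes three assumptions. The index uses the element-content form Darren proposed, without the index attribute that Eric and Brian questioned. The standard-library xml.etree.ElementTree.XMLPullParser stands in for "a streaming parser". The helper names (parse_index, derive_lengths, read_spectrum, build_demo) and the toy document are hypothetical and are not part of mzML or any existing tool. The sketch covers three points from the discussion: dereferencing a spectrum by 'id' through the index, deriving lengths from consecutive offsets as Brian describes, and reading one spectrum from its byte offset while stopping at the tag that balances the opening one, so whatever text follows is never parsed.

```python
import xml.etree.ElementTree as ET


def parse_index(index_xml):
    """Return (id, scanNumber, byteOffset) tuples in the order they are listed."""
    root = ET.fromstring(index_xml)
    return [(o.get("id"), int(o.get("scanNumber")), int(o.text))
            for o in root.iter("offset")]


def derive_lengths(offsets, eof):
    """length[n] = offset[n+1] - offset[n]; for the last entry, EOF - offset[n]."""
    ends = offsets[1:] + [eof]
    return [end - start for start, end in zip(offsets, ends)]


def read_spectrum(data, offset):
    """Stream-parse from `offset`, stopping at the end tag that balances the first start tag.

    Whatever follows (the next <spectrum>, closing tags, the index) is left
    unread: XMLPullParser queues the parse error for that trailing text after
    the events that precede it, and we return before reaching the error.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(data[offset:])
    depth = 0
    for event, elem in parser.read_events():
        depth += 1 if event == "start" else -1
        if depth == 0:
            return elem
    raise ValueError("no balanced element found at offset %d" % offset)


def build_demo():
    """Build a toy mzML-like document and a matching index (hypothetical data)."""
    body = b"<mzML><spectrumList>\n"
    entries = []
    for sid, scan in [("S19", 19), ("S20", 20), ("S23", 23)]:
        entries.append(f'<offset scanNumber="{scan}" id="{sid}">{len(body)}</offset>')
        body += f'<spectrum id="{sid}" scanNumber="{scan}"><peaks/></spectrum>\n'.encode()
    body += b"</spectrumList></mzML>"
    return body, '<index name="spectrum">' + "".join(entries) + "</index>"


data, index_xml = build_demo()
entries = parse_index(index_xml)
by_id = {sid: off for sid, _, off in entries}  # dereference a spectrumRef by id
lengths = derive_lengths([off for _, _, off in entries], len(data))
print(read_spectrum(data, by_id["S20"]).get("scanNumber"))  # prints 20
```

In this toy document, the derived length of the last entry runs past its </spectrum> into the closing tags. That overshoot is harmless for a reader that stops at the balancing tag. Treating entries[n] as the nth spectrum only works if the index entries are written in <spectrumList> order, which is the alternative Darren mentions.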