From: Randy J. <rkj...@in...> - 2008-02-20 02:21:29
|
Based on today's discussion, I think the goal people had for scanNumber is better achieved using the acqNumber element. Actually, what people seemed to want was a place to put the Thermo-specific 'scan number' which allows you to go back to a raw file and see the scan in the vendor file. There is no reason why you couldn't put the scan number from Thermo as an 'index' (the acqNumber element would still allow a more accurate representation of the source of the spectrum, since it can handle the summation or averaging of spectra), and having a 'positiveInteger' for this makes great sense.

The value of using a scan number as an index comes from the problem of trying to order spectra from non-LC experiments where acquisition time is meaningless. Thermo's 'scan number' works great for this, and we need something like this for the other vendor formats too.

We should have more discussion on this, but it will probably help to have some examples - which should be available from the group soon.

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
Sent: Tuesday, February 19, 2008 3:54 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] Unique scan numbers

There is no requirement that an ascending lexicographical sort would produce the same order that a sort by ascending scanNumber would (that would be far too restrictive), but there is a requirement that the spectrum elements in the file be stored in ascending order by scanNumber.

We had quite a bit of discussion in the teleconference today about removing scanNumber as a primary key and replacing it with an index attribute; such an attribute would probably have different semantics (0-based and contiguous). I think Eric is probably preparing to post a good summary of the discussion.

My proposal for positiveInteger for scanNumber was accepted but then rendered moot by the proposal to get rid of the scanNumber attribute.
;)

"positiveInteger" has no schematic upper limit, so perhaps if we switch to an index attribute we should make it "unsignedLong" (schematically defined as a 64-bit unsigned integer).

-Matt

Coleman, Michael wrote:
> Is there a requirement that scanNumber and id be "co-ordered" (if I sort a file's spectra by one of these keys, the other key will then necessarily also be sorted)?
>
> Is there a requirement that the spectra in all mzML files be ordered by one or both of these keys?
>
> If scanNumber is being used for ordering in one of these ways, I agree that lexicographic ordering should be specified. If not, I'm wondering whether there is any other reason for specifying the ordering.
>
> If scanNumbers were specified to be contiguous, I'd say we ought to allow 0 as a scan number, since essentially all modern programming languages use 0-based arrays. But if I understand correctly, scanNumbers need not be contiguous (and thus programmers should not assume that they can be directly used for array indexing).
>
> Are scan numbers up to at least 2**62 or so allowed, to prepare for the coming ten-billion-spectrum runs? :-)
>
> Mike
>
>> -----Original Message-----
>> From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
>> Sent: Tuesday, February 19, 2008 10:07 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>
>> Hi Michael,
>>
>> As it currently stands, both scanNumber and id are unique keys to a spectrum - they need not be combined to create a unique key. Id is a string and as such should be compared on a lexicographical basis (if that isn't stated in the spec, it should be), and scanNumber is an integer:
>> <xs:attribute name="scanNumber" type="xs:int" use="required">
>>
>> By the way, I think we should change that type to be xs:positiveInteger so that the range is schematically limited to [1-infinity). 0 shouldn't be a valid scan number (if 0 is allowed then Michael's point about the -0 and 0 issue should be addressed, although that might be done by the XML Schema specification).
>>
>> -Matt
>>
>> Coleman, Michael wrote:
>>> I don't understand the issues involved in this particular question, but it reminds me of this key requirement:
>>>
>>> - There has to be a way of generating a unique key for each spectrum (i.e., unique across all spectra in the file) that will work for all mzML files.
>>>
>>> In the example below, it looks like that key is the 2-tuple "(id, scanNumber)". (Whatever the key is, it should be specified as such in the standard.)
>>>
>>> If the key includes any numeric fields, it needs to be specified whether or not (say) "0010" is equal to "10", whether or not "1.0" is equal to "1", and whether or not "-0" is equal to "0". Hopefully either (a) the former is simply disallowed in all of these cases or (b) all fields are to be treated as strings, rather than numbers, and comparison done on that basis.
>>>
>>> Mike
>>>
>>>> -----Original Message-----
>>>> From: psi...@li... [mailto:psi...@li...] On Behalf Of Matthew Chambers
>>>> Sent: Tuesday, February 19, 2008 9:37 AM
>>>> To: Mass spectrometry standard development
>>>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>>>
>>>> How do you feel about generating arbitrary unique scan numbers and then using the id attribute to preserve the original filename and scan number:
>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>> <spectrum id="function1.2" scanNumber="2" ...>
>>>> ...
>>>> <spectrum id="function2.1" scanNumber="100" ...>
>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>> ...
>>>>
>>>> Or probably more intuitive would be to store the parallel spectra sequentially (assuming that the same scan number from each function is correlated):
>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>> <spectrum id="function2.1" scanNumber="2" ...>
>>>> ...
>>>> <spectrum id="function1.2" scanNumber="100" ...>
>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>> ...
>>>>
>>>> It's either that or store each function in a separate mzML file, because mzML doesn't support multiple runs in the same file.
>>>>
>>>> -Matt
>>>>
>>>> Fredrik Levander wrote:
>>>>> Hi All,
>>>>>
>>>>> In QTOF files from Waters with mixed MS1 and MS2 data we have several parallel 'functions' with data being recorded into separate files. The scan numbers are only unique within each function. In the raw data folder we thus have several different spectra with the same scan number (but different source files). When converting this into an mzML file it would be good to keep the original scan numbers which are useful for traceability, but to generate unique spectrum ids. I thus propose that the requirement for unique scanNumbers within an mzML file is removed. However, spectra should not be repeated within the file, so this would NOT be applicable to the dta to mzML conversion use case.
>>>>> Would such a change generate problems for the readers?
>>>>> How is this solved in MassWolf?
>>>>>
>>>>> Regards
>>>>>
>>>>> Fredrik
>>>>>
>>>>> -------------------------------------------------------------------------
>>>>> This SF.net email is sponsored by: Microsoft
>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>>> _______________________________________________
>>>>> Psidev-ms-dev mailing list
>>>>> Psi...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
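
The two points debated in this thread (string versus numeric key comparison, and whether id and scanNumber are "co-ordered") can be illustrated with a minimal Python sketch. The names Spectrum and check_scan_numbers are hypothetical and are not part of mzML or of any library. The rule being checked (scanNumber positive, unique, and ascending in file order) is only the one described in the thread, assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical stand-in for an mzML <spectrum> element's two keys."""
    id: str           # string key, compared lexicographically
    scan_number: int  # integer key, compared numerically


def check_scan_numbers(spectra):
    """Return a list of problems: non-positive, duplicate, or out-of-order scan numbers.

    Assumes the rule discussed in the thread: scanNumber is a positive
    integer and spectra are stored in strictly ascending scanNumber order.
    """
    problems = []
    previous = 0
    for s in spectra:
        if s.scan_number < 1:
            problems.append(f"{s.id}: scanNumber {s.scan_number} is not a positive integer")
        elif s.scan_number <= previous:
            problems.append(f"{s.id}: scanNumber {s.scan_number} does not ascend from {previous}")
        previous = max(previous, s.scan_number)
    return problems


# The "parallel spectra stored sequentially" layout quoted in the thread.
spectra = [
    Spectrum("function1.1", 1),
    Spectrum("function2.1", 2),
    Spectrum("function1.2", 100),
    Spectrum("function2.2", 101),
]

# Sorting by each key separately shows the two orders need not agree.
by_scan = [s.id for s in sorted(spectra, key=lambda s: s.scan_number)]
by_id = [s.id for s in sorted(spectra, key=lambda s: s.id)]

# Numeric comparison treats "0010" and "10" as equal; string comparison does not.
numeric_equal = int("0010") == int("10")
string_equal = "0010" == "10"
```

With this layout the file passes the ascending-scanNumber check, yet a lexicographic sort on id yields a different order, which is the situation Matt describes as permitted.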