From: Matthew C. <mat...@va...> - 2008-02-19 20:54:21
There is no requirement that an ascending lexicographical sort would produce the same order that a sort by ascending scanNumber would (that would be far too restrictive), but there is a requirement that the spectrum elements in the file be stored in ascending order by scanNumber. We had quite a bit of discussion in the teleconference today about removing scanNumber as a primary key and replacing it with an index attribute; such an attribute would probably have different semantics (0-based and contiguous). I think Eric is probably preparing to post a good summary of the discussion.

My proposal for positiveInteger for scanNumber was accepted but then rendered moot by the proposal to get rid of the scanNumber attribute. ;) "positiveInteger" has no schematic upper limit, so perhaps if we switch to an index attribute we should make it "unsignedLong" (schematically defined as a 64-bit unsigned integer).

-Matt

Coleman, Michael wrote:
> Is there a requirement that scanNumber and id be "co-ordered" (if I sort
> a file's spectra by one of these keys, the other key will then
> necessarily also be sorted)?
>
> Is there a requirement that the spectra in all mzML files be ordered by
> one or both of these keys?
>
> If scanNumber is being used for ordering in one of these ways, I agree
> that lexicographic ordering should be specified. If not, I'm wondering
> whether there is any other reason for specifying the ordering.
>
> If scanNumbers were specified to be contiguous, I'd say we ought to
> allow 0 as a scan number, since essentially all modern programming
> languages use 0-based arrays. But if I understand correctly,
> scanNumbers need not be contiguous (and thus programmers should not
> assume that they can be directly used for array indexing).
>
> Are scan numbers up to at least 2**62 or so allowed, to prepare for the
> coming ten-billion-spectrum runs? :-)
>
> Mike
>
>> -----Original Message-----
>> From: psi...@li...
>> [mailto:psi...@li...] On
>> Behalf Of Matthew Chambers
>> Sent: Tuesday, February 19, 2008 10:07 AM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>
>> Hi Michael,
>>
>> As it currently stands, both scanNumber and id are unique keys to a
>> spectrum - they need not be combined to create a unique key. Id is a
>> string and as such should be compared on a lexicographical basis (if
>> that isn't stated in the spec, it should be), and scanNumber
>> is an integer:
>> <xs:attribute name="scanNumber" type="xs:int" use="required">
>>
>> By the way, I think we should change that type to be xs:positiveInteger
>> so that the range is schematically limited to [1-infinity). 0 shouldn't
>> be a valid scan number (if 0 is allowed then Michael's point about the
>> -0 and 0 issue should be addressed, although that might be done by the
>> XML Schema specification).
>>
>> -Matt
>>
>> Coleman, Michael wrote:
>>> I don't understand the issues involved in this particular question, but
>>> it reminds me of this key requirement:
>>>
>>> - There has to be a way of generating a unique key for each spectrum
>>> (i.e., unique across all spectra in the file) that will work for all
>>> mzML files.
>>>
>>> In the example below, it looks like that key is the 2-tuple "(id,
>>> scanNumber)". (Whatever the key is, it should be specified as such in
>>> the standard.)
>>>
>>> If the key includes any numeric fields, it needs to be specified whether
>>> or not (say) "0010" is equal to "10", whether or not "1.0" is equal to
>>> "1", and whether or not "-0" is equal to "0". Hopefully either (a) the
>>> former is simply disallowed in all of these cases or (b) all fields are
>>> to be treated as strings, rather than numbers, and comparison done on
>>> that basis.
>>>
>>> Mike
>>>
>>>> -----Original Message-----
>>>> From: psi...@li...
>>>> [mailto:psi...@li...] On
>>>> Behalf Of Matthew Chambers
>>>> Sent: Tuesday, February 19, 2008 9:37 AM
>>>> To: Mass spectrometry standard development
>>>> Subject: Re: [Psidev-ms-dev] Unique scan numbers
>>>>
>>>> How do you feel about generating arbitrary unique scan numbers and then
>>>> using the id attribute to preserve the original filename and
>>>> scan number:
>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>> <spectrum id="function1.2" scanNumber="2" ...>
>>>> ...
>>>> <spectrum id="function2.1" scanNumber="100" ...>
>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>> ...
>>>>
>>>> Or probably more intuitive would be to store the parallel spectra
>>>> sequentially (assuming that the same scan number from each function is
>>>> correlated):
>>>> <spectrum id="function1.1" scanNumber="1" ...>
>>>> <spectrum id="function2.1" scanNumber="2" ...>
>>>> ...
>>>> <spectrum id="function1.2" scanNumber="100" ...>
>>>> <spectrum id="function2.2" scanNumber="101" ...>
>>>> ...
>>>>
>>>> It's either that or store each function in a separate mzML file, because
>>>> mzML doesn't support multiple runs in the same file.
>>>>
>>>> -Matt
>>>>
>>>> Fredrik Levander wrote:
>>>>> Hi All,
>>>>>
>>>>> In QTOF files from Waters with mixed MS1 and MS2 data we have several
>>>>> parallel 'functions' with data being recorded into separate files. The
>>>>> scan numbers are only unique within each function. In the raw data
>>>>> folder we thus have several different spectra with the same scan number
>>>>> (but different source files). When converting this into an mzML file it
>>>>> would be good to keep the original scan numbers which are useful for
>>>>> traceability, but to generate unique spectrum ids. I thus propose that
>>>>> the requirement for unique scanNumbers within an mzML file is removed.
>>>>> However, spectra should not be repeated within the file, so this would
>>>>> NOT be applicable to the dta to mzML conversion use case.
>>>>> Would such a change generate problems for the readers?
>>>>> How is this solved in MassWolf?
>>>>>
>>>>> Regards
>>>>>
>>>>> Fredrik
>>>>>
>>>>> -------------------------------------------------------------------------
>>>>> This SF.net email is sponsored by: Microsoft
>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>>> _______________________________________________
>>>>> Psidev-ms-dev mailing list
>>>>> Psi...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
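
For anyone reading this thread in the archive, three points from the discussion can be sketched in a few lines of Python: that id and scanNumber need not be "co-ordered", that "0010" and "10" compare differently as integers and as strings, and that 2**62 fits in an unsignedLong but not in an xs:int. This is only an illustration under stated assumptions. The spectra list, the helper names, and the id/scanNumber values are all made up here and do not come from a real mzML file or from any mzML library.

```python
# Illustrative values only; these ids and scan numbers are hypothetical.
spectra = [
    ("function1.1", 1),
    ("function2.1", 2),
    ("function1.2", 3),
    ("function2.2", 4),
    ("function1.10", 19),
]


def ids_in_order(records, key_index):
    """Return the ids after sorting the (id, scanNumber) records by one field."""
    return [r[0] for r in sorted(records, key=lambda r: r[key_index])]


ids_by_scan_number = ids_in_order(spectra, 1)  # numeric order
ids_by_id = ids_in_order(spectra, 0)           # lexicographic (code point) order

# If the two keys were co-ordered, both sorts would give the same sequence.
co_ordered = ids_by_scan_number == ids_by_id


def equal_as_integers(a, b):
    """Compare two attribute values after parsing them as integers."""
    return int(a) == int(b)


def equal_as_strings(a, b):
    """Compare two attribute values character by character."""
    return a == b


# Value ranges of the XML Schema types mentioned in the thread.
XS_INT_MAX = 2**31 - 1             # xs:int is a 32-bit signed integer
XS_UNSIGNED_LONG_MAX = 2**64 - 1   # xs:unsignedLong is a 64-bit unsigned integer

print(ids_by_scan_number)
print(ids_by_id)
print("co-ordered:", co_ordered)
print(equal_as_integers("0010", "10"), equal_as_strings("0010", "10"))
print(2**62 <= XS_INT_MAX, 2**62 <= XS_UNSIGNED_LONG_MAX)
```

With these made-up values the id "function1.10" sorts before "function1.2" lexicographically even though its scan number is the largest, which is the kind of case where the two orderings diverge.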