From: Matt C. <mat...@va...> - 2008-02-13 04:43:36
Hi Angel,

Angel Pizarro wrote:
> On Feb 12, 2008 8:06 PM, Matt Chambers wrote:
>
> > It's reasonable that a user of the format would want to store
> > structured information for a limited number of peaks (or store a
> > variable number of values in one field, e.g. multidimensional array)
> > so the binary data might be laid out in a user-defined pattern:
> > m/z (same precision as main array)
>
> errr, I don't get what you mean here. Does this mean that you ran a
> peak detection alg and have a much reduced set of data points? If so
> this is a new mzML file from the one prior to peak detection.

No, I mean that complex, multibyte metadata for data points may only be
available for, say, 10% of the total data points (even after peak
picking). It would be silly to require the user to store a 19+ byte
struct for every peak. Yes, this is a very advanced use case that will
probably never be seen, but we can allow it with virtually no drawback.

> > count of charge assignments (2 bytes)
>
> 1 per m/z? again # of array indexes the same as above

I think you misunderstood this count. It is the "N" in the following
series (represents an array of charge assignments for this peak, just
like more than one charge can be assigned to a precursor):

> > charge assignment 1 ... charge assignment N (2 bytes each)
>
> can be encoded as a separate array for each charge with 0/1
>
> > isotope profile ID (4 bytes; unique in a spectrum)
>
> I am ignorant of what this is ;)

Just something I made up to take up space. I imagined some isotope
profile/envelope detection algorithm running on a file and then
annotating the discovered isotope profiles in this structure. That
information does not fit in a 1:1 relationship (although it could be
rearranged to have one ID per peak as a single array, which would meet
your desires, and then infer a peak's isotope number by the number of
times that ID had been seen in the array).
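As a rough illustration of the single-array rearrangement just described
(one isotope profile ID per peak, with the isotope number inferred from
how many times that ID has already been seen), here is a minimal Python
sketch. The function name isotope_numbers and the sample IDs are
hypothetical. It also assumes the peaks are listed in ascending m/z
order, so that the first occurrence of an ID is the monoisotopic peak
(isotope number 0).

```python
from collections import defaultdict


def isotope_numbers(profile_ids):
    """Return each peak's isotope number given one profile ID per peak.

    The isotope number is the count of earlier peaks carrying the same
    profile ID, so the first peak of a profile gets 0 (the monoisotope).
    """
    seen = defaultdict(int)  # profile ID -> occurrences so far
    numbers = []
    for pid in profile_ids:
        numbers.append(seen[pid])
        seen[pid] += 1
    return numbers


# Hypothetical example: two interleaved isotope profiles (IDs 7 and 9).
ids = [7, 9, 7, 7, 9]
print(isotope_numbers(ids))  # [0, 0, 1, 2, 1]
```

Note that peaks belonging to no profile would still need some sentinel
ID in such an array, which is the kind of padding a shorter secondary
array would avoid.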
> > isotope number of peak (2 bytes; monoisotope=0)
>
> Also don't have a clue about this.
>
> > peak label (variable length, 0 terminated)
>
> meh.. May be long encode length, but will have the same # of elements
> as above, if indeed you were referring to some peak detection alg.
> producing the m/z array above.

If only a few peaks had labels, most of the peaks would have a single 0
in the label array? As you said, that would be wasteful.

> > Allowing the secondary data arrays to have a different length leaves
> > the format open to user-defined craziness like this, and I think
> > that's a good thing. Definitely you wouldn't want to define one of
> > these structures for every data point if you're dealing with data
> > with a decent amount of noise!

It comes down to giving the user some flexibility and not imposing
unnecessary rigidity in the schema. How much simpler does it really make
the schema to make ALL the arrays the same length? Not very much, I
think.

> > -Matt
> >
> > Angel Pizarro wrote:
> > > On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...> wrote:
> > >
> > > > I think that's not quite right - arrayLength needs to remain an
> > > > attribute of BinaryDataArray since not all BinaryDataArray
> > > > elements in a spectrum will necessarily contain the same number
> > > > of entries as an mz or intensity array,
> > >
> > > Such as .... ? I asked for examples of this and never got a reply.
> > >
> > > Related question, if they are different lengths how would you go
> > > about assigning a value to a particular index (or set of indexes)
> > > that the value refers to in another binary array?
> > >
> > > My point is that it would be infinitely easier to just repeat
> > > values (MRM transitions values, retention time values, or even nil
> > > values) that pertain to more than one index so you always have a
> > > 1:1 correspondence across arrays.
> > > Of course I am making the assumption that all binary arrays within
> > > a single spectrum element are related to each other in some
> > > manner, so if this does not hold true, please someone tell me, as
> > > I am fairly ignorant on the mass spec acquisition modes.
> > >
> > > The alternative representation would be coordinate systems and
> > > multidimensional data arrays /a la/ netcdf or HDF5, but we are too
> > > far along the route that we have laid out to even consider a
> > > change this radical. BTW, I did do some mzData (v 1.05) to netcdf
> > > conversion and the netcdf files are even bigger, at a gain of
> > > built in index into the data arrays within and across spectra.
> > >
> > > -angel
> > >
> > > > since not all BinaryDataArray elements are guaranteed (as I
> > > > understand mzML, which is but dimly) to be mz or intensity.
> > > > You'll need to write it again as an attribute of spectrum,
> > > > something like mzintPairsCount if you don't like PeaksCount.
> > > >
> > > > -----Original Message-----
> > > > From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> > > > Sent: Tuesday, February 12, 2008 1:28 PM
> > > > To: Mass spectrometry standard development
> > > > Cc: Eric Deutsch
> > > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> > > >
> > > > So there seems to be broad consensus (4 for 4;) that moving the
> > > > arrayLength up a little higher is a good idea. So instead of:
> > > >
> > > > <spectrum id="S19" scanNumber="19" msLevel="1">
> > > >   <spectrumDescription>
> > > >     ...
> > > >   </spectrumDescription>
> > > >   <binaryDataArray arrayLength="1313" encodedLength="5433"
> > > >       dataProcessingRef="Xcalibur Processing">
> > > >     ...
> > > >     <binary>AAAAwDsGeUAAAAD...</binary>
> > > >   </binaryDataArray>
> > > >   <binaryDataArray arrayLength="1313" encodedLength="4892">
> > > >     ...
> > > >     <binary>AAAAAIBJxk...</binary>
> > > >   </binaryDataArray>
> > > > </spectrum>
> > > >
> > > > We will have: !!!!!!!!!!!!!!!!!!
> > > >
> > > > <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
> > > >   <spectrumDescription>
> > > >     ...
> > > >   </spectrumDescription>
> > > >   <binaryDataArray encodedLength="5433"
> > > >       dataProcessingRef="Xcalibur Processing">
> > > >     ...
> > > >     <binary>AAAAwDsGeUAAAAD...</binary>
> > > >   </binaryDataArray>
> > > >   <binaryDataArray encodedLength="4892">
> > > >     ...
> > > >     <binary>AAAAAIBJxk...</binary>
> > > >   </binaryDataArray>
> > > > </spectrum>
> > > >
> > > > Agreed?
> > > >
> > > > > -----Original Message-----
> > > > > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > > > > Sent: Wednesday, February 06, 2008 10:49 AM
> > > > > To: Mass spectrometry standard development
> > > > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> > > > >
> > > > > I agree that the primary data arrays should probably be treated
> > > > > as special in the schema so it's clear that they are paired
> > > > > values and thus peak count could move into the spectrum element
> > > > > or spectrumDescription. There should still be options to have
> > > > > additional arrays that aren't the same as the main arrays (for
> > > > > example, an additional set of arrays, one for a subset of the
> > > > > m/zs and the other for peak charge information).
> > > > >
> > > > > -Matt
> > > > >
> > > > > Kessner, Darren E. wrote:
> > > > > > Any other comments regarding <binaryArrayData> lengths?
> > > > > >
> > > > > > > (from Rune)
> > > > > > > If they have to be equal size, then
> > > > > > > that size ought to be specified in the spectrumDescription.
> > > > > >
> > > > > > I agree -- I would like to encode the length in <spectrum>
> > > > > > somewhere (either attribute or cvParam) so that:
> > > > > > 1) it's clear that the arrays are of equal size
> > > > > > 2) Readers don't have to peek into the attributes of the
> > > > > >    first <binaryArrayData> to get the info
> > > > > >
> > > > > > I need this right now for the MSData RAMP adapter code, so
> > > > > > I'll encode it as a <userParam> until a decision has been
> > > > > > made on the specification.
> > > > > >
> > > > > > Darren
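
To make the reader-side consequence of the two layouts in this thread
concrete, here is a minimal Python sketch using the standard library's
xml.etree.ElementTree. The element and attribute names (spectrum,
binaryDataArray, arrayLength) come from the examples quoted above. The
helper name array_lengths and the rule that a per-array arrayLength
overrides a spectrum-level default are assumptions made for
illustration, since the thread does not settle that point. The sample
XML is trimmed, has placeholder payloads, and uses no namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the proposed layout from the thread; the <binary>
# payloads are placeholders, not real encoded data.
PROPOSED = """
<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
  <binaryDataArray encodedLength="5433"><binary>...</binary></binaryDataArray>
  <binaryDataArray encodedLength="4892"><binary>...</binary></binaryDataArray>
</spectrum>
"""


def array_lengths(spectrum):
    """Return one length per <binaryDataArray> in a <spectrum> element.

    Assumption (not decided in the thread): a per-array arrayLength
    attribute, when present, overrides the spectrum-level default.
    """
    default = spectrum.get("arrayLength")
    lengths = []
    for array in spectrum.findall("binaryDataArray"):
        value = array.get("arrayLength", default)
        if value is None:
            raise ValueError("no arrayLength on spectrum or binaryDataArray")
        lengths.append(int(value))
    return lengths


print(array_lengths(ET.fromstring(PROPOSED)))  # [1313, 1313]
```

With a fallback like this, a reader gets the shared length from
<spectrum> without peeking into the first array, while a secondary
array of a different length could still declare its own value.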