From: Angel P. <an...@ma...> - 2008-02-13 02:41:11
On Feb 12, 2008 8:06 PM, Matt Chambers <mat...@va...> wrote:

> It's reasonable that a user of the format would want to store structured
> information for a limited number of peaks (or store a variable number of
> values in one field, e.g. multidimensional array) so the binary data
> might be laid out in a user-defined pattern:
> m/z (same precision as main array)
>

errr, I don't get what you mean here. Does this mean that you ran a peak detection algorithm and have a much reduced set of data points? If so, this is a new mzML file, distinct from the one prior to peak detection.

> count of charge assignments (2 bytes)
>

1 per m/z? Again, the # of array indexes is the same as above.

> charge assignment 1 ... charge assignment N (2 bytes each)

This can be encoded as a separate array for each charge, with 0/1 values.

>
> isotope profile ID (4 bytes; unique in a spectrum)
>

I am ignorant of what this is ;)

> isotope number of peak (2 bytes; monoisotope=0)
>

Also don't have a clue about this.

> peak label (variable length, 0 terminated)
>

meh.. May be a long encoded length, but it will have the same # of elements as above, if indeed you were referring to some peak detection algorithm producing the m/z array above.

>
> This structure would still be binary and base64 encoded, so netcdf et
> al. are not necessary.

Sorry to not be clear. I was not putting forth a use of netcdf, just pointing out that it has an alternative binary storage scheme to mzML's, one that is coordinate-based, so the n-dimensional data can indeed have different lengths per axis.

> Allowing the secondary data arrays to have a
> different length leaves the format open to user-defined craziness like
> this, and I think that's a good thing. Definitely you wouldn't want to
> define one of these structures for every data point if you're dealing with
> data with a decent amount of noise!
>
> -Matt
>
>
> Angel Pizarro wrote:
> > On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...
> > <mailto:bri...@in...>> wrote:
> >
> >     I think that's not quite right - arrayLength needs to remain an
> >     attribute of BinaryDataArray since not all BinaryDataArray elements
> >     in a spectrum will necessarily contain the same number of entries
> >     as an mz or intensity array,
> >
> >
> > Such as .... ? I asked for examples of this and never got a reply.
> >
> > Related question, if they are different lengths how would you go about
> > assigning a value to a particular index (or set of indexes) that the
> > value refers to in another binary array?
> >
> > My point is that it would be infinitely easier to just repeat values
> > (MRM transitions values, retention time values, or even nil values)
> > that pertain to more than one index so you always have a 1:1
> > correspondence across arrays. Of course I am making the assumption
> > that all binary arrays within a single spectrum element are related to
> > each other in some manner, so if this does not hold true, please
> > someone tell me, as I am fairly ignorant on the mass spec acquisition
> > modes.
> >
> > The alternative representation would be coordinate systems and
> > multidimensional data arrays /a la/ netcdf or HDF5, but we are too
> > far along the route that we have laid out to even consider a change
> > this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion
> > and the netcdf files are even bigger, at a gain of built in index into
> > the data arrays within and across spectra.
> >
> > -angel
> >
> >
> >     since not all BinaryDataArray elements are guaranteed (as I
> >     understand mzML, which is but dimly) to be mz or intensity. You'll
> >     need to write it again as an attribute of spectrum, something like
> >     mzintPairsCount if you don't like PeaksCount.
> >
> >     -----Original Message-----
> >     From: psi...@li...
> >     [mailto:psi...@li...
> > <mailto:psi...@li...>] On Behalf Of Eric Deutsch
> >     Sent: Tuesday, February 12, 2008 1:28 PM
> >     To: Mass spectrometry standard development
> >     Cc: Eric Deutsch
> >     Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> >
> >
> >     So there seems to be broad consensus (4 for 4;) that moving the
> >     arrayLength up a little higher is a good idea. So instead of:
> >
> >     <spectrum id="S19" scanNumber="19" msLevel="1">
> >       <spectrumDescription>
> >         ...
> >       </spectrumDescription>
> >       <binaryDataArray arrayLength="1313" encodedLength="5433"
> >           dataProcessingRef="Xcalibur Processing">
> >         ...
> >         <binary>AAAAwDsGeUAAAAD...</binary>
> >       </binaryDataArray>
> >       <binaryDataArray arrayLength="1313" encodedLength="4892">
> >         ...
> >         <binary>AAAAAIBJxk...</binary>
> >       </binaryDataArray>
> >     </spectrum>
> >
> >     We will have: !!!!!!!!!!!!!!!!!!
> >
> >     <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
> >       <spectrumDescription>
> >         ...
> >       </spectrumDescription>
> >       <binaryDataArray encodedLength="5433"
> >           dataProcessingRef="Xcalibur Processing">
> >         ...
> >         <binary>AAAAwDsGeUAAAAD...</binary>
> >       </binaryDataArray>
> >       <binaryDataArray encodedLength="4892">
> >         ...
> >         <binary>AAAAAIBJxk...</binary>
> >       </binaryDataArray>
> >     </spectrum>
> >
> >
> >     Agreed?
> >
> >
> > > -----Original Message-----
> > > From: psi...@li...
> > > [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > > Sent: Wednesday, February 06, 2008 10:49 AM
> > > To: Mass spectrometry standard development
> > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> > >
> > > I agree that the primary data arrays should probably be treated as
> > > special in the schema so it's clear that they are paired values and thus
> > > peak count could move into the spectrum element or spectrumDescription.
> > > There should still be options to have additional arrays that aren't the
> > > same as the main arrays (for example, an additional set of arrays, one
> > > for a subset of the m/zs and the other for peak charge information).
> > >
> > > -Matt
> > >
> > >
> > > Kessner, Darren E. wrote:
> > > > Any other comments regarding <binaryArrayData> lengths?
> > > >
> > > >
> > > >> (from Rune)
> > > >> If they have to be equal size, then
> > > >> that size ought to be specified in the spectrumDescription.
> > > >>
> > > >
> > > > I agree -- I would like to encode the length in <spectrum> somewhere
> > > > (either attribute or cvParam) so that:
> > > > 1) it's clear that the arrays are of equal size
> > > > 2) Readers don't have to peek into the attributes of the first
> > > >    <binaryArrayData> to get the info
> > > >
> > > > I need this right now for the MSData RAMP adapter code, so I'll encode
> > > > it as a <userParam> until a decision has been made on the specification.
> > > >
> > > >
> > > > Darren
> > > >
>
> _______________________________________________
> Psidev-ms-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
>

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
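
For anyone following the thread, here is a rough sketch of what the quoted proposal (one arrayLength on <spectrum>, shared by every binaryDataArray) might look like from a reader's side. It is only an illustration under assumptions that are not settled anywhere in this thread: values are taken to be little-endian 64-bit doubles, uncompressed, with no XML namespace, and the helper names (encode_doubles, decode_doubles, read_spectrum) are made up for the example. The sample data is invented rather than taken from the truncated <binary> strings quoted above.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_doubles(values):
    """Pack floats as little-endian 64-bit doubles (an assumption) and base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_doubles(text):
    """Inverse of encode_doubles: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


def read_spectrum(xml_text):
    """Decode each <binaryDataArray> and check it against the spectrum-level arrayLength."""
    spectrum = ET.fromstring(xml_text)
    expected = int(spectrum.get("arrayLength"))
    arrays = []
    for node in spectrum.findall("binaryDataArray"):
        values = decode_doubles(node.findtext("binary"))
        if len(values) != expected:
            raise ValueError(
                "array has %d values, expected %d" % (len(values), expected)
            )
        arrays.append(values)
    return arrays


# Invented sample data: three m/z values and their paired intensities.
mz = [400.0, 401.5, 402.25]
intensity = [10.0, 250.0, 35.5]
doc = (
    '<spectrum id="S19" arrayLength="3">'
    "<binaryDataArray><binary>%s</binary></binaryDataArray>"
    "<binaryDataArray><binary>%s</binary></binaryDataArray>"
    "</spectrum>"
) % (encode_doubles(mz), encode_doubles(intensity))

decoded_mz, decoded_intensity = read_spectrum(doc)
```

With the length stated once on <spectrum>, a reader can size its buffers before touching any array, and a mismatch becomes an error rather than a silent misalignment. The open question in the thread, secondary arrays of a different length, would need its own per-array length under this scheme.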