From: Kessner, D. E. <Dar...@cs...> - 2008-02-04 23:08:38
|
Hi all,

Brian Pratt and I are working together on plugging the MSData library into RAMP, to allow the TPP tools to handle mzML. We came across something that should probably be mentioned in the specification.

There can be (and usually are) multiple <binaryDataArray> elements in a <spectrum> (e.g. one for m/z values, one for intensity values):

<spectrum>
  <spectrumDescription> ... </spectrumDescription>
  <binaryDataArray arrayLength="1000" ...>
    <cvParam cvLabel="MS" accession="MS:1000514" name="m/z array" value=""/>
    <binary> ... </binary>
  </binaryDataArray>
  <binaryDataArray arrayLength="1000" ...>
    <cvParam cvLabel="MS" accession="MS:1000515" name="intensity array" value=""/>
    <binary> ... </binary>
  </binaryDataArray>
</spectrum>

Readers (like MSData) want to be able to read the scan meta-data (i.e. <spectrum> up to the first <binaryDataArray> tag) without having to read and decode the binary data. Part of this scan meta-data is the size of the binaryDataArray (arrayLength attribute in mzML, peaksCount in mzXML). We can obtain this size by reading the first <binaryDataArray> tag attributes, but we have to take it on faith that the other binaryDataArrays are the same size.

So the question is, can there be <binaryDataArray> elements with different arrayLengths in the same <spectrum>? If not, this should be in the specification.

If there can be different sized arrays, can we assume that at least the m/z array size == intensity array size? In this case, we still need one of the following for efficient retrieval of the number of m/z-intensity pairs:
1) Either the m/z or intensity array must occur first in the list of all the <binaryDataArray> elements.
2) The number of m/z-intensity pairs is encoded in the <spectrumDescription>, either as a cvParam or an attribute.

Darren

Darren Kessner Scientific Programmer Dar...@cs...
310-423-9538 Spielberg Family Center for Applied Proteomics Cedars-Sinai Medical Center http://www.sfcap.cshs.org/ IMPORTANT WARNING: This message is intended for the use of the person or entity to which it is addressed and may contain information that is privileged and confidential, the disclosure of which is governed by applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this information is STRICTLY PROHIBITED. If you have received this message in error, please notify us immediately by calling (310) 423-6428 and destroy the related message. Thank You for your cooperation. |
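The check Darren describes, reading each arrayLength attribute without base64-decoding any <binary> payload, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MSData or RAMP code: the function names `array_lengths` and `lengths_agree` are hypothetical, and the sample document is the snippet from the message with the elided attributes left out.

```python
import xml.etree.ElementTree as ET

# Sample based on the snippet in the message; elided attributes are omitted.
SAMPLE = """
<spectrum>
  <spectrumDescription/>
  <binaryDataArray arrayLength="1000">
    <cvParam cvLabel="MS" accession="MS:1000514" name="m/z array" value=""/>
    <binary>...</binary>
  </binaryDataArray>
  <binaryDataArray arrayLength="1000">
    <cvParam cvLabel="MS" accession="MS:1000515" name="intensity array" value=""/>
    <binary>...</binary>
  </binaryDataArray>
</spectrum>
"""


def array_lengths(spectrum_xml):
    """Map each array's cvParam name to its arrayLength (hypothetical helper).

    Only attributes are inspected; the <binary> text is never decoded.
    """
    spectrum = ET.fromstring(spectrum_xml)
    lengths = {}
    for array in spectrum.findall("binaryDataArray"):
        name = array.find("cvParam").get("name")
        lengths[name] = int(array.get("arrayLength"))
    return lengths


def lengths_agree(lengths):
    """True when every array in the spectrum reports the same length."""
    return len(set(lengths.values())) <= 1


lengths = array_lengths(SAMPLE)
print(lengths, lengths_agree(lengths))
```

A production reader would more likely stream the file and stop at the first <binaryDataArray> tag, which is exactly why the question of whether the remaining arrays can differ in length matters.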
From: Angel P. <an...@ma...> - 2008-02-05 00:15:50
|
On Feb 4, 2008 7:08 PM, Kessner, Darren E. <Dar...@cs...> wrote:
> So the question is, can there be <binaryDataArray> elements with different arrayLengths in the same <spectrum>? If not, this should be in the specification.

I would punt this question to the hardware guys. Anyone?

-angel |
From: Rune S. P. <mai...@ph...> - 2008-02-05 08:44:05
|
Kessner, Darren E. wrote:
> So the question is, can there be <binaryDataArray> elements with different arrayLengths in the same <spectrum>? If not, this should be in the specification.

It is easy to see how the different arrays work together if they are of equal length. Without a specified way to interpret different sized arrays, they have to be equal size. If they have to be equal size, then that size ought to be specified in the spectrumDescription.

I can see why some of the arrays might not have to be the same size. For instance, often you don't know the charges of all the peaks. However, it is impossible to interpret such an array without a clearly specified way of encoding which charges belong to which m/z values.

By the way, what data is the time array supposed to be used for?

-- With regards Rune |
From: Kessner, D. E. <Dar...@cs...> - 2008-02-06 18:42:56
|
Any other comments regarding <binaryDataArray> lengths?

> (from Rune)
> If they have to be equal size, then that size ought to be specified in the spectrumDescription.

I agree -- I would like to encode the length in <spectrum> somewhere (either attribute or cvParam) so that:
1) it's clear that the arrays are of equal size
2) Readers don't have to peek into the attributes of the first <binaryDataArray> to get the info

I need this right now for the MSData RAMP adapter code, so I'll encode it as a <userParam> until a decision has been made on the specification.

Darren |
From: Matthew C. <mat...@va...> - 2008-02-06 18:49:43
|
I agree that the primary data arrays should probably be treated as special in the schema so it's clear that they are paired values and thus peak count could move into the spectrum element or spectrumDescription. There should still be options to have additional arrays that aren't the same as the main arrays (for example, an additional set of arrays, one for a subset of the m/zs and the other for peak charge information). -Matt |
From: Angel P. <an...@ma...> - 2008-02-06 18:56:29
|
On Feb 6, 2008 1:49 PM, Matthew Chambers <mat...@va...> wrote: > I agree that the primary data arrays should probably be treated as > special in the schema so it's clear that they are paired values and thus > peak count could move into the spectrum element or spectrumDescription. > There should still be options to have additional arrays that aren't the > same as the main arrays (for example, an additional set of arrays, one > for a subset of the m/zs and the other for peak charge information). > Hi Matt and Darren, First thanks for all the feedback over the last year. I don't think I have ever expressed my gratitude. As for this issue of length of binary arrays, again I ask that the hardware vendors speak on this. Your example of a subset of peaks and charge states seems to me like the raw data went through some process to get that subset or to determine charge, hence this should be a new mzML file. Just my 2¢ -angel |
From: Eric D. <ede...@sy...> - 2008-02-12 21:27:54
|
So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:

<spectrum id="S19" scanNumber="19" msLevel="1">
  <spectrumDescription> ... </spectrumDescription>
  <binaryDataArray arrayLength="1313" encodedLength="5433" dataProcessingRef="Xcalibur Processing">
    ...
    <binary>AAAAwDsGeUAAAAD...</binary>
  </binaryDataArray>
  <binaryDataArray arrayLength="1313" encodedLength="4892">
    ...
    <binary>AAAAAIBJxk...</binary>
  </binaryDataArray>
</spectrum>

We will have: !!!!!!!!!!!!!!!!!!

<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
  <spectrumDescription> ... </spectrumDescription>
  <binaryDataArray encodedLength="5433" dataProcessingRef="Xcalibur Processing">
    ...
    <binary>AAAAwDsGeUAAAAD...</binary>
  </binaryDataArray>
  <binaryDataArray encodedLength="4892">
    ...
    <binary>AAAAAIBJxk...</binary>
  </binaryDataArray>
</spectrum>

Agreed? |
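For readers that must cope with both layouts during the transition, one option is to prefer a spectrum-level arrayLength and fall back to the first <binaryDataArray>. The Python sketch below assumes exactly the two layouts shown in Eric's message; the function name `data_point_count` is hypothetical, and the fallback still relies on the untested assumption that all arrays in the spectrum have the same length.

```python
import xml.etree.ElementTree as ET

# Layout before the proposed change: arrayLength on each binaryDataArray.
OLD_LAYOUT = """<spectrum id="S19" scanNumber="19" msLevel="1">
  <spectrumDescription/>
  <binaryDataArray arrayLength="1313" encodedLength="5433"><binary>...</binary></binaryDataArray>
  <binaryDataArray arrayLength="1313" encodedLength="4892"><binary>...</binary></binaryDataArray>
</spectrum>"""

# Layout after the proposed change: arrayLength on the spectrum element.
NEW_LAYOUT = """<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
  <spectrumDescription/>
  <binaryDataArray encodedLength="5433"><binary>...</binary></binaryDataArray>
  <binaryDataArray encodedLength="4892"><binary>...</binary></binaryDataArray>
</spectrum>"""


def data_point_count(spectrum):
    """Return the number of data points, or None if no length is recorded."""
    length = spectrum.get("arrayLength")
    if length is None:
        # Fall back to the first array; this assumes the others match it.
        first = spectrum.find("binaryDataArray")
        if first is not None:
            length = first.get("arrayLength")
    return int(length) if length is not None else None


for layout in (OLD_LAYOUT, NEW_LAYOUT):
    print(data_point_count(ET.fromstring(layout)))
```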
From: Angel P. <an...@ma...> - 2008-02-12 21:34:18
|
+1 agreed -angel -- Angel Pizarro Director, ITMAT Bioinformatics Facility 806 Biological Research Building 421 Curie Blvd. Philadelphia, PA 19104-6160 215-573-3736 |
From: Randy J. <rkj...@in...> - 2008-02-13 21:27:17
|
I also agree that there is no problem with moving the array length up. I would like to ask again: 1. is scanNumber different from <scan> which lives lower 2. can we make msLevel either optional, or a cvParam? Randy |
From: Matthew C. <mat...@va...> - 2008-02-12 21:38:12
|
I can live with that (or "length" or "peakCount"). -Matt |
From: Mike C. <tu...@gm...> - 2008-02-12 21:42:05
|
I'm in favor of this change. Would the interpretation be a little more obvious if this were called something like "peakCount" instead of "arrayLength"? Mike On Feb 12, 2008 3:27 PM, Eric Deutsch <ede...@sy...> wrote: > <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313"> |
From: Kessner, D. E. <Dar...@cs...> - 2008-02-12 21:49:20
|
I have an objection to the use of "peak" or "peakCount", since this has the alternate meaning of "local maximum" (as opposed to "data point"). Darren |
From: Eric D. <ede...@sy...> - 2008-02-12 22:14:40
|
I agree with Darren. For profile (aka continuous) mode data, each element in the array is not a peak, so I would not want to label it such. We use the term encodedLength to refer to the length of the string after base64 encoding. It seems like a natural thing to call this concept arrayLength. Something like arrayElementCount could work, too. But I'm currently still in favor of arrayLength unless someone has a more elegant name. Thanks, Eric |
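The difference between the two attributes Eric mentions can be made concrete: arrayLength counts array elements, while encodedLength counts characters of base64 text. The Python sketch below is a hedged illustration only. It assumes little-endian 64-bit floats and no compression, neither of which is stated in this thread, and the helper names `encode_array` and `decode_array` are hypothetical.

```python
import base64
import struct


def encode_array(values):
    """Pack values as little-endian 64-bit floats (assumed) and base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, array_length):
    """Decode base64 text back into array_length 64-bit floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % array_length, raw))


values = [100.0, 200.5, 300.25]
text = encode_array(values)

array_length = len(values)    # number of elements in the array
encoded_length = len(text)    # number of base64 characters
print(array_length, encoded_length)
print(decode_array(text, array_length))
```

Under these assumptions the encoded length follows from the element count, but with a different precision or with compression it does not, which is one reason to keep the two attributes separate.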
From: Matthew C. <mat...@va...> - 2008-02-12 22:57:03
|
We could go with "primaryArrayLength" which is generic in case a future update to the schema adds support for other types of primary data array pairs (like time vs. intensity for chromatograms), or go for the unambiguous "mzIntArrayLength" to be specific now (and if chromatograms are added to the schema in the future, it might be with an analogous "timeIntArrayLength" attribute). -Matt |
From: Brian P. <bri...@in...> - 2008-02-12 22:46:16
|
I think that's not quite right - arrayLength needs to remain an attribute of BinaryDataArray since not all BinaryDataArray elements in a spectrum will necessarily contain the same number of entries as an mz or intensity array, since not all BinaryDataArray elements are guaranteed (as I understand mzML, which is but dimly) to be mz or intensity. You'll need to write it again as an attribute of spectrum, something like mzintPairsCount if you don't like PeaksCount.

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
Sent: Tuesday, February 12, 2008 1:28 PM
To: Mass spectrometry standard development
Cc: Eric Deutsch
Subject: Re: [Psidev-ms-dev] binaryArrayData lengths

So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:

  <spectrum id="S19" scanNumber="19" msLevel="1">
    <spectrumDescription>
      ...
    </spectrumDescription>
    <binaryDataArray arrayLength="1313" encodedLength="5433"
                     dataProcessingRef="Xcalibur Processing">
      ...
      <binary>AAAAwDsGeUAAAAD...</binary>
    </binaryDataArray>
    <binaryDataArray arrayLength="1313" encodedLength="4892">
      ...
      <binary>AAAAAIBJxk...</binary>
    </binaryDataArray>
  </spectrum>

We will have: !!!!!!!!!!!!!!!!!!

  <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
    <spectrumDescription>
      ...
    </spectrumDescription>
    <binaryDataArray encodedLength="5433"
                     dataProcessingRef="Xcalibur Processing">
      ...
      <binary>AAAAwDsGeUAAAAD...</binary>
    </binaryDataArray>
    <binaryDataArray encodedLength="4892">
      ...
      <binary>AAAAAIBJxk...</binary>
    </binaryDataArray>
  </spectrum>

Agreed?

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: Wednesday, February 06, 2008 10:49 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>
> I agree that the primary data arrays should probably be treated as
> special in the schema so it's clear that they are paired values and thus
> peak count could move into the spectrum element or spectrumDescription.
> There should still be options to have additional arrays that aren't the
> same as the main arrays (for example, an additional set of arrays, one
> for a subset of the m/zs and the other for peak charge information).
>
> -Matt
>
> Kessner, Darren E. wrote:
> > Any other comments regarding <binaryArrayData> lengths?
> >
> >> (from Rune)
> >> If they have to be equal size, then
> >> that size ought to be specified in the spectrumDescription.
> >
> > I agree -- I would like to encode the length in <spectrum> somewhere
> > (either attribute or cvParam) so that:
> > 1) it's clear that the arrays are of equal size
> > 2) Readers don't have to peek into the attributes of the first
> > <binaryArrayData> to get the info
> >
> > I need this right now for the MSData RAMP adapter code, so I'll encode
> > it as a <userParam> until a decision has been made on the specification.
> >
> > Darren

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
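For a reader such as the MSData RAMP adapter, the practical effect of the proposal above is that the pair count can be taken from the <spectrum> start tag alone. A minimal Python sketch, assuming the proposed (not yet adopted) arrayLength attribute on <spectrum> and a document without XML namespaces; the fragment is made up for illustration and peak_count is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET

# Made-up fragment following the proposed layout (arrayLength on <spectrum>).
MZML_FRAGMENT = """<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
  <spectrumDescription/>
  <binaryDataArray encodedLength="5433"><binary>AAAAwDsGeUAAAAD</binary></binaryDataArray>
  <binaryDataArray encodedLength="4892"><binary>AAAAAIBJxk</binary></binaryDataArray>
</spectrum>"""


def peak_count(stream):
    """Return arrayLength from the first <spectrum> start tag, or None if absent."""
    for _event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag == "spectrum":
            value = elem.get("arrayLength")
            return int(value) if value is not None else None
    return None


count = peak_count(io.BytesIO(MZML_FRAGMENT.encode("utf-8")))
print(count)
```

The function returns at the start tag, so nothing inside <binary> is base64-decoded. Note that iterparse still reads its input in chunks, so this illustrates the lookup and is not a measure of I/O savings.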
From: Angel P. <an...@ma...> - 2008-02-13 00:41:26
|
On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...> wrote:

> I think that's not quite right - arrayLength needs to remain an attribute of
> BinaryDataArray since not all BinaryDataArray elements in a spectrum will
> necessarily contain the same number of entries as an mz or intensity array,

Such as .... ? I asked for examples of this and never got a reply.

Related question: if they are different lengths, how would you go about assigning a value to a particular index (or set of indexes) that the value refers to in another binary array?

My point is that it would be infinitely easier to just repeat values (MRM transition values, retention time values, or even nil values) that pertain to more than one index, so you always have a 1:1 correspondence across arrays. Of course I am making the assumption that all binary arrays within a single spectrum element are related to each other in some manner, so if this does not hold true, please someone tell me, as I am fairly ignorant of the mass spec acquisition modes.

The alternative representation would be coordinate systems and multidimensional data arrays a la netcdf or HDF5, but we are too far along the route that we have laid out to even consider a change this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion and the netcdf files are even bigger, with the gain of a built-in index into the data arrays within and across spectra.

-angel

> since not all BinaryDataArray elements are guaranteed (as I understand mzML,
> which is but dimly) to be mz or intensity. You'll need to write it again as
> an attribute of spectrum, something like mzintPairsCount if you don't like
> PeaksCount.

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
|
From: Eric D. <ede...@sy...> - 2008-02-13 01:06:08
|
I also think that opening the door to arrays of differing lengths is a complexity that is best avoided at this time. Any additional arrays that I can think of (S/N, charge, or similar) could all have the same number of elements. Yes, I suppose I could imagine having something like:

  m/z axis 1
  intensity values corresponding to axis 1
  m/z axis 2 for a subset of points in axis 1
  charge values for axis 2

But I think that this is complexity that we have survived without thus far and that we had best leave out at this point.

So, I propose we limit all binaryDataArray lengths within one <spectrum> to be the same. Any dissent? Please provide a specific example of XML you'd like to see instead.

Thanks,
Eric
|
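The constraint proposed above (all binaryDataArray lengths within one <spectrum> equal) is easy to check mechanically. A sketch in Python, assuming arrayLength is still written on each <binaryDataArray> and that the document carries no XML namespace; both fragments are made up, and has_uniform_lengths is a hypothetical helper, not part of any existing library:

```python
import xml.etree.ElementTree as ET

# Made-up fragments: one obeys the proposed rule, one does not.
UNIFORM = """<spectrum id="S19">
  <binaryDataArray arrayLength="1313" encodedLength="5433"/>
  <binaryDataArray arrayLength="1313" encodedLength="4892"/>
</spectrum>"""

MIXED = """<spectrum id="S20">
  <binaryDataArray arrayLength="1313" encodedLength="5433"/>
  <binaryDataArray arrayLength="131" encodedLength="700"/>
</spectrum>"""


def array_lengths(spectrum):
    """Collect the arrayLength of every <binaryDataArray> in one <spectrum>."""
    return [int(a.get("arrayLength")) for a in spectrum.iter("binaryDataArray")]


def has_uniform_lengths(spectrum):
    """True when all arrays in the spectrum share one length (or there are none)."""
    return len(set(array_lengths(spectrum))) <= 1


uniform_ok = has_uniform_lengths(ET.fromstring(UNIFORM))
mixed_ok = has_uniform_lengths(ET.fromstring(MIXED))
print(uniform_ok, mixed_ok)
```

A validator built this way would reject the second fragment, which is the kind of file the differing-lengths use case would produce.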
From: Matt C. <mat...@va...> - 2008-02-13 01:06:14
|
It's reasonable that a user of the format would want to store structured information for a limited number of peaks (or store a variable number of values in one field, e.g. multidimensional array) so the binary data might be laid out in a user-defined pattern:

  m/z (same precision as main array)
  count of charge assignments (2 bytes)
  charge assignment 1 ... charge assignment N (2 bytes each)
  isotope profile ID (4 bytes; unique in a spectrum)
  isotope number of peak (2 bytes; monoisotope=0)
  peak label (variable length, 0 terminated)

This structure would still be binary and base64 encoded, so netcdf et al. are not necessary. Allowing the secondary data arrays to have a different length leaves the format open to user-defined craziness like this, and I think that's a good thing. Definitely you wouldn't want to define one of these structures for every data point if you're dealing with data with a decent amount of noise!

-Matt
|
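To make the record layout above concrete, here is a rough Python sketch that packs one such record, base64-encodes it, and reads it back. The byte order (little-endian), the 64-bit m/z, the signed 16-bit charges, the ASCII label, and the function names are all assumptions for illustration; the message only fixes the field order and the 2- and 4-byte widths:

```python
import base64
import struct


def pack_record(mz, charges, profile_id, isotope_number, label):
    """Pack one annotation record in the field order listed above."""
    head = struct.pack("<dH", mz, len(charges))           # m/z, then count N
    body = struct.pack("<%dh" % len(charges), *charges)   # N charge assignments
    tail = struct.pack("<IH", profile_id, isotope_number)
    return head + body + tail + label.encode("ascii") + b"\x00"


def unpack_record(buf, offset=0):
    """Decode one record starting at offset; return (fields, next_offset)."""
    mz, n = struct.unpack_from("<dH", buf, offset)
    offset += struct.calcsize("<dH")
    charges = list(struct.unpack_from("<%dh" % n, buf, offset))
    offset += 2 * n
    profile_id, isotope_number = struct.unpack_from("<IH", buf, offset)
    offset += struct.calcsize("<IH")
    end = buf.index(b"\x00", offset)   # label runs up to the 0 terminator
    label = buf[offset:end].decode("ascii")
    return (mz, charges, profile_id, isotope_number, label), end + 1


raw = pack_record(445.12, [2, 3], 7, 0, "b5")
encoded = base64.b64encode(raw).decode("ascii")   # text for a <binary> element
record, next_offset = unpack_record(base64.b64decode(encoded))
print(record, next_offset)
```

Because the charge list and the label are variable length, records have to be walked sequentially; there is no fixed stride to index into, which is part of what a generic reader would need to know in advance.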
From: Angel P. <an...@ma...> - 2008-02-13 02:41:11
|
On Feb 12, 2008 8:06 PM, Matt Chambers <mat...@va...> wrote:

> It's reasonable that a user of the format would want to store structured
> information for a limited number of peaks (or store a variable number of
> values in one field, e.g. multidimensional array) so the binary data
> might be laid out in a user-defined pattern:
> m/z (same precision as main array)

errr, I don't get what you mean here. Does this mean that you ran a peak detection algorithm and have a much reduced set of data points? If so, this is a new mzML file, distinct from the one prior to peak detection.

> count of charge assignments (2 bytes)

1 per m/z? Again, the # of array indexes is the same as above.

> charge assignment 1 ... charge assignment N (2 bytes each)

Can be encoded as a separate array for each charge with 0/1.

> isotope profile ID (4 bytes; unique in a spectrum)

I am ignorant of what this is ;)

> isotope number of peak (2 bytes; monoisotope=0)

Also don't have a clue about this.

> peak label (variable length, 0 terminated)

meh.. It may have a long encoded length, but it will have the same # of elements as above, if indeed you were referring to some peak detection algorithm producing the m/z array above.

> This structure would still be binary and base64 encoded, so netcdf et
> al. are not necessary.

Sorry to not be clear. I was not putting forth a use of netcdf, just noting that they have an alternate binary storage scheme to mzML's that is coordinate based, hence the n-dimensional data can indeed have different lengths per axis.

> Allowing the secondary data arrays to have a
> different length leaves the format open to user-defined craziness like
> this, and I think that's a good thing. Definitely you wouldn't want to
> define one of these structures for every data point if you're dealing with
> data with a decent amount of noise!
>
> -Matt

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
|
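The suggestion above of encoding charge assignments as one 0/1 array per charge keeps every array the same length as the m/z array. A small Python sketch of that idea, using made-up data and a hypothetical helper name (nothing here is defined by mzML):

```python
def charge_indicator_arrays(charges_per_peak, charge_states):
    """Build one 0/1 array per charge state, each as long as the peak list."""
    return {
        z: [1 if z in assigned else 0 for assigned in charges_per_peak]
        for z in charge_states
    }


# Four peaks; only the second and third carry charge assignments.
charges_per_peak = [[], [2], [2, 3], []]
indicators = charge_indicator_arrays(charges_per_peak, (1, 2, 3))
print(indicators)
```

Every indicator array has one entry per peak, so index i refers to the same peak in all arrays. The cost is one extra full-length array for each charge state considered, mostly zeros when few peaks are annotated.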
From: Matt C. <mat...@va...> - 2008-02-13 04:43:36
|
Hi Angel, Angel Pizarro wrote: > On Feb 12, 2008 8:06 PM, Matt Chambers wrote: > > It's reasonable that a user of the format would want to store > structured > information for a limited number of peaks (or store a variable > number of > values in one field, e.g. multidimensional array) so the binary data > might be laid out in a user-defined pattern: > m/z (same precision as main array) > > > errr, I don't get what you mean here. Does this mean that you ran a > peak detection alg and have a much reduced set of data points? If so > this is a new mzML file from the one prior to peak detection. No, I mean that complex, multibyte metadata for data points may only be available for, say, 10% of the total data points (even after peak picking). It would be silly to require the user to store a 19+ byte struct for every peak. Yes, this is a very advanced use case that will probably never be seen, but we can allow it with virtually no drawback. > count of charge assignments (2 bytes) > > > 1 per m/z? again # of array indexes the same as above I think you misunderstood this count. It is the "N" in the following series (represents an array of charge assignments for this peak, just like more than one charge can be assigned to a precursor): > > charge assignment 1 ... charge assignment N (2 bytes each) > > > can be encoded as a separate array for each charge with 0/1 > > > isotope profile ID (4 bytes; unique in a spectrum) > > > I am ignorant of what this is ;) Just something I made up to take up space. I imagined some isotope profile/envelop detection algorithm running on a file and then annotating the discovered isotope profiles in this structure. That information does not fit in a 1:1 relationship (although it could be rearranged to have one ID per peak as a single array, which would meet your desires, and then infer a peak's isotope number by the number of times that ID had been seen in the array). 
> > > isotope number of peak (2 bytes; monoisotope=0) > > > Also don;t have a clue about this. > > peak label (variable length, 0 terminated) > > > meh.. May be long encode length, but will have the same # of elements > as above, if indeed you were refering to some peak detection alg. > producing the m/z array above. If only a few peaks had labels, most of the peaks would have a single 0 in the label array? As you said, that would be wasteful. > > Allowing the secondary data arrays to have a > different length leaves the format open to user-defined craziness like > this, and I think that's a good thing. Definitely you wouldn't want to > define one of these structures for every data point if you're > dealing of > data with a decent amount of noise! > It comes down to giving the user some flexibility and not imposing unnecessary rigidity in the schema. How much simpler does it really make the schema to make ALL the arrays the same length? Not very much, I think. > > -Matt > > > Angel Pizarro wrote: > > On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in... > <mailto:bri...@in...> > > <mailto:bri...@in... > <mailto:bri...@in...>>> wrote: > > > > I think that's not quite right - arrayLength needs to remain an > > attribute of > > BinaryDataArray since not all BinaryDataArray elements in a > > spectrum will > > necessarily contain the same number of entries as an mz or > > intensity array, > > > > > > Such as .... ? I asked for examples of this and never got a reply. > > > > Related question, if they are different lengths how would you go > about > > assigning a value to a particular index (or set of indexes) that the > > value refers to in another binary array? > > > > My point is that it would be infinitely easier to just repeat > values > > (MRM transitions values, retention time values, or even nil values) > > that pertain to more than one index so you always have a 1:1 > > correspondence across arrays. 
Of course I am making the assumption
> > that all binary arrays within a single spectrum element are related to
> > each other in some manner, so if this does not hold true, please
> > someone tell me, as I am fairly ignorant on the mass spec acquisition
> > modes.
> >
> > The alternative representation would be coordinate systems and
> > multidimensional data arrays /a la/ netcdf or HDF5, but we are too
> > far along the route that we have laid out to even consider a change
> > this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion
> > and the netcdf files are even bigger, at a gain of built in index into
> > the data arrays within and across spectra.
> >
> > -angel
> >
> > since not all BinaryDataArray elements are guaranteed (as I understand mzML,
> > which is but dimly) to be mz or intensity. You'll need to write it again as
> > an attribute of spectrum, something like mzintPairsCount if you don't like
> > PeaksCount.
> >
> > -----Original Message-----
> > From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> > Sent: Tuesday, February 12, 2008 1:28 PM
> > To: Mass spectrometry standard development
> > Cc: Eric Deutsch
> > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> >
> > So there seems to be broad consensus (4 for 4;) that moving the
> > arrayLength up a little higher is a good idea. So instead of:
> >
> >   <spectrum id="S19" scanNumber="19" msLevel="1">
> >     <spectrumDescription>
> >       ...
> >     </spectrumDescription>
> >     <binaryDataArray arrayLength="1313" encodedLength="5433"
> >         dataProcessingRef="Xcalibur Processing">
> >       ...
> >       <binary>AAAAwDsGeUAAAAD...</binary>
> >     </binaryDataArray>
> >     <binaryDataArray arrayLength="1313" encodedLength="4892">
> >       ...
> >       <binary>AAAAAIBJxk...</binary>
> >     </binaryDataArray>
> >   </spectrum>
> >
> > We will have:
> !!!!!!!!!!!!!!!!!!
> >
> >   <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
> >     <spectrumDescription>
> >       ...
> >     </spectrumDescription>
> >     <binaryDataArray encodedLength="5433"
> >         dataProcessingRef="Xcalibur Processing">
> >       ...
> >       <binary>AAAAwDsGeUAAAAD...</binary>
> >     </binaryDataArray>
> >     <binaryDataArray encodedLength="4892">
> >       ...
> >       <binary>AAAAAIBJxk...</binary>
> >     </binaryDataArray>
> >   </spectrum>
> >
> > Agreed?
> >
> > > -----Original Message-----
> > > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > > Sent: Wednesday, February 06, 2008 10:49 AM
> > > To: Mass spectrometry standard development
> > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> > >
> > > I agree that the primary data arrays should probably be treated as
> > > special in the schema so it's clear that they are paired values and thus
> > > peak count could move into the spectrum element or spectrumDescription.
> > > There should still be options to have additional arrays that aren't the
> > > same as the main arrays (for example, an additional set of arrays, one
> > > for a subset of the m/zs and the other for peak charge information).
> > >
> > > -Matt
> > >
> > > Kessner, Darren E. wrote:
> > > > Any other comments regarding <binaryArrayData> lengths?
> > > >
> > > >> (from Rune)
> > > >> If they have to be equal size, then
> > > >> that size ought to be specified in the spectrumDescription.
> > > >
> > > > I agree -- I would like to encode the length in <spectrum> somewhere
> > > > (either attribute or cvParam) so that:
> > > > 1) it's clear that the arrays are of equal size
> > > > 2) Readers don't have to peek into the attributes of the first
> > > >    <binaryArrayData> to get the info
> > > >
> > > > I need this right now for the MSData RAMP adapter code, so I'll encode
> > > > it as a <userParam> until a decision has been made on the specification.
> > > >
> > > > Darren |
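The motivation quoted above (getting the pair count without touching the binary data) can be sketched as follows. This is only an illustration under stated assumptions: it uses the proposed layout with arrayLength on <spectrum>, a simplified sample without XML namespaces or a surrounding mzML document, and a hypothetical helper name spectrum_headers. It is not the MSData or RAMP implementation.

```python
import io
import xml.etree.ElementTree as ET

# Simplified sample in the *proposed* layout (assumption: no namespaces,
# placeholder text in <binary>, only the elements discussed in the thread).
SAMPLE = b"""<spectrumList>
  <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
    <spectrumDescription/>
    <binaryDataArray encodedLength="5433"><binary>...</binary></binaryDataArray>
    <binaryDataArray encodedLength="4892"><binary>...</binary></binaryDataArray>
  </spectrum>
</spectrumList>"""


def spectrum_headers(stream):
    """Yield (id, arrayLength) per <spectrum>, using start-tag attributes only.

    Hypothetical helper: attributes are available at the "start" event,
    so the <binary> payload is never base64-decoded here.
    """
    for _event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag == "spectrum":
            yield elem.get("id"), int(elem.get("arrayLength"))


headers = list(spectrum_headers(io.BytesIO(SAMPLE)))
print(headers)
```

Note that a streaming parser still scans past the <binary> text unless the caller stops iterating early; the saving here is that nothing is decoded and no <binaryDataArray> attributes need to be inspected.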
From: Rune S. P. <mai...@ph...> - 2008-02-13 09:12:39
|
Hello

See comments below.

Matt Chambers wrote:
> Angel Pizarro wrote:
>> On Feb 12, 2008 8:06 PM, Matt Chambers wrote:
>>> It's reasonable that a user of the format would want to store structured
>>> information for a limited number of peaks (or store a variable number of
>>> values in one field, e.g. multidimensional array) so the binary data
>>> might be laid out in a user-defined pattern:
>>> m/z (same precision as main array)
>>
>> errr, I don't get what you mean here. Does this mean that you ran a
>> peak detection alg and have a much reduced set of data points? If so
>> this is a new mzML file from the one prior to peak detection.
>
> No, I mean that complex, multibyte metadata for data points may only be
> available for, say, 10% of the total data points (even after peak
> picking). It would be silly to require the user to store a 19+ byte
> struct for every peak. Yes, this is a very advanced use case that will
> probably never be seen, but we can allow it with virtually no drawback.

As I understand it, Angel means that after peak picking a new mzML file, containing only the picked peaks, would be created. What Matt is suggesting is that additional metadata, such as picked peaks, could be stored in the mzML file together with the raw data.

>>> Allowing the secondary data arrays to have a
>>> different length leaves the format open to user-defined craziness like
>>> this, and I think that's a good thing. Definitely you wouldn't want to
>>> define one of these structures for every data point if you're dealing of
>>> data with a decent amount of noise!
>
> It comes down to giving the user some flexibility and not imposing
> unnecessary rigidity in the schema. How much simpler does it really make
> the schema to make ALL the arrays the same length? Not very much, I think

I don't like the idea of having the flexibility to have user-defined craziness. It is easy to imagine two different pieces of software using this flexibility in different and incompatible ways. I would prefer the format of the binaryDataArray to be fully specified, which would make it that much easier to create a reader for mzML.

As I understand it, the userParam is available for user-defined craziness, right?

--
Regards
Rune |
From: Marc S. <st...@in...> - 2008-02-13 09:38:08
|
Hi all,

I like the idea of being able to annotate a small subset of the peaks in a spectrum. This is needed, e.g., when assigning ion types for MS/MS: b1, b2, ..., y1, y2, ..., y7-H2O, ... Most of the peaks are simply noise, so only a minority of peaks will have an annotation. Using a full-sized array would be possible, but a waste of space. In my opinion, there should be a recommended way to do such a thing. What do you suggest?

Before I forget: is it possible to annotate peaks with strings? Otherwise we would have to use some kind of dictionary to assign each ion type an integer index.

-Marc |
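The dictionary idea in the message above can be illustrated with two short parallel arrays: one holding the indices of the annotated peaks, the other holding integer codes that point into a label table. This is a hypothetical layout for illustration only; the names ION_TYPES, encode_annotations and decode_annotations are assumptions, and nothing here is defined by mzML.

```python
from array import array

# Hypothetical label table; the position in this list is the integer code.
ION_TYPES = ["b1", "b2", "y1", "y2", "y7-H2O"]


def encode_annotations(annotations):
    """Turn {peak_index: label} into two short parallel integer arrays."""
    indices = array("I", sorted(annotations))
    codes = array("I", (ION_TYPES.index(annotations[i]) for i in indices))
    return indices, codes


def decode_annotations(indices, codes):
    """Inverse of encode_annotations."""
    return {i: ION_TYPES[c] for i, c in zip(indices, codes)}


# Only 3 peaks of a (say) 1000-peak spectrum carry an annotation.
sparse = {17: "b1", 240: "y2", 803: "y7-H2O"}
indices, codes = encode_annotations(sparse)
print(list(indices), list(codes))
```

The two arrays have the length of the annotated subset rather than of the m/z and intensity arrays, which is exactly the case where per-array lengths would differ within one spectrum.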
From: Lennart M. <len...@eb...> - 2008-02-13 14:15:50
|
Hi Marc,

The annotation of a mass spectrum with fragment ion types and indices presents a significant amount of processing of the original mass spec data, as well as a certain type of 'inference' (uncertainty, and often ambiguity!) that has nothing to do with the mass spectrometer, but relates to an identification algorithm of some description.

As such, I don't think we want to annotate this information in mzML at all, or encourage people to do so. The scope of mzML should remain limited to the instrument output (with possibly some signal processing done by the instrument software).

Fragment ion annotation should therefore be held elsewhere, and the PSI is actually creating analysisXML for the purpose of recording identification algorithm output (such as fragment ion assignment). analysisXML will link back to the mzML files used as input, and through this link, peak annotation can be extracted.

Cheers,

lnnrt. |
From: Marc S. <st...@in...> - 2008-02-13 14:24:18
|
The fragment ion annotation was only an example. It's true that mzML is not the right place for it. But I still think that there should be a way to annotate a subset of the peaks with arbitrary data. I could imagine several use cases for such a feature.

- Marc |
From: Matthew C. <mat...@va...> - 2008-02-13 16:12:18
|
It's true that identification output doesn't belong in mzML, but peak charge state assignments and isotope assignments (to name two examples) do not fall under that umbrella. Such annotation does belong in the mzML IMO, either in the same file or in a new one, it doesn't really matter. And such advanced annotation is unlikely to be available for every peak (much less every data point for profile data!).

I fail to see the harm of allowing the length attribute of binaryDataArrays to be optional, and if not present for a given binaryDataArray, readers would be instructed to treat it the same as the required length attribute (given as an attribute on the corresponding spectrum element).

As for how this will allow for user-defined craziness, "userParam" does already allow for that, but binary data cannot be encoded in a userParam to my knowledge.

-Matt |
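The fallback rule proposed in the message above (an optional per-array length that defaults to the required length on the spectrum) might look like this on the reader side. It is a minimal sketch assuming the attribute is called arrayLength in both places and that the sample has no XML namespaces; effective_lengths is a hypothetical helper, not part of any existing library.

```python
import xml.etree.ElementTree as ET

# Sample under the proposal: the third array overrides the spectrum default.
SPECTRUM_XML = """<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
  <spectrumDescription/>
  <binaryDataArray encodedLength="5433"><binary>...</binary></binaryDataArray>
  <binaryDataArray encodedLength="4892"><binary>...</binary></binaryDataArray>
  <binaryDataArray arrayLength="12" encodedLength="64"><binary>...</binary></binaryDataArray>
</spectrum>"""


def effective_lengths(spectrum):
    """Return one length per <binaryDataArray>.

    An array without its own arrayLength inherits the required
    arrayLength attribute of the enclosing <spectrum>.
    """
    default = int(spectrum.get("arrayLength"))
    return [
        int(arr.get("arrayLength", default))
        for arr in spectrum.findall("binaryDataArray")
    ]


lengths = effective_lengths(ET.fromstring(SPECTRUM_XML))
print(lengths)
```

With this rule a reader that only cares about m/z-intensity pairs can rely on the spectrum attribute alone, while secondary arrays remain free to declare a different length.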