From: Fredrik L. <Fre...@im...> - 2008-02-13 19:12:45
|
As Eric concluded, a problem with arrays of different lengths is that you would normally want pairs (or higher-order tuples) of data, e.g. an m/z and charge state pair. This would require two m/z arrays in the set if there were one set of m/z-intensity pairs and another set, of different length, of m/z-charge state pairs. Using the current schema structure it would not be possible to determine which m/z array belongs to which other array. OK, you could identify pairs by looking at the arrayLength of the different arrays and use that for pairing, but that seems suboptimal to me.

Also, if the spectrum element represents a list of picked peaks, I think you would have charge assignments for all the peaks, even if some were zero or another dummy value because the assignment failed. If the spectrum element represents a profile spectrum, I cannot see the use for a set of binary arrays of different lengths. By definition the spectrum has to be either profile or centroid (peak list), so there shouldn't be a mixture of profile and centroid data in one spectrum.

So, I also vote for binary arrays of the same length for a spectrum.

Fredrik

----- Original Message -----
From: Matthew Chambers <mat...@va...>
Date: Wednesday, February 13, 2008 5:12 pm
Subject: Re: [Psidev-ms-dev] binaryArrayData lengths

> It's true that identification output doesn't belong in mzML, but peak
> charge state assignments and isotope assignments (to name two examples)
> do not fall under that umbrella. Such annotation does belong in the mzML
> IMO, either in the same file or in a new one, it doesn't really matter.
> And such advanced annotation is unlikely to be available for every peak
> (much less every data point for profile data!). I fail to see the harm
> of allowing the length attribute of binaryDataArrays to be optional, and
> if not present for a given binaryDataArray, readers would be instructed
> to treat it the same as the required length attribute (given as an
> attribute on the corresponding spectrum element). As for how this will
> allow for user-defined craziness, "userParam" does already allow for
> that, but binary data cannot be encoded in a userParam to my knowledge.
> -Matt
>
> Lennart Martens wrote:
> > Hi Marc,
> >
> >> I like that idea of being able to annotate a small subset of the peaks
> >> in a spectrum.
> >> This is e.g. needed when assigning ion types for MS/MS: b1, b2, ..., y1,
> >> y2, ..., y7-H2O, ...
> >> Most of the peaks are simply noise and so only a minority of peaks will
> >> have an annotation.
> >> Using a full-sized array would be possible, but a waste of space.
> >> In my opinion, there should be a recommended way to do such a thing.
> >> What do you suggest?
> >>
> >> Before I forget: Is it possible to annotate peaks with strings?
> >> Otherwise we would have to use some kind of dictionary to assign each
> >> ion type an integer index.
> >
> > The annotation of a mass spectrum with fragment ion types and indices
> > represents a significant amount of processing of the original mass spec
> > data, as well as a certain type of 'inference' (uncertainty, and often
> > ambiguity!) that has nothing to do with the mass spectrometer, but
> > relates to an identification algorithm of some description.
> >
> > As such, I don't think we want to annotate this information in mzML at
> > all, or encourage people to do so. The scope of mzML should remain
> > limited to the instrument output (with possibly some signal processing
> > done by the instrument software).
> >
> > Fragment ion annotation should therefore be held elsewhere, and the PSI
> > is actually creating analysisXML for the purpose of recording
> > identification algorithm output (such as fragment ion assignment).
> > analysisXML will link back to the mzML files used as input, and through
> > this link, peak annotation can be extracted.
> >
> > Cheers,
> >
> > lnnrt.
> >
> >> -Marc
> >>
> >> -------------------------------------------------------------------------
> >> This SF.net email is sponsored by: Microsoft
> >> Defy all challenges. Microsoft(R) Visual Studio 2008.
> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> >> _______________________________________________
> >> Psidev-ms-dev mailing list
> >> Psi...@li...
> >> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
|
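Fredrik's objection to pairing arrays by length can be made concrete with a small sketch. The helper name `pair_by_length` and the `(kind, values)` tuple layout are hypothetical illustrations, not part of any mzML API: grouping by length recovers the pairs only while each length group happens to contain exactly one m/z array, and it breaks as soon as two unrelated groups share a length.

```python
from collections import defaultdict


def pair_by_length(arrays):
    """Group binary arrays by length and check that each group is pairable.

    Hypothetical helper: `arrays` is a list of (kind, values) tuples, where
    kind is a label such as "m/z", "intensity" or "charge". Returns a dict
    mapping each length to the kinds found at that length. Raises ValueError
    when a length group does not hold exactly one "m/z" array, because the
    lengths alone then no longer say which m/z array the others belong to.
    """
    groups = defaultdict(list)
    for kind, values in arrays:
        groups[len(values)].append(kind)
    for length, kinds in groups.items():
        if kinds.count("m/z") != 1:
            raise ValueError(f"cannot pair arrays of length {length}: {kinds}")
    return dict(groups)


# Works: 3 m/z-intensity pairs plus 1 m/z-charge pair.
ok = [("m/z", [100.1, 200.2, 300.3]), ("intensity", [10.0, 20.0, 30.0]),
      ("m/z", [200.2]), ("charge", [2])]
print(pair_by_length(ok))  # {3: ['m/z', 'intensity'], 1: ['m/z', 'charge']}

# Fails: both groups happen to have 2 entries, so the m/z arrays collide.
clash = [("m/z", [100.1, 200.2]), ("intensity", [10.0, 20.0]),
         ("m/z", [150.0, 250.0]), ("charge", [2, 3])]
try:
    pair_by_length(clash)
except ValueError as err:
    print(err)
```

The failure case is exactly the "suboptimal" part: whether a file can be read unambiguously would depend on coincidences in the data rather than on the schema.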
From: Eric D. <ede...@sy...> - 2008-02-19 01:04:30
|
Hi everyone,

Thank you for the lively discussion on this arrayLength topic. I have summarized the discussion as follows:

- Eric proposed putting an arrayLength= attribute in <spectrum> and gave an example
- Angel agrees
- Brian disagrees (or maybe just points out that this formatting goes against our intent as he perceives it?)
- Matt proposes primaryArrayLength or mzIntArrayLength, allowing other arrays of different length in addition to the primary one
- Angel asks for rational examples of non-same-length arrays
- Eric pleads for keeping it simple
- Matt proposes some complex multi-dimensional data: charge assignments, isotope number, peak label
- Angel is not impressed with Matt's examples
- Matt defends his complex use case although it will "probably never be seen"
- Rune tries to dissuade such "user-defined craziness"
- Marc likes the idea of being able to annotate a subset of peaks "b6, y6, y7-H2O"
- Rune offers that arrayLength= could be under <spectrumDescription>
- Lennart counters that the annotations Marc suggests do not belong in mzML since they are interpretation, not raw mass spec output
- Marc concedes that mzML is not the right place for the above example, but still likes the idea of being able to make such annotations
- Matt still lobbies for allowing multiple parallel X-axes and corresponding Y-axes
- Fredrik votes for keeping it simple with 1 fixed array size
- Randy has no problem with "moving arrayLength up"
[end of thread]

I apologize for bluntly oversimplifying the elegant arguments put forth, but I needed to see the whole discussion as a series of one-liners.

I tally up the votes like this:

In favor of Eric's proposal: 6 (Eric, Angel, Rune, Lennart, Fredrik, Randy)
In favor of expanding the schema to handle multiple groups of arrays with different lengths between groups: 3 (Brian?, Matt, Marc)

Have I inaccurately pegged anyone? Does anyone else want to change their vote or weigh in anew?
I suspect we may not all agree on what to do here and may just have to go with the majority.

Speaking only for myself, I have not yet seen an example compelling enough to justify complicating the schema to handle it. I can still envision one mzML file containing profile spectra and a second mzML file, produced after peak picking, that contains the centroided peaks along with a charge array and even an isotope number for each picked peak, with some agreed-upon, documented NULL value within the array for missing information. This is already fully supported in mzML and is not even possible in mzXML or mzData, so we're already extending the capability in a clear but simple way IMHO.

Well?

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Fredrik Levander
> Sent: Wednesday, February 13, 2008 11:13 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
|
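Eric's same-length proposal (one spectrum-level arrayLength, every array exactly that long, and a documented NULL value for missing per-peak information) can be sketched as follows. The `PeakList` class, the array labels, and the choice of 0 as the sentinel are all assumptions for illustration; the thread only asks for "some agreed-upon, documented NULL value", and 0 follows Fredrik's "zero or another dummy value".

```python
from dataclasses import dataclass, field

# Hypothetical NULL sentinel for a failed or missing charge assignment.
MISSING_CHARGE = 0


@dataclass
class PeakList:
    """Centroided spectrum in which every binary array has array_length entries."""
    array_length: int  # plays the role of a spectrum-level arrayLength
    arrays: dict[str, list] = field(default_factory=dict)

    def validate(self) -> None:
        # Under the same-length proposal, any length mismatch is an error.
        for name, values in self.arrays.items():
            if len(values) != self.array_length:
                raise ValueError(
                    f"{name}: {len(values)} values, expected {self.array_length}")

    def assigned_charges(self) -> list[tuple]:
        """Return (m/z, charge) pairs, skipping peaks with the NULL sentinel."""
        return [(mz, z)
                for mz, z in zip(self.arrays["m/z"], self.arrays["charge"])
                if z != MISSING_CHARGE]


peaks = PeakList(3, {
    "m/z": [445.12, 512.30, 623.41],
    "intensity": [1200.0, 80.0, 950.0],
    "charge": [2, MISSING_CHARGE, 3],
})
peaks.validate()
print(peaks.assigned_charges())  # [(445.12, 2), (623.41, 3)]
```

The trade-off Marc raised shows up here: the charge array costs one slot per peak even where no assignment exists, but a reader never has to work out which arrays belong together.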
From: Matt C. <mat...@va...> - 2008-02-19 02:21:20
|
Hi Eric,

Your summary looks fine to me. Let me clarify my proposal, though:

1) add an arrayLength attribute as a global in the spectrum hierarchy, and specify in the spec that the m/z and intensity arrays must be that length
2) make the arrayLength attribute on binaryDataArray optional, and specify in the spec that the attribute can be used to override the global arrayLength attribute (but it would be semantically invalid to do so for the m/z or intensity array)

This is far less complicated than other parts of the schema, like specifying scans vs. acquisitions, so I'm confident implementors would be able to grasp it. Implementors who wish to keep things simple and only care about the primary data arrays can simply always ignore a binaryDataArray's arrayLength attribute along with the extra arrays. Implementing support for the extra arrays at all would be complicated, regardless of whether or not they are allowed to differ in length.

-Matt

Eric Deutsch wrote:
|
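Matt's two-part proposal (a global arrayLength, plus an optional per-array override that the m/z and intensity arrays may not use) can be sketched as a length resolver, alongside the minimal reader he describes that simply drops the extra arrays. The dict layout and function names are hypothetical; the array labels are loosely modeled on the array names in the PSI-MS controlled vocabulary. One interpretation choice is assumed: this sketch only rejects a primary array whose override *differs* from the global value, and accepts a redundant identical one.

```python
# Labels loosely modeled on PSI-MS CV array names; the dict layout is hypothetical.
PRIMARY = {"m/z array", "intensity array"}


def effective_lengths(spectrum_length, arrays):
    """Resolve each array's length under the optional-override proposal.

    `arrays` is a list of dicts with a "name" key and an optional
    "arrayLength" key. Arrays without their own arrayLength inherit the
    spectrum-level value; extra arrays may override it, while a primary
    array overriding it with a different value is rejected.
    """
    lengths = {}
    for arr in arrays:
        name = arr["name"]
        override = arr.get("arrayLength")
        if override is None:
            lengths[name] = spectrum_length
        elif name in PRIMARY and override != spectrum_length:
            raise ValueError(
                f"{name} must have the spectrum arrayLength {spectrum_length}")
        else:
            lengths[name] = override
    return lengths


def primary_only(arrays):
    """What a minimal reader would do: ignore every non-primary array."""
    return [arr for arr in arrays if arr["name"] in PRIMARY]


arrays = [
    {"name": "m/z array"},
    {"name": "intensity array"},
    {"name": "charge array", "arrayLength": 2},
]
print(effective_lengths(5, arrays))
# {'m/z array': 5, 'intensity array': 5, 'charge array': 2}
print([arr["name"] for arr in primary_only(arrays)])
# ['m/z array', 'intensity array']
```

Note that this resolver says nothing about which m/z values the two-entry charge array refers to; that is the pairing gap Fredrik pointed out, and closing it would need something beyond a length attribute.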