From: Kessner, D. E. <Dar...@cs...> - 2008-02-14 00:15:53
Hi all,

Please excuse me if this has been discussed before.

In mzXML, the <software> element is encoded as follows:

    <software type="acquisition" name="Xcalibur" version="1.3 alpha 8"/>

In mzML, we have:

    <software id="Xcalibur">
      <softwareParam cvLabel="MS" accession="MS:1000532" name="Xcalibur" version="2.0.5"/>
    </software>

Note that the name and version are encodable, but there is no convenient place to save the "type" attribute, since the <software> element does not have <cvParam> or <userParam> sub-elements. Currently this causes a loss of this info when converting mzXML->mzML.

If there is a good place to put this attribute, the conversion mzXML -> mzML -> mzXML will be doable with no loss of information. (I think this is the last missing piece).

Or perhaps it's no great loss? I don't have a strong attachment to this attribute -- just thought it would be nice to be able to get the same mzXML after converting to and from mzML...

Thoughts?

Darren

Darren Kessner
Scientific Programmer
Dar...@cs...
310-423-9538
Spielberg Family Center for Applied Proteomics
Cedars-Sinai Medical Center
http://www.sfcap.cshs.org/
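To make the information loss described above concrete, here is a minimal Python sketch (standard library only) of the mzXML -> mzML direction for this one element. It is an illustration under stated assumptions, not code from any actual converter: the function name `mzxml_software_to_mzml` is hypothetical, and the accession MS:1000532 is simply copied from the example in the message, where a real converter would presumably look it up in the CV by software name. The function returns the attributes it had to drop so the loss is visible.

```python
import xml.etree.ElementTree as ET

# Assumption: accession copied from the example in the message; a real
# converter would look it up in the CV by software name.
XCALIBUR_ACCESSION = "MS:1000532"


def mzxml_software_to_mzml(mzxml_software_xml):
    """Convert one mzXML <software> element to the mzML form shown above.

    Returns (mzml_element, dropped), where `dropped` holds the mzXML
    attributes that have no slot in the mzML <software> element.
    """
    src = ET.fromstring(mzxml_software_xml)
    software = ET.Element("software", {"id": src.get("name")})
    ET.SubElement(software, "softwareParam", {
        "cvLabel": "MS",
        "accession": XCALIBUR_ACCESSION,
        "name": src.get("name"),
        "version": src.get("version"),
    })
    # name and version were carried over; every other attribute is lost.
    dropped = {k: v for k, v in src.attrib.items() if k not in ("name", "version")}
    return software, dropped


mzml_software, dropped = mzxml_software_to_mzml(
    '<software type="acquisition" name="Xcalibur" version="1.3 alpha 8"/>'
)
print(ET.tostring(mzml_software, encoding="unicode"))
print("dropped:", dropped)
```

With the mzXML example from the message, `dropped` contains only the `type` attribute, which is exactly the piece that prevents a lossless round trip.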
From: Randy J. <rkj...@in...> - 2008-02-13 21:27:18
Eric,

I am in favor of 'B'. I think mzML validity and MIAPE compliance are two different things. If a journal won't let you publish your old converted .dta files because you cannot remember the instrument, that doesn't mean you can't continue to search them internally within your group or share them with another group.

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
Sent: Tuesday, February 12, 2008 2:52 PM
To: Mass spectrometry standard development
Cc: Eric Deutsch
Subject: Re: [Psidev-ms-dev] Some additional 'Unknown instrument' CVparameter thoughts

Hi everyone,

I'm trying to see if we can get to some consensus on some of these ongoing threads. Regarding the "unknown instrument" problem, I think there has been some confusion, so let me see if I can clarify and ask for a final round of opinions.

I agree with Fredrik's comments below that his examples below are *not* what is intended. Here is what I believe Lennart intended:

A) <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value=""/>

Or the other alternative is to create a term for unknown:

B) <cvParam cvLabel="MS" accession="MS:1099931" name="unknown instrument model" value=""/>

(where the number is obviously made up by me right now, but would be in the CV)

So those are the choices. Putting something in the value attribute is not an option as Fredrik concludes below.

Benefits of A)
- No need to litter the CV with "xxx unknown" terms
- Happens to be very easy for the existing validator software to accommodate
- Somewhat counterintuitive and thus dissuades laziness

Drawbacks of A)
- Somewhat counterintuitive and awkward

Benefits of B)
- Very intuitive and straightforward: the concept of what instrument generated these spectra is captured by the concept "sorry, I just don't know which instrument it was"

Drawbacks of B)
- Opens the door to perhaps needing to sprinkle other unknowns in the CV
- Is a little more inviting to users to be lazy and claim they don't know, when with a little more effort they could find out and report properly (because "unknown" is now an *obvious* option)
- Would require more development in the validator to properly handle a special term like this.

Based on the feedback I saw so far, Lennart, Luisa and Angel like A. Matt seemed more in favor of B. No clear reads on others.

I myself prefer B. To me it feels like A is a convenient but counterintuitive trick to working around the problem. B feels like the right solution even if it facilitates laziness. I don't think that will be a big problem. I'm sure we can come up with some syntax for the validator to permit or disallow "ambiguity terms" as desired.

So, what say ye?

> From: psi...@li... [mailto:psidev-ms-dev-
>
> Hi Lennart, Josh, Matt and others,
>
> If the top level term is allowed it will be possible to define not only instrument value='unknown', but also instruments that are not in the CV by putting something in the value field:
>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="The new mass spec not in CV"/>
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="unknown"/>
>
> Instead of the intended:
>
> <cvParam cvLabel="MS" accession="MS:1000189" name="q-tof ultima" value=""/>
>
> I'm not so sure that this is wanted. Especially since unknown could be written as 'not known', 'not specified' etcetera. It makes sense to have a CV term for 'unknown', but it would be quite a few 'unknown' terms to add to the CV to get one for each required category in the mzML schema... At some places it would be enough with just 'unknown' (source, detector etc), but at other places it must be specified what is unknown!
>
> Anyway, I am still for usage of top level elements :-) , see line 16 at:
> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B.mzML
>
> cheers
>
> Fredrik
>
> Joshua Tasman wrote:
> > I'm with Matt on this one, and like his solution. There are unfortunately lots of real use cases (combining dta, mgfs) where the information will really be unknown, and we should accurately represent the lack of information. If it's not too much effort to add a little more code to the validator, I would much prefer the accurate addition of an "unknown" term. There has been so much effort getting the CV and document to line up with reality, it looks very strange to me to force this ontological 'hack' by allowing the category to appear as a value, as Matt has said.
> >
> > Josh
> >
> > Matthew Chambers wrote:
> >> Lennart Martens wrote:
> >>> Hi Matt, and Colleagues,
> >>>
> >>>> I don't really prefer one to the other very much, but I don't see how the parent term would be easier to validate ("all but X children of a term" doesn't make sense to me, do you mean "all children of a term except X"?)
> >>>
> >>> You are right; I provided bad shorthand for: 'all children of a term, except X (and Y, and Z, ... -- potentially)'.
> >>>
> >>> The reason why it is easier to validate is due to the way the validator mapping file is designed, e.g. (example verbatim from current 0.99.1 mapping file):
> >>>
> >>> <CvTerm termAccession="MS:1000031" useTerm="false" termName="instrument model" isRepeatable="false" scope="/mzML/instrumentList/instrument" allowChildren="true" cvIdentifier="MS"></CvTerm>
> >>>
> >>> this means that although all children of term 'MS:1000031 -- instrument model' are allowed (allowChildren="true"), the term itself is not allowed (useTerm="false"). By flipping this latter boolean, we can allow the parent term, thus separating between MIAPE requirements (current configuration) and the 'usable mzML requirements' (flipped boolean as explained above) -- for the instrument model at least.
> >>
> >> OK, so it's an implementation thing. That's fine.
> >>
> >>>> What about data converted from DTAs or MGFs where the user doesn't even remember (or never knew) what kind of instrument it came from?
> >>>
> >>> When the instrument is really unknown (which is unfortunate and constitutes dramatic metadata loss whichever way you look at it), the proposed scenario (usage of toplevel term) provides solace. For all other scenarios (where an incentive to adapt convertor software or report the development of a new instrument is concerned), the relative obscurity of the 'fix' might contribute to 'going the extra mile' (upgrading the convertor, mailing in the new instrument name).
> >>
> >> While the toplevel term does provide some solace, it is obscure enough that a casual user might look at it and think that something was wrong because it does not intuitively make sense for the category to appear as a value. What about this alternative: provide an "unknown instrument" term with a unique accession #, but make the term name something like "unknown (instrument type not specified or not in CV)". That would be intuitive but still eye-catching (and it would be the eye-catching part that implementors would want to minimize, because it makes them look bad). ;)
> >>
> >> -Matt

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
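The `useTerm` / `allowChildren` switch that Lennart describes in the quoted thread can be sketched in Python as follows. This is only an illustration of the rule as explained in the message, not the validator's actual code: the function `is_term_allowed`, the `PARENT` lookup table, and the rule dictionaries are hypothetical, and the tiny hierarchy contains just the two accessions that appear in the thread (MS:1000189 "q-tof ultima" is assumed here to be a direct child of MS:1000031 "instrument model", which may not match the real CV).

```python
# Hypothetical child -> parent lookup; a real validator would load the CV.
PARENT = {
    "MS:1000189": "MS:1000031",  # q-tof ultima -> instrument model (assumed)
}


def is_descendant(accession, ancestor):
    """True if `accession` lies strictly below `ancestor` in PARENT."""
    current = PARENT.get(accession)
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def is_term_allowed(accession, rule):
    """Apply one mapping rule (termAccession, useTerm, allowChildren)."""
    if accession == rule["termAccession"]:
        return rule["useTerm"]
    return rule["allowChildren"] and is_descendant(accession, rule["termAccession"])


# Mapping as quoted from the 0.99.1 file: children allowed, parent not.
miape_rule = {"termAccession": "MS:1000031", "useTerm": False, "allowChildren": True}
# Option A from the thread: flip useTerm so the bare parent term validates too.
relaxed_rule = dict(miape_rule, useTerm=True)

print(is_term_allowed("MS:1000189", miape_rule))    # True
print(is_term_allowed("MS:1000031", miape_rule))    # False
print(is_term_allowed("MS:1000031", relaxed_rule))  # True
```

The only difference between the two configurations is a single boolean, which is why option A was described as easy for the existing validator to accommodate.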
From: Randy J. <rkj...@in...> - 2008-02-13 21:27:17
I also agree that there is no problem with moving the array length up.

I would like to ask again:

1. is scanNumber different from <scan> which lives lower?
2. can we make msLevel either optional, or a cvParam?

Randy

________________________________
From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Tuesday, February 12, 2008 4:34 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] binaryArrayData lengths

+1 agreed

-angel

On Feb 12, 2008 4:27 PM, Eric Deutsch <ede...@sy...> wrote:

So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:

    <spectrum id="S19" scanNumber="19" msLevel="1">
      <spectrumDescription>
        ...
      </spectrumDescription>
      <binaryDataArray arrayLength="1313" encodedLength="5433" dataProcessingRef="Xcalibur Processing">
        ...
        <binary>AAAAwDsGeUAAAAD...</binary>
      </binaryDataArray>
      <binaryDataArray arrayLength="1313" encodedLength="4892">
        ...
        <binary>AAAAAIBJxk...</binary>
      </binaryDataArray>
    </spectrum>

We will have:

                                                   !!!!!!!!!!!!!!!!!!
    <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
      <spectrumDescription>
        ...
      </spectrumDescription>
      <binaryDataArray encodedLength="5433" dataProcessingRef="Xcalibur Processing">
        ...
        <binary>AAAAwDsGeUAAAAD...</binary>
      </binaryDataArray>
      <binaryDataArray encodedLength="4892">
        ...
        <binary>AAAAAIBJxk...</binary>
      </binaryDataArray>
    </spectrum>

Agreed?

> -----Original Message-----
> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: Wednesday, February 06, 2008 10:49 AM
> To: Mass spectrometry standard development
> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>
> I agree that the primary data arrays should probably be treated as special in the schema so it's clear that they are paired values and thus peak count could move into the spectrum element or spectrumDescription. There should still be options to have additional arrays that aren't the same as the main arrays (for example, an additional set of arrays, one for a subset of the m/zs and the other for peak charge information).
>
> -Matt
>
> Kessner, Darren E. wrote:
> > Any other comments regarding <binaryArrayData> lengths?
> >
> >> (from Rune)
> >> If they have to be equal size, then that size ought to be specified in the spectrumDescription.
> >
> > I agree -- I would like to encode the length in <spectrum> somewhere (either attribute or cvParam) so that:
> > 1) it's clear that the arrays are of equal size
> > 2) Readers don't have to peek into the attributes of the first <binaryArrayData> to get the info
> >
> > I need this right now for the MSData RAMP adapter code, so I'll encode it as a <userParam> until a decision has been made on the specification.
> >
> > Darren

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
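As a rough sketch of what the proposed layout means for a reader, the Python below parses a spectrum that carries `arrayLength` on the `<spectrum>` element and checks every decoded array against it. Several things here are assumptions rather than part of the proposal: the arrays are taken to be uncompressed little-endian 64-bit floats (in real files the encoding is declared in the parts elided as "..." above), `encodedLength` is taken to be the length of the base64 text, and the helper names `encode_doubles`, `decode_doubles`, and `read_spectrum` are made up for the example.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_doubles(values):
    """Base64-encode values as little-endian 64-bit floats (assumed encoding)."""
    packed = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(packed).decode("ascii")


def decode_doubles(text):
    """Inverse of encode_doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


def read_spectrum(spectrum_xml):
    """Decode all arrays, checking each against the spectrum-level arrayLength."""
    spectrum = ET.fromstring(spectrum_xml)
    expected = int(spectrum.get("arrayLength"))
    arrays = []
    for bda in spectrum.findall("binaryDataArray"):
        text = bda.findtext("binary")
        if len(text) != int(bda.get("encodedLength")):
            raise ValueError("encodedLength does not match the base64 text")
        values = decode_doubles(text)
        if len(values) != expected:
            raise ValueError("array has %d values, expected %d" % (len(values), expected))
        arrays.append(values)
    return arrays


mz = encode_doubles([400.5, 401.5, 402.5])
intensity = encode_doubles([10.0, 250.0, 30.0])
doc = (
    '<spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="3">'
    '<binaryDataArray encodedLength="%d"><binary>%s</binary></binaryDataArray>'
    '<binaryDataArray encodedLength="%d"><binary>%s</binary></binaryDataArray>'
    '</spectrum>' % (len(mz), mz, len(intensity), intensity)
)
print(read_spectrum(doc))
```

The reader learns the peak count once, from the `<spectrum>` element, without peeking into the first `<binaryDataArray>`, which is the convenience Darren asked for.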
From: Angel P. <an...@ma...> - 2008-02-13 19:33:18
quick validation error, the xml namespace has a different version than target schema spec, still @ 0.99.1

is:
xmlns:dx="http://psi.hupo.org/schema_revision/mzML_0.99.1"

should be:
xmlns:dx="http://psi.hupo.org/schema_revision/mzML_0.99.9"

-angel

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
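A mismatch like the one reported here could be caught mechanically. The following Python sketch compares the version embedded in the namespace URI with the version embedded in a snapshot file name of the form `YYYYMMdd_mzML0.99.9_SNAPSHOT.xsd` used for these schema snapshots. The function names are hypothetical, and the regular expressions assume that the version always follows `mzML_` at the end of the URI and `mzML` in the file name, which holds for the examples in this thread but is not guaranteed in general.

```python
import re


def version_from_namespace(uri):
    """Extract '0.99.1' from '.../schema_revision/mzML_0.99.1'."""
    match = re.search(r"mzML_(\d+(?:\.\d+)*)$", uri)
    return match.group(1) if match else None


def version_from_snapshot_name(filename):
    """Extract '0.99.9' from '20080213_mzML0.99.9_SNAPSHOT.xsd'."""
    match = re.match(r"\d{8}_mzML(\d+(?:\.\d+)*)_SNAPSHOT\.xsd$", filename)
    return match.group(1) if match else None


def namespace_matches_snapshot(namespace_uri, snapshot_filename):
    """True only if both versions can be parsed and are identical."""
    ns_version = version_from_namespace(namespace_uri)
    file_version = version_from_snapshot_name(snapshot_filename)
    return ns_version is not None and ns_version == file_version


snapshot = "20080213_mzML0.99.9_SNAPSHOT.xsd"
print(namespace_matches_snapshot(
    "http://psi.hupo.org/schema_revision/mzML_0.99.1", snapshot))  # False
print(namespace_matches_snapshot(
    "http://psi.hupo.org/schema_revision/mzML_0.99.9", snapshot))  # True
```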
From: Fredrik L. <Fre...@im...> - 2008-02-13 19:12:45
As Eric concluded, a problem with arrays of different lengths is that you would normally want pairs (or higher) of data, i.e. an m/z and charge state pair. This would require two m/z arrays in the set if there would be one set of m/z-intensity pairs and another set of different length with m/z-charge state. Using the current schema structure it would not be possible to determine which m/z array belongs to which other array. OK, you could identify pairs by looking at the arrayLength of the different arrays and use that for pairing, but it seems suboptimal to me.

Also, if the spectrum element represents a list of picked peaks I think you would have charge assignments for all the peaks, even if some would be zero or another dummy value if the assignment failed.

If the spectrum element represents a profile spectrum I cannot see the use for a set of binary arrays of different lengths. By definition the spectrum has to be either profile or centroid (peak list), so there shouldn't be a mixture of profile / centroid data in one spectrum.

So, I also vote for binary arrays of the same length for a spectrum.

Fredrik
From: Fredrik L. <Fre...@el...> - 2008-02-13 17:25:33
As Eric concluded, a problem with arrays of different lengths is that you would normally want pairs of data, i.e. an m/z and charge state pair. This would require two m/z arrays in the set. OK, you could identify pairs by looking at the arrayLength of the different arrays and use that for pairing, but it seems suboptimal to me.

Also, if the spectrum element represents a list of picked peaks I think you would have charge assignments for all the peaks, even if some would be zero or another dummy value if the assignment failed.

So, I also vote for binary arrays of the same length for a spectrum.

Fredrik
From: Matthew C. <mat...@va...> - 2008-02-13 16:12:18
It's true that identification output doesn't belong in mzML, but peak charge state assignments and isotope assignments (to name two examples) do not fall under that umbrella. Such annotation does belong in the mzML IMO, either in the same file or in a new one, it doesn't really matter. And such advanced annotation is unlikely to be available for every peak (much less every data point for profile data!).

I fail to see the harm of allowing the length attribute of binaryDataArrays to be optional, and if not present for a given binaryDataArray, readers would be instructed to treat it the same as the required length attribute (given as an attribute on the corresponding spectrum element). As for how this will allow for user-defined craziness, "userParam" does already allow for that, but binary data cannot be encoded in a userParam to my knowledge.

-Matt

Lennart Martens wrote:
> Hi Marc,
>
>> i like that idea of being able to annotate a small subset of the peaks in a spectrum.
>> This is e.g. needed when assigning ion types for MS/MS: b1, b2, ..., y1, y2, ..., y7-H2O, ...
>> Most of the peaks are simply noise and so only a minority of peaks will have an annotation.
>> Using a full-sized array would be possible, but a waste of space.
>> In my opinion, there should be a recommended way to do such a thing.
>> What do you suggest?
>>
>> Before i forget: Is it possible to annotate peaks with strings? Otherwise we would have to use some kind of dictionary to assign ion type an integer index.
>
> The annotation of a mass spectrum with fragment ion types and indices presents a significant amount of processing of the original mass spec data, as well as a certain type of 'inference' (uncertainty, and often ambiguity!) that has nothing to do with the mass spectrometer, but relates to an identification algorithm of some description.
>
> As such, I don't think we want to annotate this information in mzML at all, or encourage people to do so. The scope of mzML should remain limited to the instrument output (with possibly some signal processing done by the instrument software).
>
> Fragment ion annotation should therefore be held elsewhere, and the PSI is actually creating analysisXML for the purpose of recording identification algorithm output (such as fragment ion assignment). analysisXML will link back to the mzML files used as input, and through this link, peak annotation can be extracted.
>
> Cheers,
>
> lnnrt.
>
>> -Marc
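The fallback rule Matt proposes (a per-array length that is optional and defaults to the spectrum-level value) could look roughly like this for a reader. This is a sketch of the proposal only; it was not the agreed schema at this point in the thread, where most replies favored equal-length arrays. The function name `effective_array_lengths` and the three-array example document are invented for illustration.

```python
import xml.etree.ElementTree as ET


def effective_array_lengths(spectrum_xml):
    """Return one length per binaryDataArray, falling back to the
    spectrum-level arrayLength when an array does not state its own."""
    spectrum = ET.fromstring(spectrum_xml)
    default = int(spectrum.get("arrayLength"))
    return [
        int(bda.get("arrayLength", default))
        for bda in spectrum.findall("binaryDataArray")
    ]


# Hypothetical spectrum: two primary arrays plus a shorter annotation array.
doc = (
    '<spectrum id="S19" arrayLength="1313">'
    '<binaryDataArray encodedLength="5433"/>'
    '<binaryDataArray encodedLength="4892"/>'
    '<binaryDataArray encodedLength="64" arrayLength="12"/>'
    '</spectrum>'
)
print(effective_array_lengths(doc))  # [1313, 1313, 12]
```

Note that this resolves only the lengths. It does not address Fredrik's objection that a shorter array still needs some way to say which m/z values its entries belong to.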
From: Lennart M. <len...@eb...> - 2008-02-13 14:34:43
|
Hi Darren, Great job! Well spotted! I'm sure there's more of these silly little things in there, so please, if any of you have twenty minutes to spare, have a quick read through the CV to see if you can spot some of these glitches. As they're so easy to fix, but so difficult to spot, the cost-benefit of everyone's twenty minutes will be enormous. I'll keep track of all emails about glitches in the CV from now on as well. Cheers, lnnrt. Darren Kessner wrote: > In psi-ms.obo: > > exact_synonym: "peak processing" > occurs in multiple terms. > > I assume it's a copy/paste error. > > > [Term] > id: MS:1000035 > name: peak picking > def: ... > exact_synonym: "peak processing" [] > is_a: MS:1000543 ! data processing action > > [Term] > id: MS:1000592 > name: smoothing > def: ... > exact_synonym: "peak processing" [] > is_a: MS:1000543 ! data processing action > > [Term] > id: MS:1000593 > name: baseline reduction > def: ... > exact_synonym: "peak processing" [] > is_a: MS:1000543 ! data processing action |
From: Lennart M. <len...@eb...> - 2008-02-13 14:28:23
|
Dear PSI-MS enthusiasts, I have started to update the mzML schema as per the progress we're making. As the basis schema, I have used the reformatted version helpfully contributed by Phil Jones and Richard Cote (I believe no one objects to a schema form that facilitates code generation). I will post all (incremental!) updates I make to the schema as we go along, and these will be called: YYYYMMdd_mzML0.99.9_SNAPSHOT.xsd And the subject line will look like the one in this mail, with the end bit indicating the latest change. A download link to the latest schema version will also be provided; the current one is: http://www.ebi.ac.uk/~lmartens/mzML/20080213_mzML0.99.9_SNAPSHOT.xsd This one has the binaryArrayData length as a Spectrum element attribute, and slightly revised documentation accordingly. Cheers, lnnrt. |
From: Marc S. <st...@in...> - 2008-02-13 14:24:18
|
>> i like that idea of being able to annotate a small subset of the peaks >> in a spectrum. >> This is e.g. needed when assigning ion types for MS/MS: b1, b2, ..., y1, >> y2, ..., y7-H2O, ... >> Most of the peaks are simply noise and so only a minority of peaks will >> have an annotation. >> Using a full-sized array would be possible, but a waste of space. >> In my opinion, there should be a recommended way to do such a thing. >> What do you suggest? >> >> Before i forget: Is it possible to annotate peaks with strings? >> Otherwise we would have to use some kind of dictionary to assign ion >> type an integer index. >> > > The annotation of a mass spectrum with fragment ion types and indices > presents a significant amount of processing of the original mass spec > data, as well as a certain type of 'inference' (uncertainty, and often > ambiguity!) that has nothing to do with the mass spectrometer, but > relates to an identification algorithm of some description. > > As such, I don't think we want to annotate this information in mzML at > all, or encourage people to do so. The scope of mzML should remain > limited to the instrument output (with possibly some signal processing > done by the instrument software). > > Fragment ion annotation should therefore be held elsewhere, and the PSI > is actually creating analysisXML for the purpose of recording > identification algorithm output (such as fragment ion assignment). > analysisXML will link back to the mzML files used as input, and through > this link, peak annotation can be extracted. > The fragment ion annotation was only an example. It's true that mzML is not the right place for it. But i still think that there should be a way to annotate a subset of the peaks with arbitrary data. I could imagine several usecases for such a feature. - Marc |
From: Lennart M. <len...@eb...> - 2008-02-13 14:15:50
|
Hi Marc, > i like that idea of being able to annotate a small subset of the peaks > in a spectrum. > This is e.g. needed when assigning ion types for MS/MS: b1, b2, ..., y1, > y2, ..., y7-H2O, ... > Most of the peaks are simply noise and so only a minority of peaks will > have an annotation. > Using a full-sized array would be possible, but a waste of space. > In my opinion, there should be a recommended way to do such a thing. > What do you suggest? > > Before i forget: Is it possible to annotate peaks with strings? > Otherwise we would have to use some kind of dictionary to assign ion > type an integer index. The annotation of a mass spectrum with fragment ion types and indices presents a significant amount of processing of the original mass spec data, as well as a certain type of 'inference' (uncertainty, and often ambiguity!) that has nothing to do with the mass spectrometer, but relates to an identification algorithm of some description. As such, I don't think we want to annotate this information in mzML at all, or encourage people to do so. The scope of mzML should remain limited to the instrument output (with possibly some signal processing done by the instrument software). Fragment ion annotation should therefore be held elsewhere, and the PSI is actually creating analysisXML for the purpose of recording identification algorithm output (such as fragment ion assignment). analysisXML will link back to the mzML files used as input, and through this link, peak annotation can be extracted. Cheers, lnnrt. > > -Marc |
From: Darren K. <Dar...@cs...> - 2008-02-13 13:54:44
|
In psi-ms.obo: exact_synonym: "peak processing" occurs in multiple terms. I assume it's a copy/paste error. [Term] id: MS:1000035 name: peak picking def: ... exact_synonym: "peak processing" [] is_a: MS:1000543 ! data processing action [Term] id: MS:1000592 name: smoothing def: ... exact_synonym: "peak processing" [] is_a: MS:1000543 ! data processing action [Term] id: MS:1000593 name: baseline reduction def: ... exact_synonym: "peak processing" [] is_a: MS:1000543 ! data processing action |
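Glitches of the kind Darren reports can be found mechanically. The following is a minimal Python sketch, assuming only the flat `[Term]` / `id:` / `exact_synonym:` layout shown in the report above; the sample text is copied from that report with the elided `def:` lines left out, and the function name `shared_synonyms` is made up for the example.

```python
from collections import defaultdict

# The three terms from the report, with the "def: ..." lines omitted.
OBO = '''\
[Term]
id: MS:1000035
name: peak picking
exact_synonym: "peak processing" []
is_a: MS:1000543 ! data processing action

[Term]
id: MS:1000592
name: smoothing
exact_synonym: "peak processing" []
is_a: MS:1000543 ! data processing action

[Term]
id: MS:1000593
name: baseline reduction
exact_synonym: "peak processing" []
is_a: MS:1000543 ! data processing action
'''

def shared_synonyms(obo_text):
    """Map each exact_synonym used by more than one term to those term ids."""
    seen = defaultdict(list)
    term_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("id: "):
            term_id = line[len("id: "):]
        elif line.startswith("exact_synonym: "):
            # The synonym text sits between the first pair of double quotes.
            synonym = line.split('"')[1]
            seen[synonym].append(term_id)
    return {syn: ids for syn, ids in seen.items() if len(ids) > 1}

duplicates = shared_synonyms(OBO)
print(duplicates)
```

Running the same function over the full psi-ms.obo text would list every synonym that appears under more than one term, which is a quick way to find further copy/paste errors.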
From: Darren K. <Dar...@cs...> - 2008-02-13 13:48:23
|
Got it -- that works. Thanks, Eric. Darren On Tue, 2008-02-12 at 14:07 -0800, Eric Deutsch wrote: > > From: psi...@li... > [mailto:psidev-ms-dev- > > > > Hi all, > > > > Just wanted a clarification on the encoding of the instrument > > manufacturer. > > > > In the example tiny*.mzML, we have: > > > > <instrument> > > <cvParam cvLabel="MS" accession="MS:1000554" name="LCQ Deca" > value=""/> > > ... > > </instrument> > > > > In the CV we have the following branch: > > - "instrument description" > > - "model by vendor" > > - "Thermo Fisher Scientific" > > - "Thermo Finnigan" > > - "LCQ Deca" > > This is the way it was many months ago. But with the release of 0.99.1 > in November, it became: > > - "instrument" > -(has) "instrument model" > -(is) "Thermo Fisher Scientific instrument model" > -(is) "Thermo Finnigan instrument model" > -(is) "LCQ Deca" > > Please insure that you're using the latest CV from the dev web page. It > hasn't changed since November, but quite a few things changed with that > release. We'll be adding some minor things soon, too (including some of > your requests) > > > > Is this the intended procedure for determining the instrument > > manufacturer? > > 1) look for a cvParam child of "model by vendor" <find "LCQ Deca"> > > 2) walk up the branch until you get to the immediate child of "model > by > > vendor" <walk back to "Thermo Fisher Scientific"> > > I guess I would imagine the following logic: > getVendor("MS:1000554") // "LCQ Deca" > would: > - getTermParent("MS:1000554") > - regexp s/ instrument model// > > > Or do we want to encode the manufacturer as a separate CV term in the > > <instrument> element? > > I think we decided that separately encoding models and vendors was > unnecessarily intricate. We do not have a concept for > manufacturer/vendor. However, the models are organized in a predictable > way in the CV to allow the above regexp logic. 
> > > One other thing, the Thermo tree looks like: > > - "Thermo Fisher Scientific" > > - "Finnigan MAT" > > - some instruments > > - "Thermo Electron" > > - one instrument > > - "Thermo Finnigan" > > - some instruments > > - "Thermo Scientific" > > - more instruments > > > > Perhaps this tree should be flattened like the other vendors CV trees? > > Perhaps. I have no special attachment to the current layout if it seems > overly burdensome. The general idea was that since Thermo* has evolved > considerably, it made sense to categorize the instruments by the most > recent entity name that manufactured them and then lump all those under > the most recent umbrella company name, which may likely change over time > given past history. I think the reasoning was that "Thermo Scientific > never really made an LCQ; that was a different older company." Maybe > we're splitting hairs here or maybe this seems like a reasonable way to > build in some scheme to gracefully handle the case when, say, two > totally different vendors merge a couple years from now. I'm happy with > the way it is now, but don't feel super strongly about it. > > Eric > > > > > > > > > > > > > Darren > > > > > > > > > > > > > > > > Darren Kessner > > > > Scientific Programmer > > > > Dar...@cs... > > > > 310-423-9538 > > > > > > > > Spielberg Family Center for Applied Proteomics > > > > Cedars-Sinai Medical Center > > > > http://www.sfcap.cshs.org/ |
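Eric's getVendor logic (take the parent term of the model, then strip the " instrument model" suffix) could look roughly like this in Python. It is a sketch under assumptions: the `PARENT` table is hand-copied from the branch quoted in his reply and keyed by term name because the thread gives accessions only for some terms; a real reader would look parents up in psi-ms.obo instead.

```python
import re

# Hypothetical parent lookup, hand-copied from the CV branch quoted in the
# message; a real implementation would build this from psi-ms.obo.
PARENT = {
    "LCQ Deca": "Thermo Finnigan instrument model",
    "Thermo Finnigan instrument model": "Thermo Fisher Scientific instrument model",
    "Thermo Fisher Scientific instrument model": "instrument model",
}

def get_vendor(model_name):
    """Return the vendor name for an instrument model term."""
    parent = PARENT[model_name]  # the getTermParent step
    # Equivalent of the regexp s/ instrument model// from the message.
    return re.sub(r" instrument model$", "", parent)

vendor = get_vendor("LCQ Deca")
print(vendor)
```

Note that for the Thermo branch this returns the immediate parent entity ("Thermo Finnigan"), not the umbrella company, which is exactly the layout question raised at the end of the message.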
From: Rune S. P. <mai...@ph...> - 2008-02-13 10:37:50
|
Hello all Eric Deutsch wrote: > <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313"> > <spectrumDescription> > ... > </spectrumDescription> > The information could be an element under <spectrumDescription> -- Regards Rune |
From: Marc S. <st...@in...> - 2008-02-13 09:38:08
|
Hi all, i like that idea of being able to annotate a small subset of the peaks in a spectrum. This is e.g. needed when assigning ion types for MS/MS: b1, b2, ..., y1, y2, ..., y7-H2O, ... Most of the peaks are simply noise and so only a minority of peaks will have an annotation. Using a full-sized array would be possible, but a waste of space. In my opinion, there should be a recommended way to do such a thing. What do you suggest? Before i forget: Is it possible to annotate peaks with strings? Otherwise we would have to use some kind of dictionary to assign ion type an integer index. -Marc |
From: Rune S. P. <mai...@ph...> - 2008-02-13 09:12:39
|
Hello See comments below. Matt Chambers wrote: > Angel Pizarro wrote: > >> On Feb 12, 2008 8:06 PM, Matt Chambers wrote: >> >> It's reasonable that a user of the format would want to store >> structured >> information for a limited number of peaks (or store a variable >> number of >> values in one field, e.g. multidimensional array) so the binary data >> might be laid out in a user-defined pattern: >> m/z (same precision as main array) >> >> >> errr, I don't get what you mean here. Does this mean that you ran a >> peak detection alg and have a much reduced set of data points? If so >> this is a new mzML file from the one prior to peak detection. >> > No, I mean that complex, multibyte metadata for data points may only be > available for, say, 10% of the total data points (even after peak > picking). It would be silly to require the user to store a 19+ byte > struct for every peak. Yes, this is a very advanced use case that will > probably never be seen, but we can allow it with virtually no drawback. > As I understand it Angel means that after peak picking a new mzML file, with only the picked peak in it, would be created. What Matt is suggesting is that additional metadata as picked peaks could be in the mzML file together with the raw data. >> Allowing the secondary data arrays to have a >> different length leaves the format open to user-defined craziness like >> this, and I think that's a good thing. Definitely you wouldn't want to >> define one of these structures for every data point if you're >> dealing of >> data with a decent amount of noise! >> >> > It comes down to giving the user some flexibility and not imposing > unnecessary rigidity in the schema. How much simpler does it really make > the schema to make ALL the arrays the same length? Not very much, I think I don't like the idea of having the flexibility to have user-defined craziness. It is easy to imagine two different pieces of software using this flexibility in different and incompatible ways. 
I would prefer the format of the binaryArrayData to be fully specified. Making it that much easier to create a reader for mzML. As I understand it the userParam is available for user-defined craziness, right? -- Regards Rune |
From: Marc S. <st...@in...> - 2008-02-13 08:27:52
|
Hi all, if the vote is still up, i vote for B) too! - Marc |
From: Matt C. <mat...@va...> - 2008-02-13 04:43:36
|
Hi Angel, Angel Pizarro wrote: > On Feb 12, 2008 8:06 PM, Matt Chambers wrote: > > It's reasonable that a user of the format would want to store > structured > information for a limited number of peaks (or store a variable > number of > values in one field, e.g. multidimensional array) so the binary data > might be laid out in a user-defined pattern: > m/z (same precision as main array) > > > errr, I don't get what you mean here. Does this mean that you ran a > peak detection alg and have a much reduced set of data points? If so > this is a new mzML file from the one prior to peak detection. No, I mean that complex, multibyte metadata for data points may only be available for, say, 10% of the total data points (even after peak picking). It would be silly to require the user to store a 19+ byte struct for every peak. Yes, this is a very advanced use case that will probably never be seen, but we can allow it with virtually no drawback. > count of charge assignments (2 bytes) > > > 1 per m/z? again # of array indexes the same as above I think you misunderstood this count. It is the "N" in the following series (represents an array of charge assignments for this peak, just like more than one charge can be assigned to a precursor): > > charge assignment 1 ... charge assignment N (2 bytes each) > > > can be encoded as a separate array for each charge with 0/1 > > > isotope profile ID (4 bytes; unique in a spectrum) > > > I am ignorant of what this is ;) Just something I made up to take up space. I imagined some isotope profile/envelop detection algorithm running on a file and then annotating the discovered isotope profiles in this structure. That information does not fit in a 1:1 relationship (although it could be rearranged to have one ID per peak as a single array, which would meet your desires, and then infer a peak's isotope number by the number of times that ID had been seen in the array). 
> > > isotope number of peak (2 bytes; monoisotope=0) > > > Also don;t have a clue about this. > > peak label (variable length, 0 terminated) > > > meh.. May be long encode length, but will have the same # of elements > as above, if indeed you were refering to some peak detection alg. > producing the m/z array above. If only a few peaks had labels, most of the peaks would have a single 0 in the label array? As you said, that would be wasteful. > > Allowing the secondary data arrays to have a > different length leaves the format open to user-defined craziness like > this, and I think that's a good thing. Definitely you wouldn't want to > define one of these structures for every data point if you're > dealing of > data with a decent amount of noise! > It comes down to giving the user some flexibility and not imposing unnecessary rigidity in the schema. How much simpler does it really make the schema to make ALL the arrays the same length? Not very much, I think. > > -Matt > > > Angel Pizarro wrote: > > On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in... > <mailto:bri...@in...> > > <mailto:bri...@in... > <mailto:bri...@in...>>> wrote: > > > > I think that's not quite right - arrayLength needs to remain an > > attribute of > > BinaryDataArray since not all BinaryDataArray elements in a > > spectrum will > > necessarily contain the same number of entries as an mz or > > intensity array, > > > > > > Such as .... ? I asked for examples of this and never got a reply. > > > > Related question, if they are different lengths how would you go > about > > assigning a value to a particular index (or set of indexes) that the > > value refers to in another binary array? > > > > My point is that it would be infinitely easier to just repeat > values > > (MRM transitions values, retention time values, or even nil values) > > that pertain to more than one index so you always have a 1:1 > > correspondence across arrays. 
Of course I am making the assumption > > that all binary arrays within a single spectrum element are > related to > > each other in some manner, so if this does not hold true, please > > someone tell me, as I am fairly ignorant on the mass spec > acquisition > > modes. > > > > The alternative representation would be coordinate systems and > > multidimensional data arrays /a la/ netcdf or HDF5, but we are too > > far along the route that we have laid out to even consider a change > > this radical. BTW, I did do some mzData (v 1.05) to netcdf > conversion > > and the netcdf files are even bigger, at a gain of built in > index into > > the data arrays within and across spectra. > > > > -angel > > > > > > since not all BinaryDataArray elements are guaranteed (as I > > understand mzML, > > which is but dimly) to be mz or intensity. You'll need to write > > it again as > > an attribute of spectrum, something like mzintPairsCount if you > > don't like > > PeaksCount. > > > > -----Original Message----- > > From: psi...@li... > <mailto:psi...@li...> > > <mailto:psi...@li... > <mailto:psi...@li...>> > > [mailto:psi...@li... > <mailto:psi...@li...> > > <mailto:psi...@li... > <mailto:psi...@li...>>] On Behalf Of > > Eric > > Deutsch > > Sent: Tuesday, February 12, 2008 1:28 PM > > To: Mass spectrometry standard development > > Cc: Eric Deutsch > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths > > > > > > So there seems to be broad consensus (4 for 4;) that moving the > > arrayLength up a little higher is a good idea. So instead of: > > > > <spectrum id="S19" scanNumber="19" msLevel="1"> > > <spectrumDescription> > > ... > > </spectrumDescription> > > <binaryDataArray arrayLength="1313" encodedLength="5433" > > dataProcessingRef="Xcalibur Processing"> > > ... > > <binary>AAAAwDsGeUAAAAD...</binary> > > </binaryDataArray> > > <binaryDataArray arrayLength="1313" encodedLength="4892"> > > ... 
> > <binary>AAAAAIBJxk...</binary> > > </binaryDataArray> > > </spectrum> > > > > We will have: > !!!!!!!!!!!!!!!!!! > > > > <spectrum id="S19" scanNumber="19" msLevel="1" > arrayLength="1313"> > > <spectrumDescription> > > ... > > </spectrumDescription> > > <binaryDataArray encodedLength="5433" > > dataProcessingRef="Xcalibur Processing"> > > ... > > <binary>AAAAwDsGeUAAAAD...</binary> > > </binaryDataArray> > > <binaryDataArray encodedLength="4892"> > > ... > > <binary>AAAAAIBJxk...</binary> > > </binaryDataArray> > > </spectrum> > > > > > > Agreed? > > > > > > > > > -----Original Message----- > > > From: psi...@li... > <mailto:psi...@li...> > > <mailto:psi...@li... > <mailto:psi...@li...>> > > [mailto:psidev-ms-dev- <mailto:psidev-ms-dev-> > <mailto:psidev-ms-dev- <mailto:psidev-ms-dev->> > > > bo...@li... > <mailto:bo...@li...> > > <mailto:bo...@li... > <mailto:bo...@li...>>] On Behalf Of Matthew Chambers > > > Sent: Wednesday, February 06, 2008 10:49 AM > > > To: Mass spectrometry standard development > > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths > > > > > > I agree that the primary data arrays should probably be > treated as > > > special in the schema so it's clear that they are paired > values and > > thus > > > peak count could move into the spectrum element or > > spectrumDescription. > > > There should still be options to have additional arrays > that aren't > > the > > > same as the main arrays (for example, an additional set of > > arrays, one > > > for a subset of the m/zs and the other for peak charge > information). > > > > > > -Matt > > > > > > > > > Kessner, Darren E. wrote: > > > > Any other comments regarding <binaryArrayData> lengths? > > > > > > > > > > > >> (from Rune) > > > >> If they have to be equal size, then > > > >> that size ought to be specified in the spectrumDescription. 
> > > >> > > > > > > > > I agree -- I would like to encode the length in <spectrum> > > somewhere > > > > (either attribute or cvParam) so that: > > > > 1) it's clear that the arrays are of equal size > > > > 2) Readers don't have to peek into the attributes of the > first > > > > <binaryArrayData> to get the info > > > > > > > > I need this right now for the MSData RAMP adapter code, > so I'll > > encode > > > > it as a <userParam> until a decision has been made on the > > specification. > > > > > > > > > > > > Darren > > > > > > > > > > |
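For readers trying to picture the variable-length record Matt describes (m/z, a count of charge assignments, the assignments themselves, an isotope profile ID, an isotope number, and a zero-terminated label), here is a small Python sketch of one such record. Byte order and the 64-bit m/z precision are assumptions chosen for the example; the thread fixes only the field widths, and `pack_peak` is a made-up name.

```python
import base64
import struct

def pack_peak(mz, charges, profile_id, isotope_number, label):
    """Pack one annotated peak using the field widths listed in the message.

    Assumptions: little-endian byte order, 64-bit float m/z, ASCII label.
    """
    record = struct.pack("<dH", mz, len(charges))           # m/z + charge count (2 bytes)
    record += struct.pack("<%dH" % len(charges), *charges)  # N charges, 2 bytes each
    record += struct.pack("<IH", profile_id, isotope_number)  # 4-byte ID, 2-byte isotope number
    record += label.encode("ascii") + b"\x00"               # zero-terminated label
    return record

# One charge assignment and an empty label gives the smallest useful record.
record = pack_peak(445.12, [2], 7, 0, "")
print(len(record))

# The record would still be base64 encoded like any other binary array.
encoded = base64.b64encode(record)
```

With one charge and an empty label the record is 19 bytes, which matches the "19+ byte struct" figure in the message; because records vary in length, a reader has to walk them sequentially rather than index by position, which is the interoperability cost Rune points out.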
From: Angel P. <an...@ma...> - 2008-02-13 02:41:11
|
On Feb 12, 2008 8:06 PM, Matt Chambers <mat...@va...> wrote: > It's reasonable that a user of the format would want to store structured > information for a limited number of peaks (or store a variable number of > values in one field, e.g. multidimensional array) so the binary data > might be laid out in a user-defined pattern: > m/z (same precision as main array) > errr, I don't get what you mean here. Does this mean that you ran a peak detection alg and have a much reduced set of data points? If so this is a new mzML file from the one prior to peak detection. count of charge assignments (2 bytes) > 1 per m/z? again # of array indexes the same as above charge assignment 1 ... charge assignment N (2 bytes each) can be encoded as a separate array for each charge with 0/1 > > isotope profile ID (4 bytes; unique in a spectrum) > I am ignorant of what this is ;) > isotope number of peak (2 bytes; monoisotope=0) > Also don;t have a clue about this. > peak label (variable length, 0 terminated) > meh.. May be long encode length, but will have the same # of elements as above, if indeed you were refering to some peak detection alg. producing the m/z array above. > > This structure would still be binary and base64 encoded, so netcdf et > al. are not necessary. Sorry to not be clear. I was not putting forth a use of netcdf , just that they have an alternate binary storage scheme that mzML that is coordinate based, hence the n-dimensional data can indeed have different lengths per axis. Allowing the secondary data arrays to have a > different length leaves the format open to user-defined craziness like > this, and I think that's a good thing. Definitely you wouldn't want to > define one of these structures for every data point if you're dealing of > data with a decent amount of noise! > > -Matt > > > Angel Pizarro wrote: > > On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in... 
> > <mailto:bri...@in...>> wrote: > > > > I think that's not quite right - arrayLength needs to remain an > > attribute of > > BinaryDataArray since not all BinaryDataArray elements in a > > spectrum will > > necessarily contain the same number of entries as an mz or > > intensity array, > > > > > > Such as .... ? I asked for examples of this and never got a reply. > > > > Related question, if they are different lengths how would you go about > > assigning a value to a particular index (or set of indexes) that the > > value refers to in another binary array? > > > > My point is that it would be infinitely easier to just repeat values > > (MRM transitions values, retention time values, or even nil values) > > that pertain to more than one index so you always have a 1:1 > > correspondence across arrays. Of course I am making the assumption > > that all binary arrays within a single spectrum element are related to > > each other in some manner, so if this does not hold true, please > > someone tell me, as I am fairly ignorant on the mass spec acquisition > > modes. > > > > The alternative representation would be coordinate systems and > > multidimensional data arrays /a la/ netcdf or HDF5, but we are too > > far along the route that we have laid out to even consider a change > > this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion > > and the netcdf files are even bigger, at a gain of built in index into > > the data arrays within and across spectra. > > > > -angel > > > > > > since not all BinaryDataArray elements are guaranteed (as I > > understand mzML, > > which is but dimly) to be mz or intensity. You'll need to write > > it again as > > an attribute of spectrum, something like mzintPairsCount if you > > don't like > > PeaksCount. > > > > -----Original Message----- > > From: psi...@li... > > <mailto:psi...@li...> > > [mailto:psi...@li... 
> > <mailto:psi...@li...>] On Behalf Of > > Eric > > Deutsch > > Sent: Tuesday, February 12, 2008 1:28 PM > > To: Mass spectrometry standard development > > Cc: Eric Deutsch > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths > > > > > > So there seems to be broad consensus (4 for 4;) that moving the > > arrayLength up a little higher is a good idea. So instead of: > > > > <spectrum id="S19" scanNumber="19" msLevel="1"> > > <spectrumDescription> > > ... > > </spectrumDescription> > > <binaryDataArray arrayLength="1313" encodedLength="5433" > > dataProcessingRef="Xcalibur Processing"> > > ... > > <binary>AAAAwDsGeUAAAAD...</binary> > > </binaryDataArray> > > <binaryDataArray arrayLength="1313" encodedLength="4892"> > > ... > > <binary>AAAAAIBJxk...</binary> > > </binaryDataArray> > > </spectrum> > > > > We will have: !!!!!!!!!!!!!!!!!! > > > > <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313"> > > <spectrumDescription> > > ... > > </spectrumDescription> > > <binaryDataArray encodedLength="5433" > > dataProcessingRef="Xcalibur Processing"> > > ... > > <binary>AAAAwDsGeUAAAAD...</binary> > > </binaryDataArray> > > <binaryDataArray encodedLength="4892"> > > ... > > <binary>AAAAAIBJxk...</binary> > > </binaryDataArray> > > </spectrum> > > > > > > Agreed? > > > > > > > > > -----Original Message----- > > > From: psi...@li... > > <mailto:psi...@li...> > > [mailto:psidev-ms-dev- <mailto:psidev-ms-dev-> > > > bo...@li... > > <mailto:bo...@li...>] On Behalf Of Matthew > Chambers > > > Sent: Wednesday, February 06, 2008 10:49 AM > > > To: Mass spectrometry standard development > > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths > > > > > > I agree that the primary data arrays should probably be treated as > > > special in the schema so it's clear that they are paired values > and > > thus > > > peak count could move into the spectrum element or > > spectrumDescription. 
> > > There should still be options to have additional arrays that > aren't > > the > > > same as the main arrays (for example, an additional set of > > arrays, one > > > for a subset of the m/zs and the other for peak charge > information). > > > > > > -Matt > > > > > > > > > Kessner, Darren E. wrote: > > > > Any other comments regarding <binaryArrayData> lengths? > > > > > > > > > > > >> (from Rune) > > > >> If they have to be equal size, then > > > >> that size ought to be specified in the spectrumDescription. > > > >> > > > > > > > > I agree -- I would like to encode the length in <spectrum> > > somewhere > > > > (either attribute or cvParam) so that: > > > > 1) it's clear that the arrays are of equal size > > > > 2) Readers don't have to peek into the attributes of the first > > > > <binaryArrayData> to get the info > > > > > > > > I need this right now for the MSData RAMP adapter code, so I'll > > encode > > > > it as a <userParam> until a decision has been made on the > > specification. > > > > > > > > > > > > Darren -- Angel Pizarro Director, ITMAT Bioinformatics Facility 806 Biological Research Building 421 Curie Blvd. Philadelphia, PA 19104-6160 215-573-3736 |
From: Kessner, D. E. <Dar...@cs...> - 2008-02-13 01:23:13
|
I'd like to vote for B also. I think the meaning is clearer to the human reader, both in the mzML and in the code that reads it.

Darren

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Joshua Tasman
Sent: Tuesday, February 12, 2008 2:16 PM
To: Mass spectrometry standard development
Cc: Eric Deutsch
Subject: Re: [Psidev-ms-dev] Some additional 'Unknown instrument' CV parameter thoughts

Hi Eric,

You missed my strong vote for "B)".

-Josh

Eric Deutsch wrote:
> Hi everyone,
>
> I'm trying to see if we can get to some consensus on some of these ongoing threads. Regarding the "unknown instrument" problem, I think there has been some confusion, so let me see if I can clarify and ask for a final round of opinions. I agree with Fredrik's comments below that his examples are *not* what is intended. Here is what I believe Lennart intended:
>
> A)
> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value=""/>
>
> Or the other alternative is to create a term for unknown:
>
> B)
> <cvParam cvLabel="MS" accession="MS:1099931" name="unknown instrument model" value=""/>
> (where the number is obviously made up by me right now, but would be in the CV)
>
> So those are the choices. Putting something in the value attribute is not an option, as Fredrik concludes below.
>
> Benefits of A)
> - No need to litter the CV with "xxx unknown" terms
> - Happens to be very easy for the existing validator software to accommodate
> - Somewhat counterintuitive and thus dissuades laziness
> Drawbacks of A)
> - Somewhat counterintuitive and awkward
>
> Benefits of B)
> - Very intuitive and straightforward: the concept of what instrument generated these spectra is captured by the concept "sorry, I just don't know which instrument it was"
> Drawbacks of B)
> - Opens the door to perhaps needing to sprinkle other unknowns in the CV
> - Is a little more inviting to users to be lazy and claim they don't know, when with a little more effort they could find out and report properly (because "unknown" is now an *obvious* option)
> - Would require more development in the validator to properly handle a special term like this.
>
> Based on the feedback I saw so far, Lennart, Luisa and Angel like A. Matt seemed more in favor of B. No clear reads on others.
>
> I myself prefer B. To me it feels like A is a convenient but counterintuitive trick for working around the problem. B feels like the right solution even if it facilitates laziness. I don't think that will be a big problem. I'm sure we can come up with some syntax for the validator to permit or disallow "ambiguity terms" as desired.
>
> So, what say ye?
>
>> From: psi...@li... [mailto:psidev-ms-dev-
>>
>> Hi Lennart, Josh, Matt and others,
>>
>> If the top level term is allowed it will be possible to define not only instrument value='unknown', but also instruments that are not in the CV by putting something in the value field:
>> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="The new mass spec not in CV"/>
>> <cvParam cvLabel="MS" accession="MS:1000031" name="instrument model" value="unknown"/>
>> Instead of the intended:
>> <cvParam cvLabel="MS" accession="MS:1000189" name="q-tof ultima" value=""/>
>> I'm not so sure that this is wanted, especially since unknown could be written as 'not known', 'not specified' etcetera. It makes sense to have a CV term for 'unknown', but it would be quite a few 'unknown' terms to add to the CV to get one for each required category in the mzML schema... At some places it would be enough with just 'unknown' (source, detector etc.), but at other places it must be specified what is unknown!
>>
>> Anyway, I am still for usage of top level elements :-) , see line 16 at:
>> http://trac.thep.lu.se/trac/fp6-prodac/browser/trunk/mzML/FF_070504_MSMS_5B.mzML
>>
>> cheers
>>
>> Fredrik
>>
>> Joshua Tasman wrote:
>>> I'm with Matt on this one, and like his solution. There are unfortunately lots of real use cases (combining dta, mgfs) where the information will really be unknown, and we should accurately represent the lack of information. If it's not too much effort to add a little more code to the validator, I would much prefer the accurate addition of an "unknown" term. There has been so much effort getting the CV and document to line up with reality, it looks very strange to me to force this ontological 'hack' by allowing the category to appear as a value, as Matt has said.
>>>
>>> Josh
>>>
>>> Matthew Chambers wrote:
>>>> Lennart Martens wrote:
>>>>> Hi Matt, and Colleagues,
>>>>>
>>>>>> I don't really prefer one to the other very much, but I don't see how the parent term would be easier to validate ("all but X children of a term" doesn't make sense to me, do you mean "all children of a term except X"?)
>>>>>
>>>>> You are right; I provided bad shorthand for: 'all children of a term, except X (and Y, and Z, ... -- potentially)'.
>>>>>
>>>>> The reason why it is easier to validate is due to the way the validator mapping file is designed, e.g. (example verbatim from current 0.99.1 mapping file):
>>>>>
>>>>> <CvTerm termAccession="MS:1000031" useTerm="false"
>>>>>         termName="instrument model" isRepeatable="false"
>>>>>         scope="/mzML/instrumentList/instrument" allowChildren="true"
>>>>>         cvIdentifier="MS"></CvTerm>
>>>>>
>>>>> This means that although all children of term 'MS:1000031 -- instrument model' are allowed (allowChildren="true"), the term itself is not allowed (useTerm="false"). By flipping this latter boolean, we can allow the parent term, thus separating between MIAPE requirements (current configuration) and the 'usable mzML requirements' (flipped boolean as explained above) -- for the instrument model at least.
>>>>
>>>> OK, so it's an implementation thing. That's fine.
>>>>
>>>>>> What about data converted from DTAs or MGFs where the user doesn't even remember (or never knew) what kind of instrument it came from?
>>>>>
>>>>> When the instrument is really unknown (which is unfortunate and constitutes dramatic metadata loss whichever way you look at it), the proposed scenario (usage of toplevel term) provides solace. For all other scenarios (where an incentive to adapt converter software or report the development of a new instrument is concerned), the relative obscurity of the 'fix' might contribute to 'going the extra mile' (upgrading the converter, mailing in the new instrument name).
>>>>
>>>> While the toplevel term does provide some solace, it is obscure enough that a casual user might look at it and think that something was wrong because it does not intuitively make sense for the category to appear as a value. What about this alternative: provide an "unknown instrument" term with a unique accession #, but make the term name something like "unknown (instrument type not specified or not in CV)". That would be intuitive but still eye-catching (and it would be the eye-catching part that implementors would want to minimize, because it makes them look bad). ;)
>>>>
>>>> -Matt
|
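The difference between options A and B, as it would look to a validator driven by the useTerm / allowChildren flags Lennart quotes, can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `CvTermRule`, `PARENT` and `is_allowed` are hypothetical, the CV is reduced to a two-entry toy table, and MS:1099931 is the made-up accession from Eric's message, not a real term. The actual validator reads its rules from the mapping file and the CV from the OBO file.

```python
from dataclasses import dataclass

# Toy slice of the CV: child accession -> parent accession.
# MS:1099931 is the made-up accession from option B, not a real CV term.
PARENT = {
    "MS:1000189": "MS:1000031",  # q-tof ultima -> instrument model
    "MS:1099931": "MS:1000031",  # hypothetical "unknown instrument model"
}


@dataclass
class CvTermRule:
    """Hypothetical mirror of the attributes on a <CvTerm> mapping entry."""
    term_accession: str
    use_term: bool        # may the term itself be used?
    allow_children: bool  # may any descendant of the term be used?


def is_descendant(accession: str, ancestor: str) -> bool:
    """Walk up the toy parent table looking for the ancestor."""
    current = PARENT.get(accession)
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def is_allowed(accession: str, rule: CvTermRule) -> bool:
    """Apply one mapping rule to the accession found in a cvParam."""
    if accession == rule.term_accession:
        return rule.use_term
    return rule.allow_children and is_descendant(accession, rule.term_accession)


# Current (MIAPE-style) configuration: parent term itself is rejected.
strict = CvTermRule("MS:1000031", use_term=False, allow_children=True)
# Option A: flip useTerm so the bare parent term stands in for "unknown".
relaxed = CvTermRule("MS:1000031", use_term=True, allow_children=True)

print(is_allowed("MS:1000031", strict))   # parent term under the strict rule
print(is_allowed("MS:1000031", relaxed))  # parent term under option A
print(is_allowed("MS:1099931", strict))   # option B: explicit "unknown" child
```

Under this reading, option A needs no new CV term but changes a flag in the mapping, while option B passes the unchanged rule because the "unknown" term is just another child; telling it apart from a real instrument would need the extra validator logic Eric mentions.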
From: Matt C. <mat...@va...> - 2008-02-13 01:06:14
|
It's reasonable that a user of the format would want to store structured information for a limited number of peaks (or store a variable number of values in one field, e.g. a multidimensional array), so the binary data might be laid out in a user-defined pattern:

m/z (same precision as main array)
count of charge assignments (2 bytes)
charge assignment 1 ... charge assignment N (2 bytes each)
isotope profile ID (4 bytes; unique in a spectrum)
isotope number of peak (2 bytes; monoisotope=0)
peak label (variable length, 0 terminated)

This structure would still be binary and base64 encoded, so netcdf et al. are not necessary. Allowing the secondary data arrays to have a different length leaves the format open to user-defined craziness like this, and I think that's a good thing. Definitely you wouldn't want to define one of these structures for every data point if you're dealing with data with a decent amount of noise!

-Matt

Angel Pizarro wrote:
> On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...> wrote:
>
> > I think that's not quite right - arrayLength needs to remain an attribute of BinaryDataArray since not all BinaryDataArray elements in a spectrum will necessarily contain the same number of entries as an mz or intensity array,
>
> Such as .... ? I asked for examples of this and never got a reply.
>
> Related question, if they are different lengths how would you go about assigning a value to a particular index (or set of indexes) that the value refers to in another binary array?
>
> My point is that it would be infinitely easier to just repeat values (MRM transitions values, retention time values, or even nil values) that pertain to more than one index so you always have a 1:1 correspondence across arrays. Of course I am making the assumption that all binary arrays within a single spectrum element are related to each other in some manner, so if this does not hold true, please someone tell me, as I am fairly ignorant on the mass spec acquisition modes.
>
> The alternative representation would be coordinate systems and multidimensional data arrays /a la/ netcdf or HDF5, but we are too far along the route that we have laid out to even consider a change this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion and the netcdf files are even bigger, at a gain of built in index into the data arrays within and across spectra.
>
> -angel
>
> > since not all BinaryDataArray elements are guaranteed (as I understand mzML, which is but dimly) to be mz or intensity. You'll need to write it again as an attribute of spectrum, something like mzintPairsCount if you don't like PeaksCount.
> >
> > -----Original Message-----
> > From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> > Sent: Tuesday, February 12, 2008 1:28 PM
> > To: Mass spectrometry standard development
> > Cc: Eric Deutsch
> > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> >
> > So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:
> >
> > <spectrum id="S19" scanNumber="19" msLevel="1">
> >   <spectrumDescription>
> >     ...
> >   </spectrumDescription>
> >   <binaryDataArray arrayLength="1313" encodedLength="5433" dataProcessingRef="Xcalibur Processing">
> >     ...
> >     <binary>AAAAwDsGeUAAAAD...</binary>
> >   </binaryDataArray>
> >   <binaryDataArray arrayLength="1313" encodedLength="4892">
> >     ...
> >     <binary>AAAAAIBJxk...</binary>
> >   </binaryDataArray>
> > </spectrum>
> >
> > We will have:
> >                                                !!!!!!!!!!!!!!!!!!
> > <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
> >   <spectrumDescription>
> >     ...
> >   </spectrumDescription>
> >   <binaryDataArray encodedLength="5433" dataProcessingRef="Xcalibur Processing">
> >     ...
> >     <binary>AAAAwDsGeUAAAAD...</binary>
> >   </binaryDataArray>
> >   <binaryDataArray encodedLength="4892">
> >     ...
> >     <binary>AAAAAIBJxk...</binary>
> >   </binaryDataArray>
> > </spectrum>
> >
> > Agreed?
> >
> > > -----Original Message-----
> > > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > > Sent: Wednesday, February 06, 2008 10:49 AM
> > > To: Mass spectrometry standard development
> > > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> > >
> > > I agree that the primary data arrays should probably be treated as special in the schema so it's clear that they are paired values and thus peak count could move into the spectrum element or spectrumDescription. There should still be options to have additional arrays that aren't the same as the main arrays (for example, an additional set of arrays, one for a subset of the m/zs and the other for peak charge information).
> > >
> > > -Matt
> > >
> > > Kessner, Darren E. wrote:
> > > > Any other comments regarding <binaryArrayData> lengths?
> > > >
> > > >> (from Rune)
> > > >> If they have to be equal size, then that size ought to be specified in the spectrumDescription.
> > > >
> > > > I agree -- I would like to encode the length in <spectrum> somewhere (either attribute or cvParam) so that:
> > > > 1) it's clear that the arrays are of equal size
> > > > 2) Readers don't have to peek into the attributes of the first <binaryArrayData> to get the info
> > > >
> > > > I need this right now for the MSData RAMP adapter code, so I'll encode it as a <userParam> until a decision has been made on the specification.
> > > >
> > > > Darren
|
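The record layout Matt lists can be packed and read back with a few lines of Python. This is a rough sketch, not part of any mzML specification. The thread does not fix several details, so the following are assumptions: little-endian byte order, a 64-bit float for m/z, signed 16-bit charge values, and an ASCII label. The function names `pack_peak` and `unpack_peaks` are hypothetical.

```python
import base64
import struct


def pack_peak(mz, charges, profile_id, isotope_number, label):
    """Pack one annotated peak following the field order in the message."""
    record = struct.pack("<dH", mz, len(charges))           # m/z, charge count
    record += struct.pack(f"<{len(charges)}h", *charges)    # N charge values
    record += struct.pack("<IH", profile_id, isotope_number)
    record += label.encode("ascii") + b"\x00"               # 0-terminated label
    return record


def unpack_peaks(data):
    """Read variable-length records back until the buffer is exhausted."""
    peaks = []
    offset = 0
    while offset < len(data):
        mz, count = struct.unpack_from("<dH", data, offset)
        offset += 10                                        # 8 + 2 bytes
        charges = list(struct.unpack_from(f"<{count}h", data, offset))
        offset += 2 * count
        profile_id, isotope_number = struct.unpack_from("<IH", data, offset)
        offset += 6                                         # 4 + 2 bytes
        end = data.index(b"\x00", offset)                   # label terminator
        label = data[offset:end].decode("ascii")
        offset = end + 1
        peaks.append((mz, charges, profile_id, isotope_number, label))
    return peaks


annotated = [
    (445.12, [2, 3], 7, 0, "b5"),
    (445.62, [2], 7, 1, ""),
]
blob = b"".join(pack_peak(*peak) for peak in annotated)
encoded = base64.b64encode(blob).decode("ascii")   # what would go in <binary>
decoded = unpack_peaks(base64.b64decode(encoded))
print(decoded == annotated)
```

Because each record has a variable size, the number of records cannot be derived from the byte length alone, which is one reason such an array would need its own length rather than sharing the m/z and intensity count.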
From: Eric D. <ede...@sy...> - 2008-02-13 01:06:08
|
I also think that opening the door to arrays of differing lengths is a complexity that is best avoided at this time. Any additional arrays that I can think of (S/N, charge, or similar) could all have the same number of elements. Yes, I suppose I could imagine having something like:

m/z axis 1
intensity values corresponding to axis 1
m/z axis 2 for a subset of points in axis 1
charge values for axis 2

But I think that this is complexity that we have survived thus far without, and we had best leave it out at this point.

So, I propose we limit all binaryDataArray lengths within one <spectrum> to be the same. Any dissent? Please provide a specific example of XML you'd like to see instead.

Thanks,
Eric

________________________________
From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: Tuesday, February 12, 2008 4:42 PM
To: Mass spectrometry standard development
Subject: Re: [Psidev-ms-dev] binaryArrayData lengths

On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...> wrote:

> I think that's not quite right - arrayLength needs to remain an attribute of BinaryDataArray since not all BinaryDataArray elements in a spectrum will necessarily contain the same number of entries as an mz or intensity array,

Such as .... ? I asked for examples of this and never got a reply.

Related question, if they are different lengths how would you go about assigning a value to a particular index (or set of indexes) that the value refers to in another binary array?

My point is that it would be infinitely easier to just repeat values (MRM transitions values, retention time values, or even nil values) that pertain to more than one index so you always have a 1:1 correspondence across arrays. Of course I am making the assumption that all binary arrays within a single spectrum element are related to each other in some manner, so if this does not hold true, please someone tell me, as I am fairly ignorant on the mass spec acquisition modes.

The alternative representation would be coordinate systems and multidimensional data arrays a la netcdf or HDF5, but we are too far along the route that we have laid out to even consider a change this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion and the netcdf files are even bigger, at a gain of built in index into the data arrays within and across spectra.

-angel

> since not all BinaryDataArray elements are guaranteed (as I understand mzML, which is but dimly) to be mz or intensity. You'll need to write it again as an attribute of spectrum, something like mzintPairsCount if you don't like PeaksCount.
>
> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> Sent: Tuesday, February 12, 2008 1:28 PM
> To: Mass spectrometry standard development
> Cc: Eric Deutsch
> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>
> So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:
>
> <spectrum id="S19" scanNumber="19" msLevel="1">
>   <spectrumDescription>
>     ...
>   </spectrumDescription>
>   <binaryDataArray arrayLength="1313" encodedLength="5433" dataProcessingRef="Xcalibur Processing">
>     ...
>     <binary>AAAAwDsGeUAAAAD...</binary>
>   </binaryDataArray>
>   <binaryDataArray arrayLength="1313" encodedLength="4892">
>     ...
>     <binary>AAAAAIBJxk...</binary>
>   </binaryDataArray>
> </spectrum>
>
> We will have:
>                                                !!!!!!!!!!!!!!!!!!
> <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
>   <spectrumDescription>
>     ...
>   </spectrumDescription>
>   <binaryDataArray encodedLength="5433" dataProcessingRef="Xcalibur Processing">
>     ...
>     <binary>AAAAwDsGeUAAAAD...</binary>
>   </binaryDataArray>
>   <binaryDataArray encodedLength="4892">
>     ...
>     <binary>AAAAAIBJxk...</binary>
>   </binaryDataArray>
> </spectrum>
>
> Agreed?
>
> > -----Original Message-----
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > Sent: Wednesday, February 06, 2008 10:49 AM
> > To: Mass spectrometry standard development
> > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> >
> > I agree that the primary data arrays should probably be treated as special in the schema so it's clear that they are paired values and thus peak count could move into the spectrum element or spectrumDescription. There should still be options to have additional arrays that aren't the same as the main arrays (for example, an additional set of arrays, one for a subset of the m/zs and the other for peak charge information).
> >
> > -Matt
> >
> > Kessner, Darren E. wrote:
> > > Any other comments regarding <binaryArrayData> lengths?
> > >
> > >> (from Rune)
> > >> If they have to be equal size, then that size ought to be specified in the spectrumDescription.
> > >
> > > I agree -- I would like to encode the length in <spectrum> somewhere (either attribute or cvParam) so that:
> > > 1) it's clear that the arrays are of equal size
> > > 2) Readers don't have to peek into the attributes of the first <binaryArrayData> to get the info
> > >
> > > I need this right now for the MSData RAMP adapter code, so I'll encode it as a <userParam> until a decision has been made on the specification.
> > >
> > > Darren

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
|
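If every binaryDataArray in a <spectrum> must have the same length, a reader can check that constraint against the spectrum-level arrayLength attribute proposed in the thread. The sketch below is a minimal illustration under stated assumptions: the arrays hold uncompressed little-endian 64-bit floats, the element layout is reduced to just <spectrum>, <binaryDataArray> and <binary>, and the helper names `encode_doubles` and `check_spectrum` are hypothetical.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_doubles(values):
    """Base64-encode a list of floats as little-endian 64-bit values."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def check_spectrum(spectrum):
    """Return a list of problems; an empty list means all lengths agree."""
    expected = int(spectrum.get("arrayLength"))
    problems = []
    for i, array in enumerate(spectrum.findall("binaryDataArray")):
        raw = base64.b64decode(array.findtext("binary"))
        actual = len(raw) // 8          # 8 bytes per 64-bit float (assumed)
        if actual != expected:
            problems.append(f"array {i}: {actual} values, expected {expected}")
    return problems


mz = [400.1, 400.2, 400.3]
intensity = [10.0, 250.0, 12.0]
xml_text = f"""<spectrum id="S19" arrayLength="3">
  <binaryDataArray><binary>{encode_doubles(mz)}</binary></binaryDataArray>
  <binaryDataArray><binary>{encode_doubles(intensity)}</binary></binaryDataArray>
</spectrum>"""

print(check_spectrum(ET.fromstring(xml_text)))
```

A real reader would take the precision and compression from each array's cvParams instead of assuming 8 bytes per value.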
From: Angel P. <an...@ma...> - 2008-02-13 00:41:26
|
On Feb 12, 2008 5:46 PM, Brian Pratt <bri...@in...> wrote:

> I think that's not quite right - arrayLength needs to remain an attribute of BinaryDataArray since not all BinaryDataArray elements in a spectrum will necessarily contain the same number of entries as an mz or intensity array,

Such as .... ? I asked for examples of this and never got a reply.

Related question, if they are different lengths how would you go about assigning a value to a particular index (or set of indexes) that the value refers to in another binary array?

My point is that it would be infinitely easier to just repeat values (MRM transitions values, retention time values, or even nil values) that pertain to more than one index so you always have a 1:1 correspondence across arrays. Of course I am making the assumption that all binary arrays within a single spectrum element are related to each other in some manner, so if this does not hold true, please someone tell me, as I am fairly ignorant on the mass spec acquisition modes.

The alternative representation would be coordinate systems and multidimensional data arrays *a la* netcdf or HDF5, but we are too far along the route that we have laid out to even consider a change this radical. BTW, I did do some mzData (v 1.05) to netcdf conversion and the netcdf files are even bigger, at a gain of built in index into the data arrays within and across spectra.

-angel

> since not all BinaryDataArray elements are guaranteed (as I understand mzML, which is but dimly) to be mz or intensity. You'll need to write it again as an attribute of spectrum, something like mzintPairsCount if you don't like PeaksCount.
>
> -----Original Message-----
> From: psi...@li... [mailto:psi...@li...] On Behalf Of Eric Deutsch
> Sent: Tuesday, February 12, 2008 1:28 PM
> To: Mass spectrometry standard development
> Cc: Eric Deutsch
> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>
> So there seems to be broad consensus (4 for 4;) that moving the arrayLength up a little higher is a good idea. So instead of:
>
> <spectrum id="S19" scanNumber="19" msLevel="1">
>   <spectrumDescription>
>     ...
>   </spectrumDescription>
>   <binaryDataArray arrayLength="1313" encodedLength="5433" dataProcessingRef="Xcalibur Processing">
>     ...
>     <binary>AAAAwDsGeUAAAAD...</binary>
>   </binaryDataArray>
>   <binaryDataArray arrayLength="1313" encodedLength="4892">
>     ...
>     <binary>AAAAAIBJxk...</binary>
>   </binaryDataArray>
> </spectrum>
>
> We will have:
>                                                !!!!!!!!!!!!!!!!!!
> <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
>   <spectrumDescription>
>     ...
>   </spectrumDescription>
>   <binaryDataArray encodedLength="5433" dataProcessingRef="Xcalibur Processing">
>     ...
>     <binary>AAAAwDsGeUAAAAD...</binary>
>   </binaryDataArray>
>   <binaryDataArray encodedLength="4892">
>     ...
>     <binary>AAAAAIBJxk...</binary>
>   </binaryDataArray>
> </spectrum>
>
> Agreed?
>
> > -----Original Message-----
> > From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Matthew Chambers
> > Sent: Wednesday, February 06, 2008 10:49 AM
> > To: Mass spectrometry standard development
> > Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
> >
> > I agree that the primary data arrays should probably be treated as special in the schema so it's clear that they are paired values and thus peak count could move into the spectrum element or spectrumDescription. There should still be options to have additional arrays that aren't the same as the main arrays (for example, an additional set of arrays, one for a subset of the m/zs and the other for peak charge information).
> >
> > -Matt
> >
> > Kessner, Darren E. wrote:
> > > Any other comments regarding <binaryArrayData> lengths?
> > >
> > >> (from Rune)
> > >> If they have to be equal size, then that size ought to be specified in the spectrumDescription.
> > >
> > > I agree -- I would like to encode the length in <spectrum> somewhere (either attribute or cvParam) so that:
> > > 1) it's clear that the arrays are of equal size
> > > 2) Readers don't have to peek into the attributes of the first <binaryArrayData> to get the info
> > >
> > > I need this right now for the MSData RAMP adapter code, so I'll encode it as a <userParam> until a decision has been made on the specification.
> > >
> > > Darren

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
|
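Angel's suggestion of keeping a 1:1 correspondence across arrays, by filling positions that have no value with a nil, could look roughly like this. It is only a sketch: using NaN as the nil value is an assumption (the thread does not say how nil would be encoded), and `expand_to_full_length` is a hypothetical helper name.

```python
import math


def expand_to_full_length(sparse, length, nil=math.nan):
    """Turn {index: value} annotations into an array aligned with m/z."""
    full = [nil] * length
    for index, value in sparse.items():
        if not 0 <= index < length:
            raise IndexError(f"index {index} outside array of length {length}")
        full[index] = value
    return full


mz = [400.1, 400.6, 401.1, 523.3]
# Charge is known only for the first three points (one isotope envelope).
charge_by_index = {0: 2.0, 1: 2.0, 2: 2.0}

charges = expand_to_full_length(charge_by_index, len(mz))
for m, z in zip(mz, charges):
    print(m, "unassigned" if math.isnan(z) else z)
```

The cost is storage: every secondary array grows to the full length even when only a few points carry a value, which is the trade-off against the variable-length records discussed elsewhere in the thread.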
From: Matthew C. <mat...@va...> - 2008-02-12 22:57:03
|
We could go with "primaryArrayLength", which is generic in case a future update to the schema adds support for other types of primary data array pairs (like time vs. intensity for chromatograms), or go for the unambiguous "mzIntArrayLength" to be specific now (and if chromatograms are added to the schema in the future, it might be with an analogous "timeIntArrayLength" attribute).

-Matt

Eric Deutsch wrote:
> I agree with Darren. For profile (aka continuous) mode data, each element in the array is not a peak, so I would not want to label it as such.
>
> We use the term encodedLength to refer to the length of the string after base64 encoding. It seems like a natural thing to call this concept arrayLength. Something like arrayElementCount could work, too.
>
> But I'm currently still in favor of arrayLength unless someone has a more elegant name.
>
> Thanks,
> Eric
>
>> -----Original Message-----
>> From: psi...@li... [mailto:psidev-ms-dev-bo...@li...] On Behalf Of Kessner, Darren E.
>> Sent: Tuesday, February 12, 2008 1:49 PM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>>
>> I have an objection to the use of "peak" or "peakCount", since this has the alternate meaning of "local maximum" (as opposed to "data point").
>>
>> Darren
>>
>> -----Original Message-----
>> From: psi...@li... [mailto:psi...@li...] On Behalf Of Mike Coleman
>> Sent: Tuesday, February 12, 2008 1:42 PM
>> To: Mass spectrometry standard development
>> Subject: Re: [Psidev-ms-dev] binaryArrayData lengths
>>
>> I'm in favor of this change. Would the interpretation be a little more obvious if this were called something like "peakCount" instead of "arrayLength"?
>>
>> Mike
>>
>> On Feb 12, 2008 3:27 PM, Eric Deutsch <ede...@sy...> wrote:
>>> <spectrum id="S19" scanNumber="19" msLevel="1" arrayLength="1313">
|