From: Randy J. <rkj...@in...> - 2006-10-06 13:48:48
In the mass spectrometry community there is a long history of building spectral databases that benefit from direct readability. Historically these have been plain ASCII representations, including formats such as JCAMP-DX. I think this list would agree that it would be better to use a HUPO format for a peptide database. mzData could provide desirable additional instrument parameter information and a consistent mechanism for dealing with MS data across the proteomics community. Choosing a numeric representation that causes groups like NIST to use another format to receive and deliver data would be a loss.

Instrument vendors are now providing exports to mzData, and I think it is critical that these exports be usable for submitting data to mass spectral databases like those the MS community has used for years. If the cost is a little more code in the parser to deal with one more 'choice' element (of which we have many), that seems small compared to the consequence of NIST not being able to use the standard to deliver results to the community, which would require us to have a completely different parser to read yet another MS format.

Randy

===
Steve wrote:
> ... In our library, for example, we want the users to see the values that
> we put there, so we use ASCII. It would be very desirable for us if the
> same were offered in the XML's - otherwise we will have to go
> non-standard. ...
-Steve Stein
===
Later Mike wrote:
> ...that touches on this issue. Also, an example on that page suggests
> another possibility for the encoding of peaklists that I prefer to those
> discussed so far:
>
> <peaklist>
>   <peak mz="234.56" i="789" />
>   <peak mz="3456.43" i="2" />
>   <peak mz="3457.22" i="234" />
> </peaklist>
>
> This would have the virtue of being highly accessible to eyeballing and to
> quick-and-dirty scripts as well. It would also clearly compress well. And
> it keeps the peak data within the realm of XML. It would be conceivable, I
> think, to use XSLT to create a table of peak data or even an SVG image of
> the spectrum, for example, since everything would be living in XML-land.
>
>> ...A standard that provides n>1 ways
>> to state the same thing is n times as difficult to implement and maintain,
>> which reduces vendor enthusiasm by a factor of n (squared?), which hinders
>> widespread adoption. ...
>
> I generally agree with this, and in particular, I suspect that if the
> specification allowed both representations, most vendors would probably
> produce only base64 output. For this reason, if the textual representation
> is preferred, maybe the base64 alternative should be deprecated and marked
> for removal in a future version.
>
> However, I think there is still an advantage to having the textual
> alternative in the specification, even if instrument vendors never produce
> it. It would allow those of us who prefer the textual format to convert to
> it in a standard way that coordinates with the mzData standard.
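Randy's point that the cost is "a little more code in the parser" can be made concrete. The following is a minimal sketch of a hypothetical text branch for such a 'choice' element: it reads whitespace-separated m/z and intensity pairs using the standard C `strtod`. The function name `parse_ascii_peaks` and the pair-per-line input layout are assumptions for illustration; they are not defined by mzData.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical text-branch reader: parses whitespace-separated
 * "m/z intensity" pairs such as "234.56 789\n3456.43 2" into two
 * caller-supplied arrays. Returns the number of pairs read, or -1 if the
 * text is malformed, ends with an unpaired value, or holds more than
 * max_peaks pairs. */
int parse_ascii_peaks(const char *text, double *mz, double *inten, int max_peaks)
{
    const char *p = text;
    int n = 0;

    assert(text != NULL && mz != NULL && inten != NULL);

    for (;;) {
        char *end;
        double m, i;

        errno = 0;
        m = strtod(p, &end);
        if (end == p)                     /* no number here: stop scanning */
            break;
        if (errno == ERANGE)
            return -1;
        p = end;

        errno = 0;
        i = strtod(p, &end);
        if (end == p || errno == ERANGE)  /* m/z without an intensity */
            return -1;
        p = end;

        if (n == max_peaks)               /* caller's arrays are full */
            return -1;
        mz[n] = m;
        inten[n] = i;
        n++;
    }

    /* Anything other than trailing whitespace is an error. */
    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? n : -1;
}
```

Rejecting a dangling m/z value or trailing junk, rather than silently stopping, is a deliberate choice in this sketch: a lenient text reader is one way two implementations of the "same" format can end up disagreeing.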
From: Mike C. <tu...@gm...> - 2006-10-06 06:48:25
On 10/5/06, Brian Pratt <bri...@in...> wrote:
> ...the unsuitability of XML for eyeballing what is essentially columnar data, ...

I do think "eyeballability" is important, but I also feel uneasy placing the key spectrum data beyond the reach of XML in an XML spectrum format. In essence, in the current version the XML encodes the spectrum metadata, while the peaks themselves become an afterthought, hidden away in a relatively inaccessible appendix. This would be easier to justify for image data, for which there is no reasonable textual representation. But in this case there is a trivial representation, and the code to read and write it is probably simpler than for the base64-encoded case.

There's some discussion at http://c2.com/cgi/wiki?IsolateEachDatum that touches on this issue. Also, an example on that page suggests another possibility for the encoding of peaklists that I prefer to those discussed so far:

<peaklist>
  <peak mz="234.56" i="789" />
  <peak mz="3456.43" i="2" />
  <peak mz="3457.22" i="234" />
</peaklist>

This would have the virtue of being highly accessible to eyeballing and to quick-and-dirty scripts as well. It would also clearly compress well. And it keeps the peak data within the realm of XML. It would be conceivable, I think, to use XSLT to create a table of peak data or even an SVG image of the spectrum, for example, since everything would be living in XML-land.

> ...A standard that provides n>1 ways
> to state the same thing is n times as difficult to implement and maintain,
> which reduces vendor enthusiasm by a factor of n (squared?), which hinders
> widespread adoption. ...

I generally agree with this, and in particular, I suspect that if the specification allowed both representations, most vendors would probably produce only base64 output. For this reason, if the textual representation is preferred, maybe the base64 alternative should be deprecated and marked for removal in a future version.

However, I think there is still an advantage to having the textual alternative in the specification, even if instrument vendors never produce it. It would allow those of us who prefer the textual format to convert to it in a standard way that coordinates with the mzData standard.

Mike
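Writing Mike's proposed element-per-peak layout takes only a few lines of C. In this sketch, `format_peak` and `print_peaklist` are hypothetical names. The `peaklist`/`peak`/`mz`/`i` names are copied from Mike's example, not from any published schema. The number of significant digits is left to the caller, which is where rounding to instrument precision (the second question raised earlier in the thread) would come in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one peak as a self-closing element, writing both values with
 * `digits` significant figures via %g (trailing zeros are dropped).
 * Returns the snprintf result. */
int format_peak(char *buf, size_t size, double mz, double inten, int digits)
{
    /* 17 significant digits are enough for any double, and keep the
     * output well inside the 128-byte line used below. */
    assert(digits > 0 && digits <= 17);
    return snprintf(buf, size, "<peak mz=\"%.*g\" i=\"%.*g\" />",
                    digits, mz, digits, inten);
}

/* Write a whole <peaklist> element to `out`, one peak per line. */
void print_peaklist(FILE *out, const double *mz, const double *inten,
                    int n, int digits)
{
    char line[128];
    int k;

    fputs("<peaklist>\n", out);
    for (k = 0; k < n; k++) {
        format_peak(line, sizeof line, mz[k], inten[k], digits);
        fprintf(out, "  %s\n", line);
    }
    fputs("</peaklist>\n", out);
}
```

For example, with `digits` set to 6, an m/z value of 3457.2249 is written as `mz="3457.22"`, and an intensity of 789.0 as `i="789"`.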
From: Angel P. <an...@ma...> - 2006-10-05 19:20:18
I have to second Brian on this one. From the standpoint of the operational and reporting requirements, having both ASCII and binary representations just adds confusion. It would be better to address the problem of perceived complexity and general usage through tool development efforts. Also, this case:

> It is the simple case of 'represent a single tandem MS spectrum of a single
> peptide at only the precision of the m/z calibration' that is harder than it
> needs to be with the current representation.

is not used outside of post-analysis verification of the spectra (e.g. was the assignment of spectra valid, were the right peaks used for quantitation, etc.). It is very low-throughput and NOT viewed outside of the analysis context. This is just my perception, though, so if someone has an example please speak up.

angel

--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: an...@ma...
From: Angel P. <an...@ma...> - 2006-10-05 17:36:46
The PSI Fall 2006 working group meeting in Washington, D.C. was a rousing success for the Mass Spec and Proteomics Informatics working groups. First and foremost, the working group chairs would like to thank everyone in attendance, as all put forth an unprecedented effort across the numerous activities. For those of you who were not able to attend, here are some highlights:

The general philosophy was "get things done", and to that end the attendees were split into multiple smaller groups with specific deliverables. These included a thorough review and sign-off of the mass spec engine spreadsheet, work on the much-discussed merge of the mzData and mzXML formats, the analysisXML UML model, and the beginnings of the ontology for use with analysisXML. Fantastic progress was made on all of these fronts:

mzData/mzXML
-----------------------------------
The document describing the differences between the formats, which Kent produced and presented at the last meeting, was used to begin merging the schemas. The data arrays were worked out for the most part, as were the annotation and generalization of the instrument, protocol and parameter annotations. The work is slated to be finished by the end of the year, mostly by email and at a satellite meeting in Seattle. Join the next conference call for further details.

Use cases of instrument modes (MRM, LC-MS, LC-MALDI MS/MS, neutral loss scans, etc.) were also submitted to be mapped onto the model.

The ability to computationally validate MS interchange data for MIAPE compliance was also discussed. The working group is considering encoding MIAPE concepts and terms into a CV or ontology, which could be used for future software validation of XML instance documents.

AnalysisXML UML model
-----------------------------------
Modeling was started using FuGE as a basis. A provisional model was created and turned into XML schema using the AndroMDA tools. Since time was a limiting factor, the development effort did not pay careful attention to documentation and diagram formatting, so the model is undergoing a bit of cleanup before release to the rest of the WG.

AnalysisXML content and CV
-----------------------------------
Based on the search engine spreadsheet generated this summer, the group did a rigorous review of the content that AnalysisXML should carry and mapped the current search engine outputs to the MIAPE requirements and the MCP guidelines for reporting mass spec search engine parameters, even adding a few that were missing. A first proposal for CV terms has been generated. The group was able to agree on a large subset of the parameter names and meanings, which are being added to the PSI ontology. Vendors will be asked for vendor-specific terms. Quantitative parameters were not covered. Jim Shofstal is currently cleaning up the document before sending it to the rest of the WG.

A discussion was raised about the possibility of homogenizing the accession codes used by the various engines. A main difficulty comes from the way the various tools interpret FASTA header lines. One proposal was to study the FASTA format generated by Phenyx, which is structured so that it clearly labels different information types such as AC, description, taxonomy, PTMs, etc. Stay tuned for details regarding this.

Once again, thanks to everyone for attending and for all of the hard work that made such an amazing amount of progress possible in the short time we had together.

Cheers,
from the PSI-MS and PSI-PI working group chairs and secretary:
Pierre-Alain Binz
David Creasy
Phil Jones
Randy Julian
Angel Pizarro
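To illustrate the accession-code difficulty mentioned in the report, here is a sketch of two plausible rules a search engine might apply to the same FASTA header line. The function names are hypothetical. The second rule assumes the UniProtKB-style layout `>db|ACCESSION|ENTRY_NAME description`. Neither rule is taken from Phenyx or from any particular engine.

```c
#include <assert.h>
#include <string.h>

/* Copy len bytes of start into out as a C string, truncating to fit. */
static void copy_span(char *out, size_t size, const char *start, size_t len)
{
    if (len >= size)
        len = size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
}

/* Rule A: the accession is the first whitespace-delimited word after '>'. */
void accession_first_word(const char *hdr, char *out, size_t size)
{
    assert(hdr[0] == '>' && size > 0);
    hdr++;                                      /* skip '>' */
    copy_span(out, size, hdr, strcspn(hdr, " \t\r\n"));
}

/* Rule B: the accession is the second '|'-separated field of that word,
 * as in ">sp|P69905|HBA_HUMAN ...". Falls back to rule A when the first
 * word contains no '|'. */
void accession_uniprot(const char *hdr, char *out, size_t size)
{
    const char *word, *bar;
    size_t wlen;

    assert(hdr[0] == '>' && size > 0);
    word = hdr + 1;
    wlen = strcspn(word, " \t\r\n");
    bar = memchr(word, '|', wlen);              /* '|' inside the first word? */
    if (bar == NULL) {
        copy_span(out, size, word, wlen);
        return;
    }
    bar++;                                      /* start of the second field */
    copy_span(out, size, bar, strcspn(bar, "| \t\r\n"));
}
```

For the header `>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha`, rule A yields `sp|P69905|HBA_HUMAN` and rule B yields `P69905`. Two engines searching the same database can therefore report different accessions for the same protein, which is the kind of mismatch a labelled header format would avoid.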
From: Brian P. <bri...@in...> - 2006-10-05 15:59:23
I'm strongly opposed to the change. In addition to the previously discussed concerns about accuracy, and the fundamental pointlessness of the change given that XML is unsuitable for eyeballing what is essentially columnar data, there's an additional and perhaps deeper practical concern:

A data exchange standard that provides many ways to express the same idea is headed for the rocks. Vendors will tend to implement only the parts of the standard that interest them, and the ecosystem quickly breaks down. (I speak from experience with interchange standards in the internet security and circuit board manufacturing software industries; it's a phenomenon not peculiar to any one field of endeavor.) A standard that provides n>1 ways to state the same thing is n times as difficult to implement and maintain, which reduces vendor enthusiasm by a factor of n (squared?), which hinders widespread adoption.

As we sometimes say in the States, "If it ain't broke, don't fix it."

Brian Pratt
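One way to make the accuracy concern testable is to check whether a float written as text with a given number of significant digits reads back bit-for-bit. The sketch below (`float_roundtrips` is a hypothetical name) does this with `snprintf` and `strtof`. It applies the same idea as Mike's ieee-test program quoted in this thread, which prints with `"%.8e"`, i.e. 9 significant digits. For IEEE single precision, 9 digits are always enough (C11 names this constant `FLT_DECIMAL_DIG`), but fewer may not be.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if f, printed with `digits` significant figures and parsed back
 * with strtof, has exactly the same bit pattern; otherwise return 0. */
int float_roundtrips(float f, int digits)
{
    char buf[64];
    float g;

    assert(digits > 0 && digits <= 40);   /* keeps the text under 64 chars */
    snprintf(buf, sizeof buf, "%.*g", digits, (double)f);
    g = strtof(buf, NULL);
    return memcmp(&f, &g, sizeof f) == 0; /* compare bits, not values */
}
```

For instance, 1.00000012f (the next float above 1.0) survives with 9 digits. With 6 digits it is printed as "1" and reads back as 1.0f. NaN is the exception: even with enough digits, the text form is not guaranteed to preserve a NaN's payload bits, which bears on Mike's question about special IEEE values.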
From: Pierre-Alain B. <pie...@is...> - 2006-10-05 12:14:04
I am for the possibility to represent a spectrum/peaklist/even chromatogram in more than one manner ONLY if these representations are easy and straighforward to generate and to parse AND if there is a good (or better blocking) reason to do so. We need to avoid optional things that make any implementation subject to interpretation and missunderstanding. So yes only if the two formats are strictly and clearly described and discriminated (specification issue) Pierre-Alain Randy Julian wrote: >There was concern in the NBT review of the mzData manuscript that the format >was not able specifically designed for either quantitation or 'raw' data. >Quite the opposite is true - it handles these better than it handles a 'peak >list'. > >Given the broad scope we are going for, I think mzData 2.0 needs to cover >both of Mike's suggestions. > >The representation should allow an ASCII list representation, _and_ a base64 >list option. Within each of these, the _desired_ precision should be used. >If you want to make some kind of 21CFR11 claim regarding GLP or GCP for >clinical data (metabolites, proteins or biomarker analyses) then the ability >to represent 'raw' data is critical and part of the current design. > >It is the simple case of 'represent a single tandem MS spectrum of a single >peptide at only the precision of the m/z calibration' that is harder than it >needs to be with the current representation. > >During the Washington PSI meeting a proposal was made to re-introduce the >ASCII data representation that was dropped at the PSI meeting in Nice. What >does everyone think of this idea? > >Randy > >-----Original Message----- >From: psi...@li... >[mailto:psi...@li...] On Behalf Of Mike >Coleman >Sent: Wednesday, October 04, 2006 3:13 PM >To: Angel Pizarro >Cc: Psi...@li... >Subject: Re: [Psidev-ms-dev] Why base64? > >[This message seems to have been bounced by Sourceforge, so I'm >resending it. 
I'm sorry to see that apparently they are having >serious email problems these days. See today's Slashdot article at >http://it.slashdot.org/article.pl?sid=06/10/04/1324214. (Apparently >the problem isn't limited to email coming from gmail accounts.) ] > >On 9/28/06, Mike Coleman <tu...@gm...> wrote: > > >>Makes sense. To put it in other words, there are two questions here: >> >>1. Are the values represented as base64-encoded bitstrings or as ASCII >> >> >text? > > >>2. Should the values be rounded to the precision of the instrument >>(probably plus a digit, etc.), or should an arbitrary number of >>figures be used? Again, this isn't about losing information, as we're >>only discussing rounding away noise. >> >>These two questions are entirely orthogonal, as far as I can see, and >>it would be possible to allow both options for both questions, if this >>were seen as being worthwhile. The one interaction is that if you use >>the ASCII text encoding, rounding the figures will make the mzData >>file smaller. >> >>Regarding ambiguity, the ASCII text representation would allow >>differing whitespace (which produce no semantic difference). I guess >>the base64 encoding also allows differing surrounding whitespace. >> >>With respect to the base64 encoding, one corner case comes to mind. >>Are special IEEE values like NaN, the infinities, negative zero, etc., >>allowed? If so, what should the interpretation be? 
>> >>Mike >> >> >>The example code I mentioned: >> >>/* gcc -g -O2 -ffloat-store -o ieee-test ieee-test.c */ >> >>/* strtof is GNU/C99 */ >>#define _GNU_SOURCE >> >>#include <assert.h> >>#include <errno.h> >>#include <limits.h> >>#include <stdio.h> >>#include <stdlib.h> >> >> >>union bits { >> unsigned int u; >> float f; >>}; >> >> >>int >>main() { >> unsigned int i; >> union bits x, x2; >> int zeros_seen = 0; >> >> assert(sizeof x.u == sizeof x.f); >> assert(&x.u == &x.f); >> >> >> >> for (i=0; ; i++) { >> char buf[128]; >> >> if (i == 0) >> if (++zeros_seen > 1) >> break; >> >>#if 0 >> if (!(i % 100000)) >> putc('.', stderr); >>#endif >> >> x.u = i; >> if (x.f != x.f) >> continue; /* skip error values */ >> >> sprintf(buf, "%.8e", x.f); >> >> errno = 0; >> x2.f = strtof(buf, 0); >> if (errno == ERANGE) { >> printf("strtof error for %s\n", buf); >> continue; >> } >> >> if (x2.u != x.u) >> printf("bit difference for %s (%u != %u)\n", buf, x2.u, x.u); >> } >>} >> >> >> > >------------------------------------------------------------------------- >Take Surveys. Earn Cash. Influence the Future of IT >Join SourceForge.net's Techsay panel and you'll get the chance to share your >opinions on IT & business topics through brief surveys -- and earn cash >http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >_______________________________________________ >Psidev-ms-dev mailing list >Psi...@li... >https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
> > > -- -- Dr. Pierre-Alain Binz Swiss Institute of Bioinformatics Proteome Informatics Group 1, Rue Michel Servet CH-1211 Geneve 4 Switzerland - - - - - - - - - - - - - - - - - Tel: +41-22-379 50 50 Fax: +41-22-379 58 58 Pie...@is... http://www.expasy.org/people/Pierre-Alain.Binz.html |
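Mike's first question contrasts base64-encoded bitstrings with ASCII text. To illustrate what a reader has to do with the base64 form, here is a minimal C sketch that decodes a `<data>` payload like those in Alexandre Masselot's mzData excerpts (quoted in the msLevel thread in this archive). It assumes `endian="little"`, `precision="32"` or `"64"`, and IEEE 754 `float`/`double` on the host. `b64_decode`, `le_bytes_to_float` and `le_bytes_to_double` are hypothetical helpers written for this example, not part of any mzData library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not in
   the standard alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode base64 text into out[]. Whitespace is skipped and decoding
   stops at the first '=' padding character. Returns the number of
   bytes written, or -1 on an invalid character or if out[] is full. */
static long b64_decode(const char *in, unsigned char *out, size_t out_cap)
{
    uint32_t acc = 0;  /* the low `bits` bits are still waiting to be emitted */
    int bits = 0;
    size_t n = 0;

    for (; *in != '\0' && *in != '='; in++) {
        if (*in == ' ' || *in == '\t' || *in == '\r' || *in == '\n')
            continue;
        int v = b64_value(*in);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (uint32_t)v;
        bits += 6;
        if (bits >= 8) {            /* a full byte is available */
            bits -= 8;
            if (n == out_cap)
                return -1;
            out[n++] = (unsigned char)(acc >> bits);
        }
    }
    return (long)n;
}

/* Assemble 4 little-endian bytes into an IEEE 754 single
   (precision="32" endian="little"). Assumes the host float is binary32. */
static float le_bytes_to_float(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Same for 8 bytes and an IEEE 754 double (precision="64" endian="little"). */
static double le_bytes_to_double(const unsigned char *p)
{
    uint64_t u = 0;
    for (int i = 7; i >= 0; i--)
        u = (u << 8) | p[i];
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}
```

Decoding the first characters of the demo file's m/z array (`+hMVQ1A1...`) this way gives a first value of about 149.08, inside its stated 100 to 1350 m/z range. In the excerpts quoted here, `length` appears to count values rather than bytes: the padding of each payload is consistent with, for example, 77 single-precision values occupying 308 bytes.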
From: Randy J. <rkj...@in...> - 2006-10-05 11:27:53
|
There was concern in the NBT review of the mzData manuscript that the format was not specifically designed for either quantitation or 'raw' data. Quite the opposite is true: it handles these better than it handles a 'peak list'. Given the broad scope we are going for, I think mzData 2.0 needs to cover both of Mike's suggestions. The representation should allow an ASCII list option _and_ a base64 list option. Within each of these, the _desired_ precision should be used. If you want to make some kind of 21 CFR Part 11 claim regarding GLP or GCP for clinical data (metabolite, protein or biomarker analyses), then the ability to represent 'raw' data is critical, and it is part of the current design. It is the simple case, 'represent a single tandem MS spectrum of a single peptide at only the precision of the m/z calibration', that is harder than it needs to be with the current representation. During the Washington PSI meeting a proposal was made to re-introduce the ASCII data representation that was dropped at the PSI meeting in Nice. What does everyone think of this idea? Randy -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Mike Coleman Sent: Wednesday, October 04, 2006 3:13 PM To: Angel Pizarro Cc: Psi...@li... Subject: Re: [Psidev-ms-dev] Why base64? |
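Randy's proposal is an ASCII list option alongside base64, each written at the desired precision. As a hedged sketch of the ASCII side, the hypothetical function `ascii_list` below writes values as one space-separated list with a chosen number of significant figures. Expressing "desired precision" as significant figures is an assumption; a real format might instead tie it to decimal places or to the m/z calibration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write n values as one space-separated ASCII list, each printed with
   `digits` significant figures (%.*g). Returns the number of characters
   written (excluding the terminating NUL), or -1 if out[] is too small.
   Note that %g may switch to exponent notation for very large or very
   small magnitudes. */
static int ascii_list(const double *v, size_t n, int digits,
                      char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + len, cap - len, i ? " %.*g" : "%.*g",
                         digits, v[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;      /* output would have been truncated */
        len += (size_t)w;
    }
    return (int)len;
}
```

With the two LTQ precursor m/z values quoted in this archive (449.728607 and 391.069244), 9 significant figures give `449.728607 391.069244` (21 characters) and 7 give `449.7286 391.0692` (17 characters). This is the size effect Mike describes: rounding only pays off in the ASCII encoding, because a base64 float always costs the same number of bytes.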
From: Mike C. <tu...@gm...> - 2006-10-04 19:12:38
|
[This message seems to have been bounced by SourceForge, so I'm resending it. I'm sorry to see that apparently they are having serious email problems these days. See today's Slashdot article at http://it.slashdot.org/article.pl?sid=06/10/04/1324214. (Apparently the problem isn't limited to email coming from gmail accounts.) ]

On 9/28/06, Mike Coleman <tu...@gm...> wrote:
> Makes sense. To put it in other words, there are two questions here:
>
> 1. Are the values represented as base64-encoded bitstrings or as ASCII text?
>
> 2. Should the values be rounded to the precision of the instrument
> (probably plus a digit, etc.), or should an arbitrary number of
> figures be used? Again, this isn't about losing information, as we're
> only discussing rounding away noise.
>
> These two questions are entirely orthogonal, as far as I can see, and
> it would be possible to allow both options for both questions, if this
> were seen as being worthwhile. The one interaction is that if you use
> the ASCII text encoding, rounding the figures will make the mzData
> file smaller.
>
> Regarding ambiguity, the ASCII text representation would allow
> differing whitespace (which produces no semantic difference). I guess
> the base64 encoding also allows differing surrounding whitespace.
>
> With respect to the base64 encoding, one corner case comes to mind.
> Are special IEEE values like NaN, the infinities, negative zero, etc.,
> allowed? If so, what should the interpretation be?
>
> Mike
>
>
> The example code I mentioned:
>
> /* gcc -g -O2 -ffloat-store -o ieee-test ieee-test.c */
>
> /* strtof is GNU/C99 */
> #define _GNU_SOURCE
>
> #include <assert.h>
> #include <errno.h>
> #include <limits.h>
> #include <stdio.h>
> #include <stdlib.h>
>
>
> union bits {
>     unsigned int u;
>     float f;
> };
>
>
> int
> main() {
>     unsigned int i;
>     union bits x, x2;
>     int zeros_seen = 0;
>
>     assert(sizeof x.u == sizeof x.f);
>     assert((void *)&x.u == (void *)&x.f);  /* cast: the pointer types differ */
>
>     /* Visit every 32-bit pattern; stop when i wraps back around to 0. */
>     for (i=0; ; i++) {
>         char buf[128];
>
>         if (i == 0)
>             if (++zeros_seen > 1)
>                 break;
>
> #if 0
>         if (!(i % 100000))
>             putc('.', stderr);
> #endif
>
>         x.u = i;
>         if (x.f != x.f)
>             continue;  /* skip NaNs (only a NaN compares unequal to itself) */
>
>         /* %.8e prints 9 significant digits, enough to round-trip any float */
>         sprintf(buf, "%.8e", x.f);
>
>         errno = 0;
>         x2.f = strtof(buf, 0);
>         if (errno == ERANGE) {
>             printf("strtof error for %s\n", buf);
>             continue;
>         }
>
>         if (x2.u != x.u)
>             printf("bit difference for %s (%u != %u)\n", buf, x2.u, x.u);
>     }
> }
|
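Mike's corner case (NaN, the infinities, negative zero) comes down to what a reader should do when a decoded 32-bit pattern is one of these values. The sketch below shows how such values could be detected, assuming IEEE 754 binary32 floats and the default floating-point mode (no `-ffast-math` or flush-to-zero). The enum labels and function names are hypothetical. It also illustrates why a rule would be needed: NaN has many bit patterns, and -0 compares equal to +0, so a reader that only compares values cannot tell them apart.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical categories a reader might report for each decoded value. */
enum special_value {
    SV_ORDINARY,    /* normal finite value, or +0 */
    SV_NAN,         /* any of the many NaN bit patterns */
    SV_POS_INF,
    SV_NEG_INF,
    SV_NEG_ZERO,    /* compares equal to +0; differs only in the sign bit */
    SV_SUBNORMAL    /* finite, but below the normal range */
};

/* Reinterpret a 32-bit pattern (as decoded from a precision="32" array)
   as a float. Assumes the host float is IEEE 754 binary32. */
static float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

static enum special_value classify_bits(uint32_t u)
{
    float f = float_from_bits(u);

    switch (fpclassify(f)) {
    case FP_NAN:       return SV_NAN;
    case FP_INFINITE:  return signbit(f) ? SV_NEG_INF : SV_POS_INF;
    case FP_ZERO:      return signbit(f) ? SV_NEG_ZERO : SV_ORDINARY;
    case FP_SUBNORMAL: return SV_SUBNORMAL;
    default:           return SV_ORDINARY;
    }
}
```

A reader could use a check like this to reject, warn about, or pass through such values. Which of those behaviours the format should require is exactly the question Mike leaves open.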
From: Angel P. <an...@ma...> - 2006-10-03 12:41:12
|
Eric Deutsch wrote: > > Hi, when is the next teleconference? > Hi Eric, The regularly scheduled one is on Tuesday next week at 11 AM EST. These are the numbers I dial, but I'll dig up the email to see if there is a separate West Coast number. Phone: 718-354-1169 Passcode: 8885686# -angel |
From: Eric D. <ede...@sy...> - 2006-10-03 03:15:40
|
Hi, when is the next teleconference? Thanks, Eric |
From: Pierre-Alain B. <pie...@is...> - 2006-09-29 13:02:59
|
Hi all, I'm sure Jim will answer this as soon as he is back from Washington (PSI meeting). Is that the person you were thinking of, David? For your info, the merging work between mzData and mzXML formally started in Washington during the PSI workshop earlier this week. This will most probably drive the next version of mzData (which will most probably change its name due to the merge). Cheers Pierre-Alain -- -- Dr. Pierre-Alain Binz Swiss Institute of Bioinformatics Proteome Informatics Group 1, Rue Michel Servet CH-1211 Geneve 4 Switzerland - - - - - - - - - - - - - - - - - Tel: +41-22-379 50 50 Fax: +41-22-379 58 58 Pie...@is... http://www.expasy.org/people/Pierre-Alain.Binz.html |
From: David C. <dc...@ma...> - 2006-09-28 16:20:46
|
Hi Alex, Yes, it's definitely incorrect. I'll pass on the message to the appropriate person and I'm confident they will fix it in a future release. I'm not aware of any other mzData producers doing it this way. David Creasy. -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com *** please note change of address *** |
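David and Brian both treat the LTQ export's `precursor msLevel="2"` inside an `msLevel="2"` spectrum as a bug. That suggests the intended reading: a precursor's msLevel names the level of the scan the ion was selected from, one less than the spectrum's own msLevel, as in the demo file's 2/1 pairing. Assuming that reading, a converter could flag affected spectra with a check like the hypothetical one below. `struct ms_levels` and `precursor_level_consistent` are illustrative names, and extracting the two attributes from the XML is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory view of two mzData attributes of one spectrum. */
struct ms_levels {
    int spectrum_ms_level;   /* spectrumInstrument/@msLevel, e.g. 2 for MS/MS */
    int precursor_ms_level;  /* precursor/@msLevel */
};

/* Under the assumed reading, an MSn spectrum (n >= 2) should name a
   precursor selected from the scan one level below it. */
static bool precursor_level_consistent(struct ms_levels s)
{
    return s.spectrum_ms_level >= 2 &&
           s.precursor_ms_level == s.spectrum_ms_level - 1;
}
```

Applied to Alex's excerpts, the demo spectrum (2, 1) passes and both LTQ spectra (2, 2) fail. MS1 spectra, which have no precursor, are outside what this check covers.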
From: Brian P. <bri...@in...> - 2006-09-28 15:49:39
|
I agree, that doesn't make sense. It sounds like there's a bug in the module that wrote the data. - Brian Pratt www.insilicos.com > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Alexandre Masselot > Sent: Thursday, September 28, 2006 6:10 AM > To: psi...@li... > Subject: [Psidev-ms-dev] <precursor msLevel="1" or > msLevel="2" for msms data in the mzdata format (LTQ data) |
From: Alexandre M. <ol...@ge...> - 2006-09-28 13:11:17
|
Hello, something raised a problem in our code, and in my understanding of the mzData format for MS/MS peak lists. We have been writing (and are still writing...) an open-source Perl module, InSilicoSpectro (CPAN & http://insilicospectro.vital-it.ch), which, among plenty of other things, converts MS peak lists from/to various formats. In the demo file, spectrumInstrument is set to msLevel="2" and the precursor info is set to msLevel="1", which seems logical. Then I have some LTQ data: spectrumInstrument is still msLevel="2", but the precursor is msLevel="2", which seems to follow a different logic, no? This happens for all the spectra in each file. Any light anyone could shed on this would be very welcome. Thanks in advance, Alex

myo_ms2_1.05

<spectrum id="3">
  <spectrumDesc>
    <spectrumSettings>
      <acqSpecification spectrumType="discrete" methodOfCombination="sum" count="1">
        <acquisition acqNumber="3"/>
      </acqSpecification>
      <spectrumInstrument msLevel="2" mzRangeStart="100.000000" mzRangeStop="1350.000000">
        <cvParam cvLabel="psi" accession="PSI:1000036" name="ScanMode" value="MassScan"/>
        <cvParam cvLabel="psi" accession="PSI:1000037" name="Polarity" value="Positive"/>
        <cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="0.045000"/>
        <userParam name="ScanType" value="full"/>
      </spectrumInstrument>
    </spectrumSettings>
    <precursorList count="1">
      <precursor msLevel="1" spectrumRef="1">
        <ionSelection>
          <cvParam cvLabel="psi" accession="PSI:1000040" name="MassToChargeRatio" value="334.82"/>
        </ionSelection>
        <activation>
          <cvParam cvLabel="psi" accession="PSI:1000044" name="Method" value="CID"/>
          <cvParam cvLabel="psi" accession="PSI:1000045" name="CollisionEnergy" value="28.00"/>
          <cvParam cvLabel="psi" accession="PSI:1000046" name="EnergyUnits" value="Percent"/>
        </activation>
      </precursor>
    </precursorList>
  </spectrumDesc>
  <mzArrayBinary>
    <data precision="32" endian="little" length="77">+hMVQ1A1K0NGFjFDmBQ5QzxKPUPc/UpDLhJPQ6jrVkMEol1D/GheQygoYUNOHmdD/kloQyjwaEPm8WpDcDx2Q8otd0O6PXhDICJ7QzCjfUPYfH9DkAeDQ/CHiEOI8YhDqmSJQ5TeiUOYUIxDmByPQ+iSj0Pa9Y9DYHCQQzAtkUMIe5FDGNSRQ0RDkkMkxJJDdDqTQzACl0OQj5dDdOmXQzhzmENs1ZhDzmiZQ7DOmUM4RZpDMAGbQ0h1m0MU2ptDSDqcQ0TFnUN8Mp5DSqGeQ9gdn0O+0J9D0HygQwLuoEPiMKJDctqjQxiwpEOcDaVDgIemQ1D/pkPEfadDpP2nQ4hmq0PA/6xDdOmtQ8gKr0PEM7BDysqwQzg4sUOcZ7ND7Ni4Qyo+uUPClblD2DzmQ+gH8UM=</data>
  </mzArrayBinary>
  <intenArrayBinary>
    <data precision="32" endian="little" length="77">ALBORQCAC0QAYDNFAOCoRAAABkUAIOJEAADbRABgkkUAMFxFAJAnRQCAjkQAgChEAKBoRQAQeUYAAAFEAIA/RADAPEQAoIdEAKDqRABACUUAABFEAADTRAAO2UYAwE1EwHJgSAA8HkYAgGhEAAAnRACAeUUAyNtFAPwkRgAW6kYA0F9GAD+tR8B6k0gAHo1GAEAcRADgZUUAIFZFAJAKRQDAh0YA4KFFAAh3RgCZBkcAMAdFAHixRQAAakQA6L1FAGC9RQBmpUYAAM5EAMiuRQAAcEQAoLhEAFAmRQAAtkQA4BlFANBVRQBAAkQAALNDAPxnRgDQvUUAoKlFAOBNRQBAMUUAsCpFAAC2RADAWkQAEIZFAJDQRQCwCUUAAGpEABB/RQDAlkQAwEBEAIDsQwDANUQ=</data>
  </intenArrayBinary>
</spectrum>

LTQ-exported mzData

<spectrum id="2">
  <spectrumDesc>
    <spectrumSettings>
      <acqSpecification spectrumType="CentroidMassSpectrum" methodOfCombination="sum" count="1">
        <acquisition acqNumber="2"/>
      </acqSpecification>
      <spectrumInstrument msLevel="2" mzRangeStart="110.000000" mzRangeStop="910.000000">
        <cvParam cvLabel="psi" accession="PSI:1000036" name="ScanMode" value="MassScan"/>
        <cvParam cvLabel="psi" accession="PSI:1000037" name="Polarity" value="Positive"/>
        <cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="14.002800"/>
        <cvParam cvLabel="psi" accession="PSI:1000035" name="PeakProcessing" value="ContinuumMassSpectrum"/>
      </spectrumInstrument>
    </spectrumSettings>
    <precursorList count = "1">
      <precursor msLevel="2" spectrumRef="1">
        <ionSelection>
          <cvParam cvLabel="psi" accession="PSI:1000040" name="MassToChargeRatio" value="449.728607"/>
          <cvParam cvLabel="psi" accession="PSI:1000041" name="ChargeState" value="2"/>
          <cvParam cvLabel="psi" accession="PSI:1000041" name="ChargeState" value="3"/>
        </ionSelection>
        <activation>
          <cvParam cvLabel="psi" accession="PSI:1000044" name="Method" value="CID"/>
          <cvParam cvLabel="psi" accession="PSI:1000045" name="CollisionEnergy" value="35.00"/>
        </activation>
      </precursor>
    </precursorList>
  </spectrumDesc>
  <mzArrayBinary>
    <data precision="64" endian="little" length="26">AAAAgHL8aEAAAADANSppQAAAAMBNAWxAAAAAANhdbEAAAAAAchlwQAAAAOBez3FAAAAAwHsSckAAAACAKkJzQAAAAADZEnVAAAAAQDCQdUAAAADg4AR2QAAAAMBrhnZAAAAAAJdNd0AAAABgUpB3QAAAAOA1sHdAAAAAQGWReEAAAACAUyx6QAAAAAACanpAAAAAIMSgekAAAABgh+F6QAAAAOCg9HpAAAAAoL8Be0AAAADgphN7QAAAAOAUhHtAAAAAILUifEAAAABgRjt9QA==</data>
  </mzArrayBinary>
  <intenArrayBinary>
    <data precision="64" endian="little" length="26">AAAAYA24EkAAAAAASOjmPwAAAEAcNOs/AAAAIHZw4z8AAACAIQT7PwAAAEB8qxNAAAAAAPF5D0AAAACg6X3/PwAAAMC/+BFAAAAA4JNY7z8AAAAgKVznPwAAAMCfQvI/AAAAoDyfFUAAAABghBLpPwAAAMAoJgFAAAAAYOGaGkAAAABgf5AYQAAAAMBj6gJAAAAAYHv/6T8AAADgiaUFQAAAAAAb2jJAAAAA4D06IEAAAABACm9BQAAAAGBbqRVAAAAAoLdPBUAAAABgNsjpPw==</data>
  </intenArrayBinary>
</spectrum>
<spectrum id="3">
  <spectrumDesc>
    <spectrumSettings>
      <acqSpecification spectrumType="CentroidMassSpectrum" methodOfCombination="sum" count="1">
        <acquisition acqNumber="3"/>
      </acqSpecification>
      <spectrumInstrument msLevel="2" mzRangeStart="95.000000" mzRangeStop="795.000000">
        <cvParam cvLabel="psi" accession="PSI:1000036" name="ScanMode" value="MassScan"/>
        <cvParam cvLabel="psi" accession="PSI:1000037" name="Polarity" value="Positive"/>
        <cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="14.015817"/>
        <cvParam cvLabel="psi" accession="PSI:1000035" name="PeakProcessing" value="ContinuumMassSpectrum"/>
      </spectrumInstrument>
    </spectrumSettings>
    <precursorList count = "1">
      <precursor msLevel="2" spectrumRef="2">
        <ionSelection>
          <cvParam cvLabel="psi" accession="PSI:1000040" name="MassToChargeRatio" value="391.069244"/>
          <cvParam cvLabel="psi" accession="PSI:1000041" name="ChargeState" value="2"/>
          <cvParam cvLabel="psi" accession="PSI:1000041" name="ChargeState" value="3"/>
        </ionSelection>
        <activation>
          <cvParam cvLabel="psi" accession="PSI:1000044" name="Method" value="CID"/>
          <cvParam cvLabel="psi" accession="PSI:1000045" name="CollisionEnergy" value="35.00"/>
        </activation>
      </precursor>
    </precursorList>
  </spectrumDesc>
  <mzArrayBinary>
    <data precision="64" endian="little" length="85">AAAAQAo8XkAAAABgK8FeQAAAAABoWWBAAAAAoGKlYEAAAAAAzyNhQAAAAMAHVmFAAAAAoD+eYUAAAADAzONhQAAAACAwnmJAAAAAIHzcYkAAAADAkv5iQAAAACC9j2NAAAAAYBEkZEAAAADAZN1kQAAAAGCnnmVAAAAAgJTiZUAAAADA3QJmQAAAAECVOmdAAAAAwGJPZ0AAAABAXBhoQAAAAKASnGlAAAAAoMPgaUAAAACAG1tqQAAAAACIKGtAAAAAALOra0AAAACAOdxrQAAAAIAPIWxAAAAAALw2bEAAAADAtppsQAAAACCXvWxAAAAAQOWebUAAAADgUeNtQAAAAGBkUG5AAAAAwCBkb0AAAABgmt9vQAAAAACVL3BAAAAAALhAcEAAAABg6UtwQAAAAADAVXBAAAAAQH9ucEAAAABA0pBwQAAAACAG83BAAAAAYKYLcUAAAAAA1iZxQAAAAABEMXFAAAAA4IlQcUAAAAAAWHBxQAAAAOBfjnFAAAAAgOEMckAAAACA5pFyQAAAAECoqXJAAAAA4AjzckAAAACAwDdzQAAAAOA7k3NAAAAAwCSmc0AAAAAAW8FzQAAAAIBSDHRAAAAAIIlWdEAAAABAaWB0QAAAAMAqj3RAAAAAAH6hdEAAAABAJDx1QAAAAADEk3VAAAAAgPCxdUAAAACAKL51QAAAAAAo8XVAAAAAIJ+kdkAAAABgwLp2QAAAAMDAAHdAAAAAQPNBd0AAAACg7k93QAAAAMCPW3dAAAAAADBld0AAAACgvuR3QAAAAED5fnhAAAAAwIXkeEAAAABgT4t5QAAAACBvjHpAAAAAwBcSfUAAAABgOpN9QAAAACDxpH5AAAAAgBlsgUAAAACAV8GBQAAAAADUFYJAAAAAwJYshEA=</data>
  </mzArrayBinary>
  <intenArrayBinary>
    <data precision="64" endian="little" length="85">AAAAwMPJAUAAAABArr70PwAAAMD/xgBAAAAAoH+3/D8AAACgkT4LQAAAAICGFhNAAAAAIFnSG0AAAADgCTflPwAAAMAIB29AAAAA4IQHM0AAAAAg3CwDQAAAAOCtROM/AAAAYPO15z8AAADApR9IQAAAAICUefg/AAAAQMlXMUAAAAAg8krxPwAAAEDJavk/AAAAIAaR8T8AAABAuVEkQAAAAMCItglAAAAAwGV95z8AAACAF9o4QAAAAMCmzQ5AAAAAIPfSAUAAAAAAsnwaQAAAAEBm+e0/AAAAgG4sAEAAAAAg0Sw7QAAAAADCngVAAAAAgFatB0AAAABA5fTsPwAAAAD2lARAAAAAYFmJEEAAAADAAo4xQAAAAMBx1QJAAAAAQNaQNEAAAABgJoNTQAAAAGCvGCRAAAAAwBaGIUAAAABAjE36PwAAAADyNQBAAAAAoKztIkAAAACAlW/tPwAAAIBRJO8/AAAAYNy5CUAAAABAhiBaQAAAAIDzvgpAAAAA4BzgA0AAAADg3CbyPwAAAOB7xw5AAAAAAImSI0AAAAAgdOwQQAAAACAeuPo/AAAAoGo07D8AAABg4JTzPwAAACBtRhVAAAAA4KLT9D8AAACAIQUhQAAAAODRDARAAAAAACjGFkAAAADApNzpPwAAAEBNTvU/AAAAIBY9CEAAAABAfF0PQAAAAGBcPx1AAAAAgDyQBEAAAACAMhf6PwAAAADboPI/AAAAgPUCFUAAAAAA4yA6QAAAAEDaqApAAAAAALA0BEAAAABgV+YwQAAAAACdvPY/AAAAIBdaAUAAAADgLFwJQAAAAECsDQxAAAAAoDSWB0AAAAAASu0BQAAAAECuaBRAAAAAoN22IkAAAADAYNTxPwAAAGD5xwJAAAAAINkzB0A=</data>
  </intenArrayBinary>
</spectrum>

-- Alexandre Masselot, phD Senior bioinformatician www.genebio.com voice: +41 22 702 99 00 |
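Both files store their peaks as base64 text inside `<data precision=... endian=... length=...>` elements. A minimal Python sketch of decoding one such element, assuming `precision` is "32" or "64" (IEEE 754 single or double), `endian` is "little" or "big", and `length` counts values rather than bytes; the function name `decode_mzdata_array` is hypothetical.

```python
import base64
import struct


def decode_mzdata_array(b64_text, precision, endian, length):
    """Decode the text of an mzData <data> element into a list of floats.

    precision and endian are the attribute strings, length the value count,
    e.g. decode_mzdata_array(text, "64", "little", 26).
    """
    code = {"32": "f", "64": "d"}[precision]     # IEEE 754 single or double
    order = {"little": "<", "big": ">"}[endian]  # byte order of each value
    # With the default validate=False, b64decode discards characters outside
    # the base64 alphabet, so wrapped or indented element text is accepted.
    raw = base64.b64decode(b64_text)
    if len(raw) != length * struct.calcsize(order + code):
        raise ValueError("decoded byte count does not match the length attribute")
    return list(struct.unpack(f"{order}{length}{code}", raw))


print(decode_mzdata_array("AADIQg==", "32", "little", 1))  # [100.0]
```

Checking the byte count against `length` is a cheap way to catch truncated or mislabeled arrays before the values reach a search engine.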
From: Angel P. <an...@ma...> - 2006-09-24 15:11:05
|
I guess I am using "lossy" too loosely (say that 10 times fast). I meant that converting a double- or single-precision value to the number of significant figures justified by the instrument's limit of detection, the form we all know and love to see in plain-text formats, is a lossy translation. I was not implying that going from byte strings to the equivalent ASCII translation was a lossy operation. Sorry for the confusion. But send me the C code and I will post it on the docstore. -angel Mike Coleman wrote: > Another way to send the value is to send an ASCII representation of a > decimal number that will, upon being converted using strtof(3), result > in the identical single-precision value. (That decimal number is > *not* typically mathematically equal to the single-precision value, > it's just closer to it than to any other single-precision value.) > > This really *is* a completely lossless representation. > > There are different ways to generate these decimal numbers. It is > sufficient (if not necessarily optimal) to simply use printf(3) with > sufficient precision (e.g., "%.8e"). This will work with > implementations that do correct rounding. Linux (meaning GNU libc) > has done this correctly for at least nine years--I would assume > the vendors are doing it right, though this should be confirmed. > > I'm including a small C program that demonstrates what I'm talking > about. It does an exhaustive check for the single-precision case. It > takes a couple of hours to complete, but if you're going to see an > error, it will probably occur pretty quickly. (If you see any errors, > I'd like to know.) > > This doesn't change the fact that 0.1 doesn't have an exact IEEE 754 > representation. That is a separate issue (and one that a base64 > encoding does not address either). > |
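Angel's distinction between rounding and re-encoding can be made concrete. In this Python sketch (helper names are illustrative, and 1234.56789 is an arbitrary example value, not instrument data), a single-precision value rounded to six significant figures no longer reads back as the same single, while nine significant digits do.

```python
import struct


def as_float32(x):
    """Round a Python float (an IEEE 754 double) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def survives_text(value32, fmt):
    """True if printing value32 with fmt and reading it back gives the same single."""
    return as_float32(float(fmt % value32)) == value32


v = as_float32(1234.56789)  # a value an instrument might hold as a float
print("%.6g" % v, survives_text(v, "%.6g"))  # 1234.57 False (rounded: lossy)
print("%.9g" % v, survives_text(v, "%.9g"))  # 1234.56787 True (9 digits: lossless)
```

Six figures may well be all the instrument justifies; the point is only that this step discards information, whereas a sufficiently long decimal string does not.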
From: Mike C. <tu...@gm...> - 2006-09-24 02:25:52
|
[My two previous attempts to send this appear to have failed. My apologies if anyone is seeing multiple copies. I also omitted the C program mentioned below, in case that might be tripping a spam filter. Drop me an email if you want a copy. --Mike] This is dry stuff, but I think it's important to see that IEEE 754 values can be transmitted in decimal form (via mzData) without any loss of precision whatsoever. Let me give a more concrete scenario. In our case, assume we have an IEEE 754 single-precision value in our instrument computer. We want to use mzData to transmit that value to another computer, so that ultimately the latter computer will contain the identical single-precision value. One way to do this is the current method of capturing the 32-bit representation and sending it across using the base64 encoding. Another way to send the value is to send an ASCII representation of a decimal number that will, upon being converted using strtof(3), result in the identical single-precision value. (That decimal number is *not* typically mathematically equal to the single-precision value, it's just closer to it than to any other single-precision value.) This really *is* a completely lossless representation. There are different ways to generate these decimal numbers. It is sufficient (if not necessarily optimal) to simply use printf(3) with sufficient precision (e.g., "%.8e"). This will work with implementations that do correct rounding. Linux (meaning GNU libc) has done this correctly for at least nine years--I would assume the vendors are doing it right, though this should be confirmed. I'm including a small C program that demonstrates what I'm talking about. It does an exhaustive check for the single-precision case. It takes a couple of hours to complete, but if you're going to see an error, it will probably occur pretty quickly. (If you see any errors, I'd like to know.) This doesn't change the fact that 0.1 doesn't have an exact IEEE 754 representation.
That is a separate issue (and one that a base64 encoding does not address either). As far as the cost of conversion, I agree that it is likely larger than the cost of the base64 encoding. I don't have the libraries at hand to try it out, but I'm sure it would be detectable for large sets of spectra. That notwithstanding, we and everyone else who uses a format like ms2 or dta are already paying this cost, and it doesn't seem particularly onerous. CPU cycles are pretty cheap--human cycles (that transparency might save) are very dear. Mike |
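Mike's C program is not preserved in the archive. As a stand-in, here is a Python sketch of the same check on a random sample of single-precision bit patterns rather than an exhaustive sweep. Python has no strtof(3), so the printed text is parsed as a double and then rounded to single with `struct`; we assume this matches strtof here, since a 9-significant-digit string lies much closer to the original single than to the halfway point between neighboring singles. NaNs are skipped because their payload bits are not preserved, and all function names are hypothetical.

```python
import random
import struct


def bits_to_float32(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def roundtrips(bits, fmt="%.8e"):
    """Print the single with these bits via fmt, parse it back, compare bits."""
    text = fmt % bits_to_float32(bits)
    return float32_to_bits(float(text)) == bits


def is_nan_bits(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def count_failures(samples=100_000, seed=1):
    """Count sampled non-NaN singles that do not survive "%.8e" text."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        bits = rng.getrandbits(32)
        if not is_nan_bits(bits) and not roundtrips(bits):
            failures += 1
    return failures


print("%.8e" % bits_to_float32(0x3DCCCCCD))  # 1.00000001e-01 (the single nearest 0.1)
print(count_failures())                      # 0 expected with correct rounding
```

"%.8e" prints nine significant digits, which is enough to identify any single uniquely; with "%.6e" the check starts failing (for example on the single just above 0.1).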
From: Randy J. <rkj...@in...> - 2006-09-23 18:08:25
|
My thought on this is that we should generalize the "data" section to be a 'class' with a 'type'. Kent will go into this in much more detail in DC, but the basic idea, as started in the teleconferences, is that the instrument could be described as a process (protocols) with inputs and outputs (parameterSets). An output should be defined as a class which could take on any number of 'types', which can include all of the XSD data types if we so choose. Generalization like this means that we can define a protocol which takes the base64 data as an input parameter and produces a lossy text representation as an output. The parameters of this protocol and its description would tell us how the conversion was done and what the remaining precision is (significant figures, etc.). In the version of dataXML we will be discussing in DC, the input parameterSet for the above protocol could be another dataXML document with the base64-encoded spectra inside, located at a specified URL, and the protocol producing the "peaklist" could simply 'refer' to the source input rather than duplicating it. By this method, the use case where we want a human-readable peaklist available in text format, derived from an original 'raw' instrument acquisition file located in some repository somewhere, can be achieved with the 'peaklist' document containing the least amount of redundant information possible. If we get this right, an XQuery-compatible link to the 'original' data can be made, allowing almost RDF-like traversal of documents across repositories. The cost of this flexibility is a more abstract schema (which we will review in DC), and much heavier reliance on the ontology. The result does not look like XML, but like RDF implemented in XML. This is all hard to digest without specific examples, so those coming to DC should be prepared to work through their favorite use case to make sure it's all working the way we want.
If it is, the good news is that the language bindings, and therefore the utilities, will come very fast, since the APIs can be generated directly from the base UML (with the mandatory prestidigitation). Randy -----Original Message----- From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro Sent: Friday, September 22, 2006 4:37 PM To: psi...@li... Subject: Re: [Psidev-ms-dev] Why base64? |
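To make Randy's description a little more concrete, here is a hypothetical Python sketch of a protocol run whose input is referenced by URL rather than copied, and whose output is a typed text peak list with the retained precision recorded as a parameter. The class names, the `significantFigures` parameter, and the example.org URL are invented for illustration; they are not the dataXML schema that was to be reviewed in DC.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    name: str
    value: object


@dataclass
class Output:
    xsd_type: str   # the output's 'type', e.g. an XSD type name
    value: object


@dataclass
class ProtocolRun:
    protocol: str
    input_ref: str  # URL of the source document: referenced, not duplicated
    parameters: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def text_peaklist_run(source_url, mz, intensity, sig_figs):
    """Hypothetical 'binary to text peak list' protocol run.

    The significantFigures parameter records how much precision the text keeps.
    """
    text = "\n".join(f"{m:.{sig_figs}g} {i:.{sig_figs}g}" for m, i in zip(mz, intensity))
    return ProtocolRun(
        protocol="binary-to-text peak list",
        input_ref=source_url,
        parameters=[Parameter("significantFigures", sig_figs)],
        outputs=[Output("xs:string", text)],
    )


run = text_peaklist_run("http://example.org/raw.xml", [445.12003, 446.1], [1200.0, 35.5], 6)
print(run.outputs[0].value)  # "445.12 1200" and "446.1 35.5" on two lines
```

The design point mirrored here is that the lossy step is explicit: a reader of the peak list can see both where the full-precision data lives and how many figures survived.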
From: Angel P. <an...@ma...> - 2006-09-22 20:37:18
|
Jimmy Eng wrote: > I believe base64 encoding makes more sense for some large class of > applications that will hopefully be digesting these files but I'm sure > everyone can see the obvious benefits of plain text encoding of peak > lists. > > The question I have is regarding the representation of space delimited > lists as Lewis and Randy have drawn up. Does this address the needs > of Michael, Steve, and Akhilesh and others? Hopefully they'll all > chime in. My concern would be that having a horizontal, space > separated list of numbers, where m/z and intensity will possibly be > written in separate lists of floats and ints, doesn't really serve the > notion of readability. Lots of folks are used to looking at lists of > peaks as ordered in .mgf or .dta files and I'm not sure if a > horizontal list of numbers (especially if it's 2 lists, one for m/z > and one for intensity) gives you that same sense of readability. I > don't really see any regular use case scenarios where people would be > scrolling over to the 68th m/z in the list and then somehow counting > over to the location of the 68th intensity to get its value. > > So _if_ this really doesn't address the needs of the folks who have > concerns about the base64 encoding and would like to see plain > text, speak up. The last thing the format needs is more complexity > in the form of another optional way of representing the data that only > a handful of people will ever end up using. > > - Jimmy > > All excellent points. Let me see if I can recap the set of arguments: 1) for high-throughput and computational tasks, base64 encoding is fast, robust and reasonable with respect to size; 2) text formats are not useful unless they are formatted in an easily digestible fashion; 3) point #2 often conflicts with point #1; 4) ambiguity in a format is universally seen as a "bad thing". The best suggestion I could think of would be to just go ahead and officially endorse our current standard operating procedures.
By this I mean, first and foremost, that the official format be restricted to binary-encoded data arrays. This is the format officially supported by hardware and software vendors. Second, that we endorse one of the /de facto/ plain-text formats (MS2 or MGF) as the best way to encode plain-text data, *and* (this is the important bit) that the official PSI APIs provide export to the endorsed plain-text format. Notice that I didn't say import, since this operation is a lossy one, as covered in other posts. Or, if we do provide import routines, they come with the large caveat that the transformation may have been lossy. The problem I see with this is that I do not know whether MS2 or MGF can handle data other than MS2, or data from multiple analyzers and detectors. They also generally have a much more restricted set of annotations. Good idea? Bad idea? Something to discuss in DC at least... -angel |
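As an illustration of the export-only direction Angel proposes, here is a Python sketch that renders one spectrum as a minimal MGF block (only BEGIN IONS, TITLE, PEPMASS, CHARGE and peak lines; real MGF files can carry more). It writes one m/z and intensity pair per line, the layout Jimmy describes from .mgf and .dta files. Values are printed with `repr`, which in Python 3 yields the shortest string that reads back as the same double, so the numbers themselves survive; the annotation loss Angel mentions is a separate matter. The function name `to_mgf` is hypothetical.

```python
def to_mgf(title, precursor_mz, charge, mz, intensity):
    """Render one MS/MS spectrum as a minimal MGF block (a subset of the format)."""
    if len(mz) != len(intensity):
        raise ValueError("m/z and intensity arrays differ in length")
    lines = ["BEGIN IONS", f"TITLE={title}", f"PEPMASS={precursor_mz!r}"]
    if charge is not None:
        lines.append(f"CHARGE={charge}+")
    # One "m/z intensity" pair per line instead of two parallel horizontal lists.
    lines += [f"{m!r} {i!r}" for m, i in zip(mz, intensity)]
    lines.append("END IONS")
    return "\n".join(lines)


print(to_mgf("spectrum 2", 449.728607, 2, [110.5, 199.25], [3.5, 120.0]))
```

The peak values in the usage line are made up; in practice they would come from the decoded `<data>` arrays.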
From: Brian P. <bri...@in...> - 2006-09-20 22:00:24
|
Hi Michael, Not sure I follow you... isn't "0.1" decimal? BTW, that http://mail.python.org/pipermail/python-dev/2004-March/043742.html discussion ends on this note: "> Remember that every binary floating-point number has an exact decimal > representation (though the reverse, of course, is not true). Yup." So no, you can't always make the round trip without introducing error. More importantly, you can't always read an ASCII decimal value and compute with it without introducing error. And, as I mentioned before, that decimal->binary conversion isn't cheap. - Brian > -----Original Message----- > From: psi...@li... > [mailto:psi...@li...] On > Behalf Of Coleman, Michael > Sent: Wednesday, September 20, 2006 2:44 PM > To: bri...@in...; psi...@li... > Subject: Re: [Psidev-ms-dev] Why base64? |
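Both facts in this exchange can be checked directly with Python's standard `decimal` and `fractions` modules: the double nearest 0.1 is not 1/10, yet it has a finite, exact decimal expansion that reads back as the same double. The helper name `exact_decimal` is illustrative.

```python
from decimal import Decimal
from fractions import Fraction


def exact_decimal(x):
    """Exact decimal expansion of a binary float (always finite)."""
    return str(Decimal(x))


print(Fraction(0.1))                      # 3602879701896397/36028797018963968, not 1/10
print(exact_decimal(0.1))                 # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.1) == Decimal("0.1"))     # False: the stored value is not 0.1
print(float(exact_decimal(0.1)) == 0.1)   # True: the exact expansion maps back
```

The denominator is 2**55, which is why the expansion terminates; a decimal such as 0.1 has a factor of 5 in its denominator and so has no finite binary expansion.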
From: Coleman, M. <MK...@St...> - 2006-09-20 21:44:53
|
> Brian Pratt: > > ...with properly implemented numeric I/O > > routines (in libc), you can have a 1-1 mapping between the > > internal and > > ASCII representation, so that it is possible to round trip without > > introducing error. > > Well, no, but this is something a lot of folks don't realize. > For example (one Randy cited previously), consider "0.1" - see > http://www.yoda.arachsys.com/csharp/floatingpoint.html for an > explanation. I think we're talking about two different things. As you say, 0.1 does not have an exact IEEE 754 representation. I'm talking about conversion between decimal and IEEE 754. Intuitively, for each IEEE 754 double, there is a set of decimal numbers closer to it than to any other double. Of that set, one will have the shortest decimal representation, after all trailing zeros have been truncated. (There may be two, which is handled by round-to-even.) This representation can in turn be uniquely mapped back to the double. I think that something like this is specified by IEEE 754, but I can't find an exact reference on the web. Java specifies this: http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Double.html#toString(double) and here's a discussion that seems to reference it: http://mail.python.org/pipermail/python-dev/2004-March/043742.html I probably don't have the details exactly right, but I believe the basic idea is correct. The effect of this is that it is possible to use a decimal representation without introducing any error. My preference, though, would still be to round away the noise digits. Mike |
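A brute-force Python sketch of the idea Michael describes: try 1, 2, ... significant digits and return the first correctly rounded string that reads back as the same double. This is not the algorithm behind Java's Double.toString or Python's repr, and because the rounding interval is asymmetric at powers of two it may, in rare cases, return one digit more than the true shortest string; it always returns a string that round-trips, and 17 significant digits always suffice for a finite double. The function name is hypothetical.

```python
import math


def shortest_roundtrip(x):
    """Shortest-length %g string (by this search) that parses back to x."""
    if math.isnan(x):
        raise ValueError("NaN has no decimal value to round-trip")
    for digits in range(1, 18):
        text = "%.*g" % (digits, x)  # correctly rounded to `digits` significant digits
        if float(text) == x:
            return text
    raise AssertionError("unreachable: 17 significant digits always round-trip a double")


print(shortest_roundtrip(0.1))    # 0.1
print(shortest_roundtrip(1 / 3))  # 0.3333333333333333
```

The first call shows the point of the message: "0.1" is not the double's exact value, but it is a lossless name for it.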
From: Brian P. <bri...@in...> - 2006-09-20 19:21:52
|
> If I understand correctly, with properly implemented numeric I/O
> routines (in libc), you can have a 1-1 mapping between the internal and
> ASCII representation, so that it is possible to round trip without
> introducing error.

Well, no, but this is something a lot of folks don't realize. For (a previously cited by Randy) example consider "0.1" - see http://www.yoda.arachsys.com/csharp/floatingpoint.html for an explanation.

> One additional note: We seem to be assuming that mass specs all already
> do IEEE FP. Is this actually true?

AFAIK, yes. That's a wheel that nobody has cared to reinvent for some time now.

- Brian

> -----Original Message-----
> From: Coleman, Michael [mailto:MK...@St...]
> Sent: Wednesday, September 20, 2006 11:17 AM
> To: bri...@in...; psi...@li...
> Subject: RE: [Psidev-ms-dev] Why base64?
>
> > Brian Pratt:
> > Accuracy: Mass spec data in its raw form is generally stored
> > in binary formats, since mass specs are front ended by binary
> > computers. Conversion to and from base 10 human readable
> > representations introduces error. It's best to hold the data at its
> > original precision and translate out to human readable format
> > at whatever precision is deemed useful for eyeballing.
>
> This is a complicated topic and I don't claim to be an expert by any
> means. Here's my understanding.
>
> Error is present, and we want to avoid amplifying it. If, for example,
> the instrument has an internal IEEE FP value 1234.56789012345 and we
> know that its precision is only +/- 0.1, then there's no particular
> benefit (nor harm) to reporting this as anything beyond 1234.6 or
> 1234.57. The 0.00089012345 is more or less noise.
>
> As a practical matter, it might be more efficient to move the IEEE bits
> directly from the instrument to the mzData file. A cost of doing this,
> though, is that this format is not human-readable.
>
> An alternative would be to fully represent the IEEE bits as a number.
> If I understand correctly, with properly implemented numeric I/O
> routines (in libc), you can have a 1-1 mapping between the internal and
> ASCII representation, so that it is possible to round trip without
> introducing error. This *would* make the textual representation larger,
> and it's not clear that it really makes sense to do this, because of the
> noise issue (above).
>
> One additional note: We seem to be assuming that mass specs all already
> do IEEE FP. Is this actually true?
>
> > File size: Sure, you can make files smaller by throwing away
> > precision, but as you begin to desire higher precision base64 quickly
> > becomes much more efficient.
>
> Just to confirm, I agree that discarding *real* precision is
> unacceptable. (By "real", I mean what's being physically measured, not
> bits that are an artifact of the IEEE representation.)
>
> Mike
|
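Brian's point and Mike's are compatible, which a few lines of Python make visible. This is a minimal sketch using only the standard `decimal` and `fractions` modules, both of which convert a float to its exact stored binary value: the double nearest 0.1 is not one tenth, yet its shortest round-trip text is still `0.1`.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1  # parsed to the nearest IEEE 754 double, not to one tenth

exact_decimal = Decimal(x)   # exact expansion of the stored binary value
exact_ratio = Fraction(x)    # the same value as numerator / power of two

print(exact_decimal)
print(exact_ratio, exact_ratio.denominator == 2**55)
print(exact_ratio == Fraction(1, 10))   # False: one tenth is not representable
print(repr(x), float(repr(x)) == x)     # '0.1' still maps back to the same double
```

Both statements in the thread hold at once: the stored value carries error relative to one tenth, but the decimal text `0.1` identifies that stored value uniquely.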
From: Geer, L. (NIH/NLM/NCBI) [E] <le...@nc...> - 2006-09-20 18:25:57
|
Hi, Jimmy,

Sorry, should have said "whitespace delimited" instead of "space delimited", where XML considers whitespace to be a carriage return, a linefeed, a tab, and/or a space. As Michael implies, this means the numbers can sit on different lines and that there is no reason the numbers couldn't be grouped so the first number is m/z, the second is intensity, etc.

Lewis

> -----Original Message-----
> From: Jimmy Eng [mailto:jk...@gm...]
> Sent: Wednesday, September 20, 2006 1:12 PM
> To: psi...@li...
> Subject: Re: [Psidev-ms-dev] Why base64?
>
> I believe base64 encoding makes more sense for some large class of
> applications that will hopefully be digesting these files but I'm sure
> everyone can see the obvious benefits of plain text encoding of peak
> lists.
>
> The question I have is regarding the representation of space delimited
> lists as Lewis and Randy have drawn up. Does this address the needs
> of Michael, Steve, and Akhilesh and others? Hopefully they'll all
> chime in. My concern would be that having a horizontal, space
> separated list of numbers, where m/z and intensity will possibly be
> written in separate lists of floats and ints, doesn't really serve the
> notion of readability. Lots of folks are used to looking at lists of
> peaks as ordered in .mgf or .dta files and I'm not sure if a
> horizontal list of numbers (especially if it's 2 lists, one for m/z
> and one for intensity) gives you that same sense of readability. I
> don't really see any regular use case scenarios where people would be
> scrolling over to the 68th m/z in the list and then somehow counting
> over to the location of the 68th intensity to get its value.
>
> So _if_ this really doesn't address the needs of the folks who have
> concerns about the base64 encoding and would like to see plain
> text, speak up.
> The last thing the format needs is more complexity
> in the form of another optional way of representing the data that only
> a handful of people will ever end up using.
>
> - Jimmy
>
>
> On 9/20/06, Randy Julian <rkj...@in...> wrote:
> > Hi,
> >
> > This works quite nicely!
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >     elementFormDefault="qualified" attributeFormDefault="unqualified">
> >   <xs:element name="root">
> >     <xs:complexType>
> >       <xs:sequence>
> >         <xs:element name="MyList">
> >           <xs:simpleType>
> >             <xs:list itemType="xs:float"/>
> >           </xs:simpleType>
> >         </xs:element>
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> > </xs:schema>
> >
> > Validates:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >     xsi:noNamespaceSchemaLocation="list.xsd">
> >   <MyList>1.1 1.2 1.3</MyList>
> > </root>
> >
> > Any thoughts about the use of this in the schema?
> >
> > Randy
> >
> > -----Original Message-----
> > From: Geer, Lewis (NIH/NLM/NCBI) [E] [mailto:le...@nc...]
> > Sent: Wednesday, September 20, 2006 10:27 AM
> > To: Randy Julian; psi...@li...
> > Subject: RE: [Psidev-ms-dev] Why base64?
> >
> > Hi,
> >
> > XML-schema does allow space delimited lists:
> >
> > <xsd:simpleType name="listOfMyIntType">
> >   <xsd:list itemType="xsd:integer"/>
> > </xsd:simpleType>
> >
> > <listOfMyInt>20003 15037 95977 95945</listOfMyInt>
> >
> > Lewis
> >
> >
> > > -----Original Message-----
> > > From: Randy Julian [mailto:rkj...@in...]
> > > Sent: Wednesday, September 20, 2006 10:12 AM
> > > To: psi...@li...
> > > Subject: Re: [Psidev-ms-dev] Why base64?
> > >
> > > This is a very interesting question which has come up several
> > > times before.
> > > As we work to develop dataXML (mzData 2.0) we should take all of these
> > > concerns into consideration.
> > >
> > > Originally, mzData had both a binary and regular XML notation
> > > for both data vectors. The XML-schema data types were tested by
> > > most of the vendors, who did not see the file size compression
> > > benefits you mention because they did not feel they had the ability
> > > to round either of the vectors in the way you suggest. Since the use
> > > case: 'user opens mzData file with notepad and sees peaks' was not
> > > viewed as a major request, the vendors unanimously voted the
> > > non-binary arrays out for size and performance reasons (see the
> > > meeting notes from the PSI meeting in Nice).
> > >
> > > The loss of readability may now have larger consequences than we
> > > considered back then. Steve Stein's comments are good ones. If we now
> > > have broad enough adoption that we want to be able to open the file
> > > and see the numbers written out in XML, then we should reconsider the
> > > validity of the use case. To do this with mzData 1.05 you would have
> > > to use the supplemental data vector (the alternative Angel suggested).
> > >
> > > The supplemental data vectors hold any type of XSD data type including
> > > normal XML. However in mzData 1.05, the binary vectors are not
> > > optional, so you have to populate them to comply with the spec - even
> > > if you repeat the information in the supplemental vector.
> > >
> > > The suggested 'white space separated list' is not a valid XML data
> > > type, so if we want to keep with the XSD standard for validation, the
> > > peak lists have to be in markup like:
> > >
> > > <peak>
> > >   <mz>
> > >     <float>0.1</float>
> > >   </mz>
> > >   <inten>
> > >     <float>100.1</float>
> > >   </inten>
> > > </peak>
> > >
> > > or something similar. Other semantics could reduce the verbosity, but
> > > the basic idea is that we can only use valid XSD data types.
> > >
> > > As we move to dataXML, we will need to store other data objects
> > > besides mass spectra (MRM chromatograms for example), so we will have
> > > to come up with a more general data section regardless of the data
> > > types allowed. During this design phase we should decide what data
> > > types we want.
> > >
> > > As a historical note, the previous (current) LC-MS standard format
> > > uses netCDF as the data representation, which is fully binary and
> > > utterly unreadable in any respect without an API. Thus this situation
> > > has existed in mass spectrometry for quite some time. The readability
> > > of these files has never been viewed as a serious weakness, although
> > > the 1.5-2x increase in file size over the original vendor file was the
> > > source of constant complaint.
> > >
> > > Just as a note for your comment #3, this is not so straightforward. If
> > > the instrument collects data using an Intel chip, floating-point raw
> > > data will most likely have an IEEE-754 representation. So any time you
> > > have a number in a file like 0.1, the internal representation was
> > > originally different (0.1 cannot be exactly represented in IEEE-754).
> > > When you read from the file into an IEEE standard format, it will not
> > > be 0.1 in any of the math you do.
> > >
> > > Let the PSI-MS team know what requirements you would like to see the
> > > HUPO standards meet. If there is strong user support for missing
> > > features, the team will include them in the development roadmap.
> > >
> > > Let's keep the discussion of improvements going!
> > >
> > > Randy
> > >
> > >
> > > -----Original Message-----
> > > From: psi...@li... [mailto:psi...@li...] On Behalf Of Coleman, Michael
> > > Sent: Tuesday, September 19, 2006 4:39 PM
> > > To: psi...@li...
> > > Subject: [Psidev-ms-dev] Why base64?
> > >
> > > Hi,
> > >
> > > Does anyone know why base64 encoding is being used for peak mz and
> > > intensity values in the mzData format? It appears to me that there
> > > are three significant disadvantages to doing so:
> > >
> > > 1. Loss of readability. One of the primary reasons to use XML in the
> > > first place is that it is human-readable--one can in principle inspect
> > > and understand its contents with any text editor. Base64-encoding peak
> > > data destroys this transparency. (It also makes it more difficult to
> > > write scripts to process the data.)
> > >
> > > 2. Increased file size. At least for our spectra, it appears that a
> > > compressed (gzip/etc) ms2 file is about 15% smaller than the
> > > equivalent mzData file with the single-precision (32-bit) encoding,
> > > and 22% smaller than the double-precision version. The *uncompressed*
> > > single-precision mzData file is about 15% smaller than the
> > > uncompressed ms2 file; the double-precision version is almost twice
> > > as large. (These figures are for 'gzip' default compression.)
> > >
> > > (Currently our ms2 files have mz values rounded to one decimal place
> > > and intensity values with about 4-5 significant places.)
> > >
> > > 3. Potential loss of precision information. For example, with
> > > single-precision encoding, a value originally given as 12345.1 might
> > > be encoded as 12345.0996. It's not easy to see from that encoding that
> > > the original value was given with one decimal place. Worse still, if
> > > the original value is significant to more than 7-or-so digits and it
> > > gets 32-bit encoded, precision will be lost, probably in a way not
> > > immediately apparent to the user. (32-bit encoding will probably be a
> > > temptation, given the size of the 64-bit encoding.)
> > >
> > > Even if base64-encoding cannot be dropped at this point, it seems like
> > > it would be useful to add a "no encode" option, which would present
> > > peak data as the obvious whitespace-separated list of numeric values.
> > >
> > > Am I missing something here? I could not find any discussion of this
> > > issue on the list.
> > >
> > > --Mike
> > >
> > >
> > > Mike Coleman, Scientific Programmer, +1 816 926 4419
> > > Stowers Institute for Biomedical Research
> > > 1000 E. 50th St., Kansas City, MO 64110, USA
|
From: Coleman, M. <MK...@St...> - 2006-09-20 18:18:16
|
> Randy Julian:
> The XML-schema data types were tested by most of the vendors, who
> did not see the file size compression benefits you mention because they did
> not feel they had the ability to round either of the vectors in the way you
> suggest.

I'm not unsympathetic to this practical concern. The most important thing would be to allow the textual representation as an equal variant (i.e., not buried in the supplemental data section).

I'm not sure I see why generating the textual representation would be difficult, though. My guess is that the vendors will continue to use their own proprietary formats to do the initial recording of data, only translating to mzData as a final step. If the final step is carried out on a platform with a real libc, this looks like it would be straightforward.

> Just as a note for your comment #3, this is not so straightforward.
> If the instrument collects data using an Intel chip, floating-point
> raw data will most likely have an IEEE-754 representation. So any time you
> have a number in a file like 0.1, the internal representation was
> originally different (0.1 cannot be exactly represented in IEEE-754). When you
> read from the file into an IEEE standard format, it will not be 0.1 in any of
> the math you do.

I agree that this is complicated. As far as the mzData standard goes, probably the biggest thing that would help here would be a way for the data producer to indicate, in the mzData file, their idea of the accuracy of the measurements. If I understand correctly, currently this is implied, or communicated outside of the mzData file.

Please let me add that I think the mzData format is a great improvement over the array of formats that it's meant to replace. I'd like this representation issue to be resolved in the best way possible, but it's certainly minor in the overall scheme of things.

Mike
|
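To make comment #3 concrete, here is a sketch that packs values as IEEE 754 floats, base64-encodes them, and decodes them again. The little-endian byte order and the helper names `encode_peaks`/`decode_peaks` are assumptions for illustration, not the mzData reference code. At 32 bits, 12345.1 comes back as 12345.099609375 (the 12345.0996 in the original post); at 64 bits the doubles survive exactly.

```python
import base64
import struct

# Hypothetical helpers: little-endian IEEE 754, 32- or 64-bit, then base64.
_CODES = {32: "f", 64: "d"}


def encode_peaks(values, bits=32):
    """Pack floats as little-endian IEEE 754 and return base64 text."""
    raw = struct.pack("<%d%s" % (len(values), _CODES[bits]), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, bits=32):
    """Inverse of encode_peaks: base64-decode, then unpack the floats."""
    raw = base64.b64decode(text)
    count = len(raw) // (bits // 8)
    return list(struct.unpack("<%d%s" % (count, _CODES[bits]), raw))


mz = [12345.1, 445.12, 1234.56789012345]
for bits in (32, 64):
    text = encode_peaks(mz, bits)
    back = decode_peaks(text, bits)
    # 32-bit keeps only ~7 significant digits; 64-bit round-trips exactly.
    print(bits, len(text), back, back == mz)
```

Base64 spends 4 characters per 3 bytes, so the three 32-bit values (12 bytes) become 16 characters and the 64-bit version 32 characters.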
From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:56
|
Jimmy makes an excellent point: some textual representations would be more useful than others. As far as I know, whitespace is whitespace in XML, so I would hope that producers of mzData files would choose whitespace to enhance the readability of the file.

I wasn't really thinking it through, but the "ideal" representation I've been assuming in my head would look something like this

<peaklist>
123.4 123
125.3 123343
127.4 23423
</peaklist>

Obviously, as Brian points out, the form that splits the mz and intensity lists isn't as friendly

<mzArray> 123.4 125.3 127.4 </mzArray>
<intenArray> 123 123343 23423 </intenArray>

I think someone mentioned something like this

<peaklist>
<peak><mz>123.4</mz><inten>123</inten></peak>
<peak><mz>125.3</mz><inten>123343</inten></peak>
<peak><mz>127.4</mz><inten>23423</inten></peak>
</peaklist>

which is better than the second but not as good as the first above. This seems more XML-ish than the above two formats (or the current binary arrays), at the expense of being very verbose. With judicious addition of whitespace, this could be made more readable (at a further cost in size)

<peaklist>
<peak><mz> 123.4 </mz><inten> 123 </inten></peak>
<peak><mz> 125.3 </mz><inten> 123343 </inten></peak>
<peak><mz> 127.4 </mz><inten> 23423 </inten></peak>
</peaklist>

Personally I'd be quite happy with the first form (at the top of this post). It may be slightly lacking in XML purity, but it's very readable, and it's clear what its semantics should be. The only real disadvantage I see is that it's a little different than the current mzData scheme, which breaks mz and intensity into separate lists.

Mike
|
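Mike's first (interleaved) form is straightforward to read with standard tools. A minimal Python sketch, assuming the hypothetical `<peaklist>` element from his post (it is not part of mzData 1.05): split the element text on whitespace, which covers XML's space, tab, carriage return, and linefeed, then pair the values. The split `<mzArray>`/`<intenArray>` form is just the transpose of the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical interleaved layout from the post; not a released mzData element.
DOC = """<peaklist>
123.4 123
125.3 123343
127.4 23423
</peaklist>"""


def parse_interleaved(xml_text):
    """Return [(mz, intensity), ...] from a whitespace-separated body."""
    tokens = ET.fromstring(xml_text).text.split()  # splits on any whitespace
    if len(tokens) % 2:
        raise ValueError("odd number of values: an m/z is missing its intensity")
    values = [float(t) for t in tokens]
    return list(zip(values[0::2], values[1::2]))


peaks = parse_interleaved(DOC)
mz_array, inten_array = zip(*peaks)  # the <mzArray>/<intenArray> split form
for mz, inten in peaks:
    print(f"{mz:10.1f} {inten:10.0f}")
```

Because the parser only cares about whitespace-separated tokens, the same code accepts the values on one line or spread over many, which is the flexibility Lewis notes.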
From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:46
|
> Brian Pratt:
> Accuracy: Mass spec data in its raw form is generally stored
> in binary formats, since mass specs are front ended by binary
> computers. Conversion to and from base 10 human readable
> representations introduces error. It's best to hold the data at its
> original precision and translate out to human readable format
> at whatever precision is deemed useful for eyeballing.

This is a complicated topic and I don't claim to be an expert by any means. Here's my understanding.

Error is present, and we want to avoid amplifying it. If, for example, the instrument has an internal IEEE FP value 1234.56789012345 and we know that its precision is only +/- 0.1, then there's no particular benefit (nor harm) to reporting this as anything beyond 1234.6 or 1234.57. The 0.00089012345 is more or less noise.

As a practical matter, it might be more efficient to move the IEEE bits directly from the instrument to the mzData file. A cost of doing this, though, is that this format is not human-readable.

An alternative would be to fully represent the IEEE bits as a number. If I understand correctly, with properly implemented numeric I/O routines (in libc), you can have a 1-1 mapping between the internal and ASCII representation, so that it is possible to round trip without introducing error. This *would* make the textual representation larger, and it's not clear that it really makes sense to do this, because of the noise issue (above).

One additional note: We seem to be assuming that mass specs all already do IEEE FP. Is this actually true?

> File size: Sure, you can make files smaller by throwing away
> precision, but as you begin to desire higher precision base64 quickly
> becomes much more efficient.

Just to confirm, I agree that discarding *real* precision is unacceptable. (By "real", I mean what's being physically measured, not bits that are an artifact of the IEEE representation.)

Mike
|
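Mike's option of rounding away noise digits can be sketched as deriving the number of decimals from a stated absolute precision. The function name and the `guard_digits` parameter are hypothetical; the precision itself is an input the data producer would have to supply.

```python
import math


def format_to_precision(value, precision, guard_digits=1):
    """Format `value`, dropping digits finer than the absolute `precision`.

    `guard_digits` keeps extra decimals past the precision (an assumption,
    not a rule); round() absorbs binary noise in log10(0.1) and friends.
    """
    decimals = math.ceil(round(-math.log10(precision), 9)) + guard_digits
    return f"{value:.{max(0, decimals)}f}"


x = 1234.56789012345          # internal double from the example above
print(format_to_precision(x, 0.1, guard_digits=0))  # 1234.6
print(format_to_precision(x, 0.1))                  # 1234.57
print(repr(x))                # full round-trip text keeps every bit
```

Keeping one guard digit (1234.57 rather than 1234.6) is a judgment call; the point is that the cut-off comes from the measurement, while `repr()` gives the lossless alternative Mike describes.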