From: Brian P. <bri...@in...> - 2006-10-05 15:59:23
I'm strongly opposed to the change. In addition to the previously discussed concerns about accuracy and the fundamental pointlessness due to the unsuitability of XML for eyeballing what is essentially columnar data, there's an additional and perhaps deeper practical concern: A data exchange standard that provides many ways to express the same idea is headed for the rocks. Vendors will tend to implement only the parts of the standard that interest them and the ecosystem quickly breaks down (I speak from experience with interchange standards in the internet security and circuit board manufacturing software industries; it's a phenomenon not peculiar to any one field of endeavor).

A standard that provides n>1 ways to state the same thing is n times as difficult to implement and maintain, which reduces vendor enthusiasm by a factor of n (squared?), which hinders widespread adoption.

As we sometimes say in the States, "If it ain't broke, don't fix it."

Brian Pratt

_____

From: psi...@li... [mailto:psi...@li...] On Behalf Of Pierre-Alain Binz
Sent: Thursday, October 05, 2006 5:10 AM
To: Randy Julian
Cc: psi...@li...
Subject: Re: [Psidev-ms-dev] FW: Why base64?

I am for the possibility to represent a spectrum/peaklist/even chromatogram in more than one manner ONLY if these representations are easy and straightforward to generate and to parse AND if there is a good (or better, blocking) reason to do so. We need to avoid optional things that make any implementation subject to interpretation and misunderstanding.

So yes, only if the two formats are strictly and clearly described and discriminated (specification issue).

Pierre-Alain

Randy Julian wrote:

There was concern in the NBT review of the mzData manuscript that the format was not specifically designed for either quantitation or 'raw' data. Quite the opposite is true - it handles these better than it handles a 'peak list'.
Given the broad scope we are going for, I think mzData 2.0 needs to cover both of Mike's suggestions. The representation should allow an ASCII list representation, _and_ a base64 list option. Within each of these, the _desired_ precision should be used.

If you want to make some kind of 21CFR11 claim regarding GLP or GCP for clinical data (metabolites, proteins or biomarker analyses) then the ability to represent 'raw' data is critical and part of the current design. It is the simple case of 'represent a single tandem MS spectrum of a single peptide at only the precision of the m/z calibration' that is harder than it needs to be with the current representation.

During the Washington PSI meeting a proposal was made to re-introduce the ASCII data representation that was dropped at the PSI meeting in Nice. What does everyone think of this idea?

Randy

-----Original Message-----
From: psi...@li... [mailto:psi...@li...] On Behalf Of Mike Coleman
Sent: Wednesday, October 04, 2006 3:13 PM
To: Angel Pizarro
Cc: Psi...@li...
Subject: Re: [Psidev-ms-dev] Why base64?

[This message seems to have been bounced by Sourceforge, so I'm resending it. I'm sorry to see that apparently they are having serious email problems these days. See today's Slashdot article at http://it.slashdot.org/article.pl?sid=06/10/04/1324214. (Apparently the problem isn't limited to email coming from gmail accounts.)]

On 9/28/06, Mike Coleman <tu...@gm...> wrote:

Makes sense.

To put it in other words, there are two questions here:

1. Are the values represented as base64-encoded bitstrings or as ASCII text?

2. Should the values be rounded to the precision of the instrument (probably plus a digit, etc.), or should an arbitrary number of figures be used? Again, this isn't about losing information, as we're only discussing rounding away noise.
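To make the two questions concrete, here is a minimal sketch in C that writes a single 32-bit value both ways. The helper names (base64_encode, format_base64, format_ascii) are made up for illustration and are not part of mzData or any library. The base64 form simply encodes the float's bytes in host byte order, which is an assumption; a real writer would have to fix the byte order explicitly.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Encode n bytes as base64 (RFC 4648 alphabet, '=' padding).
   out must have room for 4 * ((n + 2) / 3) + 1 bytes.
   Returns the number of characters written, excluding the NUL. */
static size_t base64_encode(const unsigned char *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    /* full 3-byte groups become 4 characters each */
    for (i = 0; i + 2 < n; i += 3) {
        out[o++] = tbl[in[i] >> 2];
        out[o++] = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
        out[o++] = tbl[((in[i + 1] & 0x0f) << 2) | (in[i + 2] >> 6)];
        out[o++] = tbl[in[i + 2] & 0x3f];
    }
    /* one or two leftover bytes are padded with '=' */
    if (i < n) {
        out[o++] = tbl[in[i] >> 2];
        if (i + 1 < n) {
            out[o++] = tbl[((in[i] & 0x03) << 4) | (in[i + 1] >> 4)];
            out[o++] = tbl[(in[i + 1] & 0x0f) << 2];
        } else {
            out[o++] = tbl[(in[i] & 0x03) << 4];
            out[o++] = '=';
        }
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}

/* Question 1, base64 option: the raw bit pattern of v (host byte order). */
static size_t format_base64(float v, char *out)
{
    unsigned char raw[sizeof v];

    memcpy(raw, &v, sizeof v);
    return base64_encode(raw, sizeof raw, out);
}

/* Question 1, ASCII option, with question 2 as a parameter:
   sig_digits is the number of significant digits to keep. */
static int format_ascii(float v, int sig_digits, char *out, size_t cap)
{
    return snprintf(out, cap, "%.*g", sig_digits, v);
}
```

With these helpers, format_ascii(445.12f, 6, buf, sizeof buf) produces "445.12" (6 characters), while format_base64 always produces 8 characters for a single 32-bit float, whatever rounding was applied beforehand. That is the one interaction between the two questions: rounding only shrinks the ASCII form.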
These two questions are entirely orthogonal, as far as I can see, and it would be possible to allow both options for both questions, if this were seen as being worthwhile. The one interaction is that if you use the ASCII text encoding, rounding the figures will make the mzData file smaller.

Regarding ambiguity, the ASCII text representation would allow differing whitespace (which produces no semantic difference). I guess the base64 encoding also allows differing surrounding whitespace.

With respect to the base64 encoding, one corner case comes to mind. Are special IEEE values like NaN, the infinities, negative zero, etc., allowed? If so, what should the interpretation be?

Mike

The example code I mentioned:

/* gcc -g -O2 -ffloat-store -o ieee-test ieee-test.c */

/* strtof is GNU/C99 */
#define _GNU_SOURCE

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

union bits {
  unsigned int u;
  float f;
};

int
main(void)
{
  unsigned int i;
  union bits x, x2;
  int zeros_seen = 0;

  assert(sizeof x.u == sizeof x.f);
  /* both members must start at the same address */
  assert((void *) &x.u == (void *) &x.f);

  /* walk every 32-bit pattern; stop when i wraps around to 0 again */
  for (i=0; ; i++) {
    char buf[128];

    if (i == 0)
      if (++zeros_seen > 1)
        break;

#if 0
    if (!(i % 100000))
      putc('.', stderr);
#endif

    x.u = i;
    if (x.f != x.f)
      continue;                 /* skip error values (NaNs) */

    /* print with 9 significant digits, then parse the text back */
    sprintf(buf, "%.8e", x.f);
    errno = 0;
    x2.f = strtof(buf, 0);
    if (errno == ERANGE) {
      printf("strtof error for %s\n", buf);
      continue;
    }
    if (x2.u != x.u)
      printf("bit difference for %s (%u != %u)\n", buf, x2.u, x.u);
  }
  return 0;
}

_______________________________________________
Psidev-ms-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev

--
Dr. Pierre-Alain Binz
Swiss Institute of Bioinformatics
Proteome Informatics Group
1, Rue Michel Servet
CH-1211 Geneve 4
Switzerland
- - - - - - - - - - - - - - - - -
Tel: +41-22-379 50 50
Fax: +41-22-379 58 58
Pie...@is...
http://www.expasy.org/people/Pierre-Alain.Binz.html
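On the corner case Mike raises above (NaN, the infinities and negative zero turning up in a base64-encoded array), a reader would at least need to detect those values before deciding how to interpret them. The following is a minimal sketch in C using the standard <math.h> classification macros. The enum and the function name classify_value are hypothetical, and the thread does not settle what a reader should do with each kind, so no policy is implied here.

```c
#include <math.h>

/* Hypothetical categories for a decoded 32-bit value. */
enum value_kind {
    VALUE_ORDINARY,       /* any finite value except -0.0, subnormals included */
    VALUE_NAN,
    VALUE_INFINITY,       /* either sign */
    VALUE_NEGATIVE_ZERO
};

/* Classify v so a caller can decide whether to accept, reject, or remap it. */
static enum value_kind classify_value(float v)
{
    if (isnan(v))
        return VALUE_NAN;
    if (isinf(v))
        return VALUE_INFINITY;
    /* -0.0f compares equal to 0.0f, so the sign bit must be checked separately */
    if (v == 0.0f && signbit(v))
        return VALUE_NEGATIVE_ZERO;
    return VALUE_ORDINARY;
}
```

Note that an ASCII representation raises the same question in a different form, since a specification would have to say whether strings such as "nan" or "inf" are legal values.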