From: Talapady N B. <bh...@ni...> - 2006-10-06 14:35:43
Hi,

I fully agree. Rigid standards usually stay only on 'paper' and they
foster chaos. 'Import/export' codes are the breeding grounds for
multiple standards.

Best regards,
T N Bhat

----- Original Message -----
From: "Geer, Lewis (NIH/NLM/NCBI) [E]" <le...@nc...>
To: <psi...@li...>
Sent: Friday, October 06, 2006 10:27 AM
Subject: Re: [Psidev-ms-dev] FW: Why base64?

> Hi,
>
> I guess the general experience at NCBI is to make standards as flexible
> as possible while making them as explicit, easy to read, and validatable
> as possible. The pain of having multiple representations within the
> same standard is much less than the pain of having multiple standards,
> which can happen if a particular standard is too rigid.
>
> The "easy to read" requirement means by both machine and human -- human
> readable probably being the most important because of all of the endless
> debugging required when reading and writing files. It seems much more
> fun writing new applications than dealing with import/export code!
>
> Lewis
>
> -----Original Message-----
> From: Angel Pizarro [mailto:an...@ma...]
> Sent: Thursday, October 05, 2006 3:17 PM
> To: psi...@li...
> Subject: Re: [Psidev-ms-dev] FW: Why base64?
>
> I have to second Brian on this one. From the operational and reporting
> requirements, having both ASCII and binary representations just adds
> confusion. Better to address the problem of perceived complexity and
> general usage through tool development efforts.
>
> Also, this case:
> > It is the simple case of 'represent a single tandem MS spectrum of a
> > single peptide at only the precision of the m/z calibration' that is
> > harder than it needs to be with the current representation.
>
> is not used outside of post-analysis verification of the spectra (e.g.
> was the assignment of spectra valid, were the right peaks used for
> quant, etc.) Very low-throughput and NOT viewed outside of the analysis
> context.
>
> This is just my perception though, so if someone has an example please
> speak up.
>
> angel
>
>
> Brian Pratt wrote:
> > I'm strongly opposed to the change. In addition to the previously
> > discussed concerns about accuracy and the fundamental pointlessness
> > due to the unsuitability of XML for eyeballing what is essentially
> > columnar data, there's an additional and perhaps deeper practical
> > concern:
> >
> > A data exchange standard that provides many ways to express the same
> > idea is headed for the rocks. Vendors will tend to implement only the
> > parts of the standard that interest them and the ecosystem quickly
> > breaks down (I speak from experience with interchange standards in the
> > internet security and circuit board manufacturing software industries,
> > it's a phenomenon not peculiar to any one field of endeavor). A
> > standard that provides n>1 ways to state the same thing is n times as
> > difficult to implement and maintain, which reduces vendor enthusiasm
> > by a factor of n (squared?), which hinders widespread adoption.
> >
> > As we sometimes say in the States, "If it ain't broke, don't fix it."
> >
> > Brian Pratt
> >
> > ------------------------------------------------------------------------
> > *From:* psi...@li...
> > [mailto:psi...@li...] *On Behalf Of
> > *Pierre-Alain Binz
> > *Sent:* Thursday, October 05, 2006 5:10 AM
> > *To:* Randy Julian
> > *Cc:* psi...@li...
> > *Subject:* Re: [Psidev-ms-dev] FW: Why base64?
> >
> > I am for the possibility to represent a spectrum/peaklist/even
> > chromatogram in more than one manner ONLY if these representations
> > are easy and straightforward to generate and to parse AND if there
> > is a good (or better blocking) reason to do so. We need to avoid
> > optional things that make any implementation subject to
> > interpretation and misunderstanding.
> > So yes only if the two formats are strictly and clearly described
> > and discriminated (specification issue)
> >
> > Pierre-Alain
> >
> > Randy Julian wrote:
> >> There was concern in the NBT review of the mzData manuscript that the
> >> format was not specifically designed for either quantitation or 'raw'
> >> data. Quite the opposite is true - it handles these better than it
> >> handles a 'peak list'.
> >>
> >> Given the broad scope we are going for, I think mzData 2.0 needs to
> >> cover both of Mike's suggestions.
> >>
> >> The representation should allow an ASCII list representation, _and_ a
> >> base64 list option. Within each of these, the _desired_ precision
> >> should be used. If you want to make some kind of 21CFR11 claim
> >> regarding GLP or GCP for clinical data (metabolites, proteins or
> >> biomarker analyses) then the ability to represent 'raw' data is
> >> critical and part of the current design.
> >>
> >> It is the simple case of 'represent a single tandem MS spectrum of a
> >> single peptide at only the precision of the m/z calibration' that is
> >> harder than it needs to be with the current representation.
> >>
> >> During the Washington PSI meeting a proposal was made to re-introduce
> >> the ASCII data representation that was dropped at the PSI meeting in
> >> Nice. What does everyone think of this idea?
> >>
> >> Randy
> >>
> >> -----Original Message-----
> >> From: psi...@li...
> >> [mailto:psi...@li...] On Behalf Of Mike Coleman
> >> Sent: Wednesday, October 04, 2006 3:13 PM
> >> To: Angel Pizarro
> >> Cc: Psi...@li...
> >> Subject: Re: [Psidev-ms-dev] Why base64?
> >>
> >> [This message seems to have been bounced by Sourceforge, so I'm
> >> resending it. I'm sorry to see that apparently they are having
> >> serious email problems these days. See today's Slashdot article at
> >> http://it.slashdot.org/article.pl?sid=06/10/04/1324214.
> >> (Apparently the problem isn't limited to email coming from gmail
> >> accounts.) ]
> >>
> >> On 9/28/06, Mike Coleman <tu...@gm...> wrote:
> >>
> >>> Makes sense. To put it in other words, there are two questions
> >>> here:
> >>>
> >>> 1. Are the values represented as base64-encoded bitstrings or
> >>> as ASCII text?
> >>>
> >>> 2. Should the values be rounded to the precision of the instrument
> >>> (probably plus a digit, etc.), or should an arbitrary number of
> >>> figures be used? Again, this isn't about losing information, as
> >>> we're only discussing rounding away noise.
> >>>
> >>> These two questions are entirely orthogonal, as far as I can see,
> >>> and it would be possible to allow both options for both questions,
> >>> if this were seen as being worthwhile. The one interaction is that
> >>> if you use the ASCII text encoding, rounding the figures will make
> >>> the mzData file smaller.
> >>>
> >>> Regarding ambiguity, the ASCII text representation would allow
> >>> differing whitespace (which produces no semantic difference). I
> >>> guess the base64 encoding also allows differing surrounding
> >>> whitespace.
> >>>
> >>> With respect to the base64 encoding, one corner case comes to mind.
> >>> Are special IEEE values like NaN, the infinities, negative zero,
> >>> etc., allowed? If so, what should the interpretation be?
> >>>
> >>> Mike
> >>>
> >>>
> >>> The example code I mentioned:
> >>>
> >>> /* gcc -g -O2 -ffloat-store -o ieee-test ieee-test.c */
> >>>
> >>> /* strtof is GNU/C99 */
> >>> #define _GNU_SOURCE
> >>>
> >>> #include <assert.h>
> >>> #include <errno.h>
> >>> #include <limits.h>
> >>> #include <stdio.h>
> >>> #include <stdlib.h>
> >>>
> >>>
> >>> /* view the same 32 bits as an integer or as a float */
> >>> union bits {
> >>>   unsigned int u;
> >>>   float f;
> >>> };
> >>>
> >>>
> >>> int
> >>> main(void) {
> >>>   unsigned int i;
> >>>   union bits x, x2;
> >>>   int zeros_seen = 0;
> >>>
> >>>   assert(sizeof x.u == sizeof x.f);
> >>>   assert((void *) &x.u == (void *) &x.f);
> >>>
> >>>   /* walk every 32-bit pattern; stop when i wraps back to 0 */
> >>>   for (i=0; ; i++) {
> >>>     char buf[128];
> >>>
> >>>     if (i == 0)
> >>>       if (++zeros_seen > 1)
> >>>         break;
> >>>
> >>> #if 0
> >>>     if (!(i % 100000))
> >>>       putc('.', stderr);
> >>> #endif
> >>>
> >>>     x.u = i;
> >>>     if (x.f != x.f)
> >>>       continue; /* skip error values (NaNs) */
> >>>
> >>>     /* 9 significant digits, then parse the text back */
> >>>     sprintf(buf, "%.8e", x.f);
> >>>
> >>>     errno = 0;
> >>>     x2.f = strtof(buf, 0);
> >>>     if (errno == ERANGE) {
> >>>       printf("strtof error for %s\n", buf);
> >>>       continue;
> >>>     }
> >>>
> >>>     if (x2.u != x.u)
> >>>       printf("bit difference for %s (%u != %u)\n", buf, x2.u, x.u);
> >>>   }
> >>>   return 0;
> >>> }
> >>>
> >>
> >> -------------------------------------------------------------------------
> >> Take Surveys. Earn Cash. Influence the Future of IT
> >> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> >> opinions on IT & business topics through brief surveys -- and earn cash
> >> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> >> _______________________________________________
> >> Psidev-ms-dev mailing list
> >> Psi...@li...
> >> https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
> >>
> >
> > --
> > Dr. Pierre-Alain Binz
> > Swiss Institute of Bioinformatics
> > Proteome Informatics Group
> > 1, Rue Michel Servet
> > CH-1211 Geneve 4
> > Switzerland
> > - - - - - - - - - - - - - - - - -
> > Tel: +41-22-379 50 50
> > Fax: +41-22-379 58 58
> > Pie...@is...
> > http://www.expasy.org/people/Pierre-Alain.Binz.html
> >
>
> --
> Angel Pizarro
> Director, Bioinformatics Facility
> Institute for Translational Medicine and Therapeutics
> University of Pennsylvania
> 806 BRB II/III
> 421 Curie Blvd.
> Philadelphia, PA 19104-6160
>
> P: 215-573-3736
> F: 215-573-9004
> E: an...@ma...
>
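
The two questions raised in Mike's message (ASCII text versus raw bits, and how many figures to keep) can be spot-checked on single values rather than over all 2^32 bit patterns. The following is only a sketch and not code from the thread: `float_bits` and `roundtrips_bitwise` are hypothetical helper names, and it assumes a C99 library and a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of a float.
   Assumes float is a 32-bit IEEE 754 value. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copy the bytes instead of type-punning */
    return u;
}

/* Hypothetical helper: write f as ASCII text with sig_digits significant
   digits, parse the text back with strtof, and return 1 if the parsed
   value has exactly the same bit pattern as f, 0 otherwise. */
static int roundtrips_bitwise(float f, int sig_digits) {
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", sig_digits - 1, f);
    return float_bits(strtof(buf, NULL)) == float_bits(f);
}
```

Under these assumptions, `roundtrips_bitwise(16777215.0f, 7)` returns 0, because seven significant digits turn the value into 16777220, while `roundtrips_bitwise(16777215.0f, 9)` returns 1. Nine digits is what the `%.8e` format in the test program above writes. Negative zero is expected to survive the text form on a C99 library; NaN payloads are not guaranteed to, so they are left out of this check.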