Thread: [Psidev-ms-dev] FW: Why base64?


There was concern in the NBT review of the mzData manuscript that the format
was not specifically designed for either quantitation or 'raw' data.
Quite the opposite is true - it handles these better than it handles a 'peak
list'.

Given the broad scope we are going for, I think mzData 2.0 needs to cover
both of Mike's suggestions.

The representation should allow an ASCII list representation, _and_ a base64
list option.  Within each of these, the _desired_ precision should be used.
If you want to make some kind of 21CFR11 claim regarding GLP or GCP for
clinical data (metabolites, proteins or biomarker analyses) then the ability
to represent 'raw' data is critical and part of the current design.

It is the simple case of 'represent a single tandem MS spectrum of a single
peptide at only the precision of the m/z calibration' that is harder than it
needs to be with the current representation.

During the Washington PSI meeting a proposal was made to re-introduce the
ASCII data representation that was dropped at the PSI meeting in Nice.  What
does everyone think of this idea?

Randy

-----Original Message-----
From: psi...@li...
[mailto:psi...@li...] On Behalf Of Mike
Coleman
Sent: Wednesday, October 04, 2006 3:13 PM
To: Angel Pizarro
Cc: Psi...@li...
Subject: Re: [Psidev-ms-dev] Why base64?

[This message seems to have been bounced by Sourceforge, so I'm
resending it.  I'm sorry to see that apparently they are having
serious email problems these days.  See today's Slashdot article at
http://it.slashdot.org/article.pl?sid=06/10/04/1324214.  (Apparently
the problem isn't limited to email coming from gmail accounts.) ]

On 9/28/06, Mike Coleman <tu...@gm...> wrote:
> Makes sense.  To put it in other words, there are two questions here:
>
> 1.  Are the values represented as base64-encoded bitstrings or as ASCII
> text?
>
> 2.  Should the values be rounded to the precision of the instrument
> (probably plus a digit, etc.), or should an arbitrary number of
> figures be used?  Again, this isn't about losing information, as we're
> only discussing rounding away noise.
>
> These two questions are entirely orthogonal, as far as I can see, and
> it would be possible to allow both options for both questions, if this
> were seen as being worthwhile.  The one interaction is that if you use
> the ASCII text encoding, rounding the figures will make the mzData
> file smaller.
>
> Regarding ambiguity, the ASCII text representation would allow
> differing whitespace (which produces no semantic difference).  I guess
> the base64 encoding also allows differing surrounding whitespace.
>
> With respect to the base64 encoding, one corner case comes to mind.
> Are special IEEE values like NaN, the infinities, negative zero, etc.,
> allowed?  If so, what should the interpretation be?
>
> Mike
>
>
> The example code I mentioned:
>
> /* gcc -g -O2 -ffloat-store -o ieee-test ieee-test.c */
>
> /* strtof is GNU/C99 */
> #define _GNU_SOURCE
>
> #include <assert.h>
> #include <errno.h>
> #include <limits.h>
> #include <stdio.h>
> #include <stdlib.h>
>
>
> union bits {
>   unsigned int u;
>   float f;
> };
>
>
> int
> main() {
>   unsigned int i;
>   union bits x, x2;
>   int zeros_seen = 0;
>
>   assert(sizeof x.u == sizeof x.f);
>   /* the two pointer types differ, so compare them as void pointers */
>   assert((void *)&x.u == (void *)&x.f);
>
>
>
>   for (i=0; ; i++) {
>     char buf[128];
>
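>     /* i is 0 on the first pass and again after it wraps past UINT_MAX;
>        stop on the second zero, once every bit pattern has been tried */
>     if (i == 0)
>       if (++zeros_seen > 1)
>         break;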
>
> #if 0
>     if (!(i % 100000))
>       putc('.', stderr);
> #endif
>
>     x.u = i;
>     if (x.f != x.f)
>       continue;                 /* skip NaNs (only a NaN is unequal to itself) */
>
>     /* %.8e prints 9 significant digits, enough to round-trip any float */
>     sprintf(buf, "%.8e", x.f);
>
>     errno = 0;
>     x2.f = strtof(buf, 0);
>     if (errno == ERANGE) {
>       printf("strtof error for %s\n", buf);
>       continue;
>     }
>
>     if (x2.u != x.u)
>       printf("bit difference for %s (%u != %u)\n", buf, x2.u, x.u);
>   }
> }
>
