Re: [Psidev-ms-dev] Why base64?

[My two previous attempts to send this appear to have failed.  My
apologies if anyone is seeing multiple copies.  I also omitted the C
program mentioned below, in case that might be tripping a spam filter.
 Drop me an email if you want a copy.  --Mike]

This is dry stuff, but I think it's important to see that IEEE 754
values can be transmitted in decimal form (via mzData) without any
loss of precision whatsoever.  Let me give a more concrete scenario.

In our case, assume we have an IEEE 754 single-precision value in our
instrument computer.  We want to use mzData to transmit that value to
another computer, so that ultimately the latter computer will contain
the identical single-precision value.  One way to do this is the
current method of capturing the 32-bit representation and sending it
across using the base64 encoding.
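
To make the base64 route concrete, here is a minimal sketch.  It is
not taken from any mzData library: the function names are invented for
this example, and the little-endian byte order is an assumption (the
sender and receiver have to agree on it somehow).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Base64-encode n bytes.  out must hold 4 * ((n + 2) / 3) + 1 chars. */
void base64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;

    for (i = 0; i < n; i += 3) {
        /* Pack up to three bytes into a 24-bit group. */
        uint32_t v = (uint32_t)in[i] << 16;
        if (i + 1 < n) v |= (uint32_t)in[i + 1] << 8;
        if (i + 2 < n) v |= in[i + 2];

        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? B64[(v >> 6) & 63] : '=';
        out[o++] = (i + 2 < n) ? B64[v & 63] : '=';
    }
    out[o] = '\0';
}

/* Encode the 32-bit pattern of f (assumed IEEE 754 single precision)
   as 8 base64 characters, least significant byte first (assumption). */
void float_to_base64(float f, char out[9])
{
    uint32_t bits;
    unsigned char bytes[4];

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    bytes[0] = (unsigned char)(bits & 0xFF);
    bytes[1] = (unsigned char)((bits >> 8) & 0xFF);
    bytes[2] = (unsigned char)((bits >> 16) & 0xFF);
    bytes[3] = (unsigned char)((bits >> 24) & 0xFF);
    base64_encode(bytes, 4, out);
}
```

With these assumptions, 1.0f (bit pattern 0x3F800000) comes out as
"AACAPw==", which is exact but not something a person can read.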

Another way to send the value is to send an ASCII representation of a
decimal number that will, upon being converted using strtof(3), result
in the identical single-precision value.  (That decimal number is
typically *not* mathematically equal to the single-precision value;
it is just closer to it than to any other single-precision value.)

This really *is* a completely lossless representation.
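
A minimal sketch of that round trip for one value follows.  The
function names are invented for this example, and it assumes that
float is IEEE 754 single precision and that the C library rounds
correctly in both directions.  Nine significant digits ("%.9g") are
used, which is the usual safe count for single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the raw 32-bit pattern of f (assumes float is 32 bits wide). */
uint32_t float_bits(float f)
{
    uint32_t u;

    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Write f into text as a decimal with 9 significant digits, parse the
   text back with strtof, and return 1 if the result is bit-for-bit
   identical to f, 0 otherwise. */
int survives_decimal(float f, char *text, size_t size)
{
    int n = snprintf(text, size, "%.9g", f);
    float back;

    assert(n > 0 && (size_t)n < size);
    back = strtof(text, NULL);
    return float_bits(back) == float_bits(f);
}
```

Comparing bit patterns rather than using == keeps the check strict: it
distinguishes -0.0 from 0.0, for instance.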

There are different ways to generate these decimal numbers.  It is
enough (if not necessarily optimal) to simply use printf(3) with
adequate precision (e.g., "%.8e", which prints nine significant
digits).  This works with implementations that do correct rounding.
Linux (meaning GNU libc) has done this correctly for at least nine
years--I would assume the vendors are doing it right, though this
should be confirmed.
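
To see why the precision matters, the sketch below counts round-trip
failures over a range of single-precision values for a given number of
digits after the point in "%e" format.  The function name and the
choice of range are mine, and it assumes IEEE 754 single precision and
a correctly rounding C library.  Between 1000 and 1024 there are more
distinct single-precision values than there are decimals with eight
significant digits, so "%.7e" has to fail for some of them, while
"%.8e" should not fail at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the floats in [lo, hi) that do not survive being printed with
   "%.<digits>e" and read back with strtof.  Assumes 0 < lo < hi, both
   finite, so that consecutive bit patterns are consecutive values. */
unsigned long count_roundtrip_failures(int digits, float lo, float hi)
{
    uint32_t bits, end;
    unsigned long failures = 0;
    char text[64];

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &lo, sizeof bits);
    memcpy(&end, &hi, sizeof end);
    assert(bits < end);

    for (; bits < end; bits++) {
        float f, back;
        uint32_t back_bits;

        memcpy(&f, &bits, sizeof f);
        snprintf(text, sizeof text, "%.*e", digits, f);
        back = strtof(text, NULL);
        memcpy(&back_bits, &back, sizeof back_bits);
        if (back_bits != bits)
            failures++;
    }
    return failures;
}
```

Calling count_roundtrip_failures(8, 1000.0f, 1024.0f) should return
zero; the same call with 7 should return a large nonzero count.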

I wrote a small C program that demonstrates what I'm talking about
(left out of this copy, as noted above).  It does an exhaustive check
for the single-precision case.  It takes a couple of hours to
complete, but if you're going to see an error, it will probably occur
pretty quickly.  (If you see any errors, I'd like to know.)
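
The following is not the program referred to above, only a sketch of
how such an exhaustive check might look.  The function name and the
step parameter are invented here; the step exists only so that a
quick sampled run is possible.  NaNs are skipped, since their payloads
are not expected to survive a trip through text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Visit the bit patterns first, first + step, ... up to and including
   last, and count the non-NaN floats that fail the "%.8e" -> strtof
   round trip.  Each failure is reported on stderr. */
unsigned long check_bit_patterns(uint32_t first, uint32_t last,
                                 uint32_t step)
{
    unsigned long failures = 0;
    uint64_t b;  /* 64-bit counter, so the loop can end at 0xFFFFFFFF */
    char text[32];

    assert(sizeof(float) == sizeof(uint32_t) && step > 0);

    for (b = first; b <= last; b += step) {
        uint32_t bits = (uint32_t)b, back_bits;
        float f, back;

        /* Exponent all ones and nonzero fraction means NaN: skip. */
        if ((bits & 0x7F800000u) == 0x7F800000u &&
            (bits & 0x007FFFFFu) != 0)
            continue;

        memcpy(&f, &bits, sizeof f);
        snprintf(text, sizeof text, "%.8e", f);
        back = strtof(text, NULL);
        memcpy(&back_bits, &back, sizeof back_bits);
        if (back_bits != bits) {
            failures++;
            fprintf(stderr, "0x%08lx -> \"%s\" -> 0x%08lx\n",
                    (unsigned long)bits, text,
                    (unsigned long)back_bits);
        }
    }
    return failures;
}
```

The full check would be check_bit_patterns(0, 0xFFFFFFFFu, 1); a
larger step gives a fast spot check across the whole range, including
subnormals and negative values.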

This doesn't change the fact that 0.1 doesn't have an exact IEEE 754
representation.  That is a separate issue (and one that a base64
encoding does not address either).
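
A small sketch of that point, with invented function names and
assuming IEEE 754 single precision and a C library that prints exact
digits (GNU libc does).  The single-precision value nearest to 0.1 is
exactly 0.100000001490116119384765625, and both the short text "0.1"
and that long expansion parse back to the same 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the value actually stored in f, with 27 digits after the
   decimal point, into text. */
void format_stored_value(float f, char *text, size_t size)
{
    int n = snprintf(text, size, "%.27f", f);

    assert(n > 0 && (size_t)n < size);
}

/* Parse decimal text with strtof and return the resulting 32-bit
   pattern (assumes float is 32 bits wide). */
uint32_t parse_to_bits(const char *text)
{
    float f = strtof(text, NULL);
    uint32_t bits;

    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Whichever encoding carries the value, what arrives is 0x3DCCCCCD, not
one tenth.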

As for the cost of conversion, I agree that it is likely larger
than the cost of the base64 encoding.  I don't have the libraries at
hand to try it out, but I'm sure it would be detectable for large sets
of spectra.  That notwithstanding, we and everyone else who uses a
format like ms2 or dta are already paying this cost, and it doesn't
seem particularly onerous.  CPU cycles are pretty cheap--human cycles
(that transparency might save) are very dear.

Mike