From: Mike C. <tu...@gm...> - 2006-09-24 02:25:52
[My two previous attempts to send this appear to have failed. My apologies if anyone is seeing multiple copies. I also omitted the C program mentioned below, in case that might be tripping a spam filter. Drop me an email if you want a copy. --Mike]

This is dry stuff, but I think it's important to see that IEEE 754 values can be transmitted in decimal form (via mzData) without any loss of precision whatsoever.

Let me give a more concrete scenario. In our case, assume we have an IEEE 754 single-precision value in our instrument computer. We want to use mzData to transmit that value to another computer, so that ultimately the latter computer will contain the identical single-precision value.

One way to do this is the current method of capturing the 32-bit representation and sending it across using the base64 encoding.

Another way to send the value is to send an ASCII representation of a decimal number that will, upon being converted using strtof(3), result in the identical single-precision value. (That decimal number is *not* typically mathematically equal to the single-precision value; it's just closer to it than to any other single-precision value.) This really *is* a completely lossless representation.

There are different ways to generate these decimal numbers. It is sufficient (if not necessarily optimal) to simply use printf(3) with sufficient precision (e.g., "%.8e", which prints nine significant digits). This will work with implementations that do correct rounding. Linux (meaning GNU libc) has done this correctly for at least nine years. I would assume the vendors are doing it right, though this should be confirmed.

I'm including a small C program that demonstrates what I'm talking about. It does an exhaustive check for the single-precision case. It takes a couple of hours to complete, but if you're going to see an error, it will probably occur pretty quickly. (If you see any errors, I'd like to know.)

This doesn't change the fact that 0.1 doesn't have an exact IEEE 754 representation.
That is a separate issue (and one that a base64 encoding does not address either).

As for the cost of conversion, I agree that it is likely larger than the cost of the base64 encoding. I don't have the libraries at hand to try it out, but I'm sure it would be detectable for large sets of spectra. That notwithstanding, we and everyone else who uses a format like ms2 or dta are already paying this cost, and it doesn't seem particularly onerous. CPU cycles are pretty cheap--human cycles (that transparency might save) are very dear.

Mike