From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:46
> Brian Pratt:
> Accuracy: Mass spec data in its raw form is generally stored
> in binary formats, since mass specs are front ended by binary
> computers. Conversion to and from base 10 human readable
> representations introduces error. It's best to hold the data at its
> original precision and translate out to human readable format
> at whatever precision is deemed useful for eyeballing.

This is a complicated topic and I don't claim to be an expert by any means. Here's my understanding.

Error is present, and we want to avoid amplifying it. If, for example, the instrument has an internal IEEE FP value 1234.56789012345 and we know that its precision is only +/- 0.1, then there's no particular benefit (nor harm) to reporting this as anything beyond 1234.6 or 1234.57. The 0.00089012345 is more or less noise.

As a practical matter, it might be more efficient to move the IEEE bits directly from the instrument to the mzData file. A cost of doing this, though, is that this format is not human-readable.

An alternative would be to fully represent the IEEE bits as a number. If I understand correctly, with properly implemented numeric I/O routines (in libc), you can have a 1-1 mapping between the internal and ASCII representation, so that it is possible to round trip without introducing error. This *would* make the textual representation larger, and it's not clear that it really makes sense to do this, because of the noise issue (above).

One additional note: We seem to be assuming that mass specs all already do IEEE FP. Is this actually true?

> File size: Sure, you can make files smaller by throwing away
> precision, but as you begin to desire higher precision base64 quickly
> becomes much more efficient.

Just to confirm, I agree that discarding *real* precision is unacceptable. (By "real", I mean what's being physically measured, not bits that are an artifact of the IEEE representation.)

Mike
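
For anyone who wants to poke at the trade-off discussed above, here is a minimal sketch in Python (not libc, and not taken from any mzData tool). It assumes the values are IEEE 754 doubles and, purely for illustration, that the binary form is little-endian; the helper names `pack_base64` and `unpack_base64` are hypothetical. It shows three options for one value: rounding to the instrument's real precision, printing 17 significant digits so the text round-trips exactly, and base64-encoding the raw bits.

```python
import base64
import struct

def pack_base64(values):
    """Encode doubles as base64 text (little-endian is an assumption here)."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")

def unpack_base64(text):
    """Decode base64 text produced by pack_base64 back into doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

mz = 1234.56789012345

# Option 1: round to the precision the instrument really has.
# Small and readable, but the original bits are not recoverable.
rounded = "%.1f" % mz

# Option 2: 17 significant digits are enough for any IEEE double to
# survive a trip through decimal text, given correctly rounded I/O.
exact_text = "%.17g" % mz
assert float(exact_text) == mz

# Option 3: carry the IEEE bits directly; exact, but not human-readable.
encoded = pack_base64([mz])
assert unpack_base64(encoded) == [mz]

# Compare the character counts of the three representations.
print(len(rounded), len(exact_text), len(encoded))
```

The base64 form costs a fixed 4 characters per 3 bytes regardless of the value, while the exact decimal form varies with the digits needed, which is roughly the size argument being made in the quoted text.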