From: Coleman, M. <MK...@St...> - 2006-09-19 20:39:31
Hi,

Does anyone know why base64 encoding is being used for peak mz and intensity values in the mzData format? It appears to me that there are three significant disadvantages to doing so:

1. Loss of readability. One of the primary reasons to use XML in the first place is that it is human-readable: one can in principle inspect and understand its contents with any text editor. Base64-encoding peak data destroys this transparency. (It also makes it more difficult to write scripts to process the data.)

2. Increased file size. At least for our spectra, it appears that a compressed (gzip/etc) ms2 file is about 15% smaller than the equivalent mzData file with the single-precision (32-bit) encoding, and 22% smaller than the double-precision version. The *uncompressed* single-precision mzData file is about 15% smaller than the uncompressed ms2 file; the double-precision version is almost twice as large. (These figures are for 'gzip' default compression. Currently our ms2 files have mz values rounded to one decimal place and intensity values with about 4-5 significant digits.)

3. Potential loss of precision information. For example, with single-precision encoding, a value originally given as 12345.1 might be encoded as 12345.0996. It's not easy to see from that encoding that the original value was given with one decimal place. Worse still, if the original value is significant to more than 7 or so digits and it gets 32-bit encoded, precision will be lost, probably in a way not immediately apparent to the user. (32-bit encoding will probably be a temptation, given the size of the 64-bit encoding.)

Even if base64 encoding cannot be dropped at this point, it seems like it would be useful to add a "no encode" option, which would present peak data as the obvious whitespace-separated list of numeric values.

Am I missing something here? I could not find any discussion of this issue on the list.
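To make point 3 concrete, here is a minimal Python sketch of the round trip. It is only an illustration: the helper names encode_peaks and decode_peaks are made up for this example, and little-endian byte order is an assumption (a real reader would take the byte order and precision from the file itself).

```python
import base64
import struct


def encode_peaks(values, precision=32):
    """Pack floats as little-endian IEEE 754 values, then base64-encode.

    Hypothetical helper; byte order is assumed, not taken from any file.
    """
    code = "f" if precision == 32 else "d"
    raw = struct.pack("<%d%s" % (len(values), code), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, precision=32):
    """Reverse encode_peaks: base64-decode, then unpack the floats."""
    raw = base64.b64decode(text)
    size = 4 if precision == 32 else 8
    code = "f" if precision == 32 else "d"
    return list(struct.unpack("<%d%s" % (len(raw) // size, code), raw))


original = 12345.1
single = decode_peaks(encode_peaks([original], 32), 32)[0]
double = decode_peaks(encode_peaks([original], 64), 64)[0]

print("%.4f" % single)     # 12345.0996
print(single == original)  # False: the 32-bit round trip changed the value
print(double == original)  # True: the 64-bit round trip is exact
```

The 64-bit round trip returns the same double that went in, but neither encoding records that the value was originally written with one decimal place; a whitespace-separated text list would keep that information.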
--Mike

Mike Coleman, Scientific Programmer, +1 816 926 4419
Stowers Institute for Biomedical Research
1000 E. 50th St., Kansas City, MO 64110, USA