From: Coleman, M. <MK...@St...> - 2006-09-20 18:17:27
> Angel Pizarro:
> I am cringing as I write this, since I really think you should not go
> this route, but look at the supplementary data tags.

I am cringing with you.  :-)  Abusing the supplementary tags for this
purpose is definitely out--this is an even more unpleasant option than
going with a home-grown mzData extension.

> ah, yes, but most probably you have to either zcat the file or unzip
> it in order to read the floats, then zip the whole file back again
> once finished, a situation not unlike decoding byte arrays and base64
> strings....

Yes, having zipped files does imply having gzip/etc around, and you are
correct that this is in some ways similar.  A notable difference is that
zip tools are already ubiquitous, standard, reliable, and
well-understood by users.  The scripts I'll have to write to decode
mzData won't be.

(Note, too, that it is not necessary to unzip and rezip in order to just
read a compressed file.  The 'zcat' program and its variations (there's
surely a ruby module, for instance) can read the file without disturbing
it.)

> I'll add to those arguments that we should look at the computational
> costs of un/zipping whole files as opposed to stream en/decoding
> individual mzData spectra.

I agree that zip'ing will have a greater cost than generating base64.  I
don't think the cost is great, and in any case, zip'ing isn't necessary
unless you're hurting for disk space.

Disk is cheap.  If I zip'ed these files, it would be as much to get the
checksumming as to save the disk space.

> 1) it can handle encoding of integers, single and double precision
> float arrays without loss of information

As far as I know, a textual representation can also do this perfectly.

> 2) comparable compression with zipped plain text of the same precision

I agree that they're similar, within the bounds that I care about
(2-3x).

> 3) better performance with respect to accessing individual spectra vs.
> compressed plain text

If you mean that you can easily seek to a particular spectrum in a file
(presuming that some index is already present), I agree that this is
simpler and much faster.  As far as I know, seeking in a zip file isn't
really efficient.  If I thought I was going to need to do this, I'd want
to store the files uncompressed.  (As a practical matter, I can't think
of a reason we'd need to do this here.)

Mike
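
For illustration only, here is a minimal Python sketch of the two approaches compared above: a gzip-compressed text peak list read as a stream (the way zcat would, leaving the file on disk untouched), and a base64-encoded packed array of doubles. The file layout (one "m/z intensity" pair per line), the function names, and the choice of little-endian 64-bit floats are all assumptions made for the example and are not taken from the mzData schema or from anyone's actual scripts.

```python
import base64
import gzip
import os
import struct
import tempfile


def write_peaks_gz(path, peaks):
    """Write (m/z, intensity) pairs as gzip-compressed text, one pair per line."""
    with gzip.open(path, "wt", encoding="ascii") as f:
        for mz, intensity in peaks:
            # repr() of a Python float is the shortest string that
            # parses back to exactly the same double.
            f.write(f"{mz!r} {intensity!r}\n")


def read_peaks_gz(path):
    """Stream pairs back without unzipping or rewriting the file on disk."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        for line in f:
            mz, intensity = line.split()
            yield float(mz), float(intensity)


def encode_base64_doubles(values):
    """Pack values as little-endian doubles (an assumed layout), then base64."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_base64_doubles(text):
    """Inverse of encode_base64_doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical sample data, not real spectra.
peaks = [(445.120025, 1523.5), (446.1234567890123, 0.1), (1001.5, 3.0e-7)]

path = os.path.join(tempfile.mkdtemp(), "spectrum.txt.gz")
write_peaks_gz(path, peaks)
text_roundtrip = list(read_peaks_gz(path))

mz_values = [mz for mz, _ in peaks]
b64_roundtrip = decode_base64_doubles(encode_base64_doubles(mz_values))

# Both representations give back the original doubles bit for bit.
print(text_roundtrip == peaks, b64_roundtrip == mz_values)
```

Under these assumptions both round trips are lossless, so the sketch only shows that either encoding can carry double-precision values exactly; it says nothing about relative speed or compressed size.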