From: Coleman, M. <MK...@St...> - 2006-09-19 22:58:23
> From: Angel Pizarro
> > 1. Loss of readability.
> ...
> There actually is a space for "human readable spectra" in the
> mzData format,

I'm glad to hear that.  I looked for this, but I did not see it in the
spec here:

	http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData

I was looking for something like 'mzArray' and 'intenArray' tags, which
would be the textual alternatives to 'mzArrayBinary' and
'intenArrayBinary'.  Can you point me to an example?

> but really who reads individual mz and intensity values?

Well--I do.  As a programmer, I don't think it's an exaggeration to say
that I look at the peak lists in our ms2 files every day.  Being able to
see at a glance that the peaks are basically sane, and to check their
gross attributes (precision, count, etc.), is very useful to me.

Of course, as a programmer I can easily whip up a script to decode this
file format.  I suspect most users would be stymied, though, and I think
that would be unfortunate.  Since these files are part of a chain of
scientific argument, they ought to be as transparent as possible, open
to verification by eyeball (mine and our scientists') and by alternative
pieces of software.

I'm not saying that this transparency is an absolute good.  Perhaps it
is worth impairing so that we can have X, Y, and Z, which are considered
more valuable.  I'm not seeing what X, Y, and Z are, though.

> > 2. Increased file size.
> ...
> Not a fair comparison. Most of the space in an mzData file is
> actually taken up by the human-readable parameters and parameter
> values of the spectra.

Sorry, I should have been clearer.  The numbers I gave were just for the
peak lists (base64 vs text) and nothing else--no tags, no other
metadata.  The rest of the mzData fields would add more overhead, but I
have no objection to that part.
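For what it's worth, the comparison is easy to reproduce.  Here's a
minimal Python sketch (the peak values are invented for illustration,
and I'm assuming big-endian IEEE doubles for the packed form):

```python
import base64
import struct

# Hypothetical peak list (m/z values) -- invented for illustration.
peaks = [345.1, 678.92, 1024.5, 1532.08, 1998.7]

# Textual encoding, roughly as in an ms2 file: one number per value.
text = " ".join(str(p) for p in peaks)

# mzData-style binary encoding: pack as big-endian IEEE doubles,
# then base64-encode the resulting byte string.
packed = struct.pack(">%dd" % len(peaks), *peaks)
b64 = base64.b64encode(packed).decode("ascii")

print(len(text), len(b64))  # -> 34 56
```

Five doubles pack to 40 bytes, which base64 inflates to 56 characters,
while the plain text is 34 characters--the binary form only wins when
the textual numbers are long.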
If we implemented mzData here today, our files would be bigger if we
used the base64 encoding than if we used the textual numbers (as they
are in our ms2 files).

> > 3. Potential loss of precision information.
> ...
> Actually the situation may be reversed. Thermofinnigan, for
> example, stores measured values coming off of the instrument
> as double precision floats, later formatting the numbers as
> needed with respect to the specific instrument's limit of
> detection.  12345.1 may have originally been 12345.099923123 in
> the vendor's proprietary format.

Okay, but isn't '12345.1' what I really want to see in this case
(assuming that the vendor is correct about the instrument's accuracy)?
For this particular instance, the string '12345.1' tells me what I need
to know, and a double-precision floating point value (e.g.,
12345.10000000000036379) would at least let me guess it, since double
precision carries significantly more significant figures.  But a
single-precision value would leave me in a gray area: does
'12345.099923123' mean '12345.1', '12345.10', or '12345.100', for
example?

> I wrote an email a few days ago showing how to translate in ruby
> the base64 arrays

I saw it, and it was quite useful to me.  Part of the reason I'm asking
these questions is that I noticed in your examples that the
base64-encoded values actually took more space than the original data.

Just to reiterate my main question: it looks like using base64 will make
mzData less usable and more complex than straight text.  What benefits
come with it that offset these drawbacks?

Mike
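P.S.  To make the precision point concrete, here's a small Python sketch
(standard library only) of what 12345.1 looks like after a round trip
through single and double precision:

```python
import struct

# Round-trip 12345.1 through IEEE single and double precision.
single = struct.unpack(">f", struct.pack(">f", 12345.1))[0]
double = struct.unpack(">d", struct.pack(">d", 12345.1))[0]

print("%.9f" % single)   # -> 12345.099609375
print("%.17f" % double)  # close to 12345.1, with garbage trailing digits
```

The single-precision value is the kind of thing I'd be left squinting
at, while the double at least makes the intended '12345.1' guessable.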