From: Akhilesh P. <pa...@jh...> - 2006-09-19 23:02:12
I agree with Mike about the human-readable part and the size issues. In our lab I insist that all files to be manipulated be 'scanned' before 'crunching.' If there are no compelling reasons, I do not see why this should not be reconsidered.

Akhilesh Pandey

At 05:58 PM 9/19/2006, Coleman, Michael wrote:
> > From: Angel Pizarro
> >
> > 1. Loss of readability. ...
>
> > There actually is a space for "human readable spectra" in the
> > mzData format,
>
>I'm glad to hear that. I looked for this, but I did not see it in the
>spec here:
>
>http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.html#element_mzData
>
>I was looking for something like 'mzArray' and 'intenArray' tags,
>which would be the textual alternatives to 'mzArrayBinary' and
>'intenArrayBinary'. Can you point me to an example?
>
> > but really who reads individual mz and intensity values?
>
>Well--I do. As a programmer, I don't think it's an exaggeration to say
>that I look at the peak lists in our ms2 files every day. Being able
>to see at a glance that the peaks are basically sane, and to check
>their gross attributes (precision, count, etc.), is very useful.
>
>Of course, as a programmer I can easily whip up a script to decode this
>file format. I suspect most users would be stymied, though, and I think
>that would be unfortunate. Since these files are part of a chain
>of scientific argument, I think they ought to be as transparent as
>possible and open to verification by eyeball (mine and those of our
>scientists) and by alternative pieces of software.
>
>I'm not saying that this transparency is an absolute good. Perhaps it
>is worth impairing so that we can have X, Y, and Z, which are considered
>more valuable. I'm not seeing what X, Y, and Z are, though.
>
> > 2. Increased file size. ...
>
> > Not a fair comparison. Most of the space in an mzData file is
> > actually taken up by the human-readable parameters and parameter
> > values of the spectra.
>
>Sorry, I should have been clearer. The numbers I gave were just for the
>peak lists (base64 vs. text) and nothing else--no tags, no other
>metadata. The rest of the mzData fields would add more overhead, but I
>have no objection to that part.
>
>If we implemented mzData here today, our files would be bigger with
>the base64 encoding than with the textual numbers (as they
>are in our ms2 files).
>
> > 3. Potential loss of precision information. ...
>
> > Actually the situation may be reversed. Thermofinnigan, for
> > example, stores measured values coming off of the instrument
> > as double-precision floats, later formatting the numbers as
> > needed with respect to the specific instrument's limit of detection.
> > 12345.1 may have originally been 12345.099923123 in the vendor's
> > proprietary format.
>
>Okay, but isn't '12345.1' what I really want to see in this case
>(assuming that the vendor is correct about the instrument's accuracy)?
>For this particular instance, the string '12345.1' tells me what I need
>to know, and a double-precision floating-point value (e.g.,
>12345.10000000000036379) would sort of let me guess it (since
>double precision carries many more significant figures). But a
>single-precision value would leave me in a gray area. That is,
>does '12345.099923123' mean '12345.1' or '12345.10' or '12345.100', for
>example?
>
> > I wrote an email a few days ago showing how to translate in Ruby
> > the base64 arrays
>
>I saw it and it was quite useful to me. Part of the reason I'm asking
>these questions is that I noticed in your examples that the
>base64-encoded values actually took more space than the original data.
>
>Just to reiterate my main question: it looks like using base64 will make
>mzData less usable and more complex than straight text. What
>benefits come with it that offset these drawbacks?
>
>Mike
>
>_______________________________________________
>Psidev-ms-dev mailing list
>Psi...@li...
>https://lists.sourceforge.net/lists/listinfo/psidev-ms-dev
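To make the size and precision points in the quoted exchange concrete, here is a minimal Python sketch (not from the thread) of how a binary peak array can be base64-encoded and decoded. It assumes the values are IEEE 754 floats packed little-endian, 32- or 64-bit, as an mzData-style `data` element's precision and endianness would indicate; `encode_peaks`, `decode_peaks`, and the four-peak list are hypothetical, made up only for illustration.

```python
import base64
import struct

# Assumption: binary peak arrays are IEEE 754 floats, little-endian,
# 32- or 64-bit depending on the declared precision.
def encode_peaks(values, precision=32):
    """Hypothetical helper: pack floats little-endian, then base64-encode."""
    code = "f" if precision == 32 else "d"
    raw = struct.pack(f"<{len(values)}{code}", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(encoded, precision=32):
    """Hypothetical helper: base64-decode, then unpack little-endian floats."""
    raw = base64.b64decode(encoded)
    code, width = ("f", 4) if precision == 32 else ("d", 8)
    return list(struct.unpack(f"<{len(raw) // width}{code}", raw))


# Made-up m/z values, written as they might appear in a text peak list.
mz_text = ["445.1", "512.3", "1023.7", "12345.1"]
mz_values = [float(s) for s in mz_text]

text_form = " ".join(mz_text)
b64_32 = encode_peaks(mz_values, precision=32)
b64_64 = encode_peaks(mz_values, precision=64)

print("text:", len(text_form), "chars")          # 26
print("base64, 32-bit:", len(b64_32), "chars")  # 24
print("base64, 64-bit:", len(b64_64), "chars")  # 44

# The 64-bit array round-trips exactly; the 32-bit one does not keep 12345.1.
assert decode_peaks(b64_64, precision=64) == mz_values
single = decode_peaks(b64_32, precision=32)[-1]
print(single)                 # 12345.099609375
print(format(single, ".1f"))  # 12345.1, only because we chose one decimal place
```

With these made-up values the text form is 26 characters, the 32-bit base64 form 24, and the 64-bit form 44, so whether base64 saves space depends on the precision chosen and on how many digits the text carries. The 32-bit round trip also returns 12345.099609375 rather than 12345.1, which illustrates the gray area Mike describes: the binary value alone does not record how many digits were significant.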