From: Matthew C. <mat...@va...> - 2009-06-11 21:09:29
With binary data, the same representation works for both centroided and profile data points. I really hope you're not suggesting ASCII storage of profile mode data, where 9-12 bytes per X sample (12345.678901) would not be unusual? That has all the overhead of double precision floats (a constant 8 bytes) without the vastly higher dynamic range, and it takes much longer to parse.

Using mzML to store the raw data for the libraries would be a great improvement over the status quo (assorted custom relational databases and ASCII archives?). There actually would be a standard representation. :) Coming up with a standard representation for application-specific annotations would be another challenge for a possibly separate format, but the raw data we can already handle. And with a reasonably optimized representation for profile mode, storing consensus profile spectra could become a reasonable approach for spectral libraries.

I do wish, though, that there were an XML-friendly 8-bit text encoding standard, like the yEnc encoding used on news://alt.bin.*, which we could choose instead of base64 to achieve practically no encoding bloat.

-Matt

Mike Coleman wrote:
> On Thu, Jun 11, 2009 at 2:41 PM, Matthew Chambers<mat...@va...> wrote:
>
>> However, NIST library folks have a quite straight-forward way to meet
>> the "human readability" requirement: XML comments. There's no reason you
>> can't put what looks like an MGF peak list in an XML comment with every
>> mzML spectrum (although presumably not profile-mode ones!).
>
> I think this would be worse than the status quo. If this change is to
> be made, though, may I suggest that the ASCII peaks be used in the
> "real" XML and that the binary peaks go in the comments? :-)
>
> Mike
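
To put rough numbers on the size comparison in the message above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `encoding_sizes`, the 6-decimal text format, and the synthetic 3000-sample m/z axis are all made up for this example, and no compression is applied to the binary array.

```python
import base64
import struct


def encoding_sizes(mz_values, decimals=6):
    """Return average bytes per sample for three encodings of one m/z list.

    Hypothetical helper for illustration only; not part of any mzML tool.
    """
    n = len(mz_values)
    # ASCII: fixed-point text, one value per line (newline is the delimiter).
    text = "\n".join(f"{v:.{decimals}f}" for v in mz_values)
    ascii_bytes = len(text.encode("ascii"))
    # Raw binary: little-endian IEEE 754 doubles, a constant 8 bytes each.
    raw = struct.pack(f"<{n}d", *mz_values)
    # base64 text encoding of the raw doubles (uncompressed).
    b64_bytes = len(base64.b64encode(raw))
    return {
        "ascii": ascii_bytes / n,
        "binary": len(raw) / n,
        "base64": b64_bytes / n,
    }


# Assumed synthetic profile-mode m/z axis: 3000 evenly spaced samples.
mz = [400.0 + 0.123457 * i for i in range(3000)]
sizes = encoding_sizes(mz)
for name, per_sample in sizes.items():
    print(f"{name:>6}: {per_sample:.2f} bytes per sample")
```

base64 emits 4 characters for every 3 input bytes, so each 8-byte double costs about 10.67 bytes once encoded. That 4/3 bloat is what an 8-bit-clean text encoding would mostly avoid.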