From: Angel P. <an...@ma...> - 2006-09-22 20:37:18
Jimmy Eng wrote:
> I believe base64 encoding makes more sense for some large class of
> applications that will hopefully be digesting these files but I'm sure
> everyone can see the obvious benefits of plain text encoding of peak
> lists.
>
> The question I have is regarding the representation of space delimited
> lists as Lewis and Randy have drawn up. Does this address the needs
> of Michael, Steve, and Akhilesh and others? Hopefully they'll all
> chime in. My concern would be that having a horizontal, space
> separate list of numbers, where m/z and intensity will possibly be
> written in separate lists of floats and ints, doesn't really serve the
> notion of readability. Lots of folks are used to looking at lists of
> peaks as ordered in .mgf or .dta files and I'm not sure if a
> horizontal list of numbers (especially if it's 2 lists, one for m/z
> and one for intensity) gives you that same sense of readability. I
> don't really see any regular use case scenarios where people would be
> scrolling over to the 68th m/z in the list and then somehow counting
> over to the location of the 68th intensity to get its value.
>
> So _if_ this really doesn't address the needs of the folks who have
> concerns about the base64 encoding and would like like to see plain
> text, speak up. The last thing the format needs is more complexity
> in the form of another optional way of representing the data that only
> a handful of people will ever end up using.
>
> - Jimmy
>

All excellent points. Let me see if I can recap the set of arguments:

1) For high-throughput and computational tasks, base64 encoding is fast, robust and reasonable with respect to size
2) Text formats are not useful unless they are formatted in an easily digestible fashion
3) Point #2 often conflicts with point #1
4) Ambiguity in a format is universally seen as a "bad thing"

The best suggestion I could think of would be to just go ahead and officially endorse our current standard operating procedures.
By this I mean, first and foremost, that the official format be restricted to binary encoded data arrays. This is the format officially supported by hardware and software vendors. Second, that we endorse one of the /de facto/ plain text formats (MS2 or MGF) as the best way to encode plain text data, *and* (this is the important bit) that the official PSI APIs provide export to the endorsed plain text format. Notice that I didn't say import, since this operation is a lossy one, as covered in other posts. Or, if we do provide import routines, they come with the large caveat that the transformation may have been lossy.

The problem I see with this is that I do not know whether MS2 or MGF handle data other than MS2, or data from multiple analyzers and detectors. They also generally have a much more restricted set of annotations.

Good idea? Bad idea? Something to discuss in DC at least...

-angel
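
To make the trade-off in points 1 and 2 concrete, here is a minimal Python sketch (not part of any PSI API). It packs a peak array as base64 and also exports the same peaks as vertical "m/z intensity" lines, one peak per line, in the style of .dta/.mgf peak lists. Everything in it is an assumption for illustration: the function names, the choice of little-endian 64-bit doubles, and the four-decimal text precision. It shows only one way a text export can be lossy (rounded decimals); the other posts may have had further losses in mind, such as dropped annotations.

```python
import base64
import struct


def encode_array(values):
    """Pack floats as little-endian 64-bit doubles, then base64-encode.

    The byte order and precision are assumptions made for this sketch.
    """
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text):
    """Inverse of encode_array: base64-decode, then unpack the doubles."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


def to_peak_lines(mz, intensity, digits=4):
    """Write one 'm/z intensity' pair per line with fixed decimal digits."""
    return "\n".join(
        "%.*f %.*f" % (digits, m, digits, i) for m, i in zip(mz, intensity)
    )


def from_peak_lines(text):
    """Parse 'm/z intensity' lines back into two parallel lists."""
    mz, intensity = [], []
    for line in text.splitlines():
        m, i = line.split()
        mz.append(float(m))
        intensity.append(float(i))
    return mz, intensity


# Made-up example values, not real spectrum data.
mz = [445.120025, 446.1234567891, 1021.5]
intensity = [1200.0, 35.123456789, 880.25]

# Binary round trip: every double comes back bit-for-bit.
binary_roundtrip = decode_array(encode_array(mz))

# Text round trip: readable, but values are rounded to `digits` decimals.
text_mz, text_intensity = from_peak_lines(to_peak_lines(mz, intensity))
```

In this sketch `binary_roundtrip` equals `mz` exactly, while `text_mz` differs from `mz` wherever a value had more than four decimals. That is the sense in which export to text is safe to offer, while import from text should carry a caveat.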