Re: [Psidev-ms-dev] honey vs vinegar

On 10/4/07, Brian Pratt <bri...@in...> wrote:
>
> These are interesting questions about how folks will use the format.  I'm
> not comfortable with the idea that the format is intended for repositories
> instead of processing.  I'd think you'd want a repository to contain
> exactly the same artifacts that were processed lest anyone wonder later
> what differences may have existed in the various representations of the
> data.

I think we agree here but are coming from different perspectives. In my mind,
for a repository to hold the most accurate representation of the data, the
standard has to be designed for data archival and flexible experimental
annotation. Data processing routines would then take that format and do
whatever they need for peak detection, noise reduction, baseline correction,
etc. to give a final set of values (which typically go into the search
algorithms). All of the intermediate steps in the processing should, in
theory, be representable in the same format.

I think that mzML as it stands is able to track the data and the
processes that were applied to it, but it will certainly not be the most
efficient way to represent the data *as the processing is being done*. A
special-purpose format for the algorithm at hand will always win in terms of
engineering ease / speed / performance / interoperability (within a set of
tools).

This, I think, is at the heart of the whole discussion, and why I think
cvParam is always getting hammered on the list. So while it seems that we
are talking at cross purposes, I really don't think we are.

-angel