Re: [Psidev-ms-dev] mzQuantML update

Dear Oliver,

I have already flown in to Sydney for a proteomics course.
I agree with you regarding the feature elements. The quality of a feature
(isotope pattern etc.) can be important information to capture. If there is
serious concern about including this, we can make these elements optional,
but in my opinion they are essential.

I hope to see you during HUPO.

Mi-Youn

On Mon, Sep 13, 2010 at 3:01 AM, Oliver Kohlbacher <
oli...@un...> wrote:

> Dear all,
>
> unfortunately, I could not attend the mzQuantML meeting as I am
> constantly traveling this month, so I have to add my comments
> from afar.
>
> The draft looks quite reasonable to me, except for two - rather
> important - points: Feature information and Data Processing information.
> Let me address them individually.
>
> Features:
>
> From my discussions with Mathias, who was there, I got the impression
> that dropping the features from the draft was mostly due to two points:
> it was seen as rather complex and not essential for storing the (final)
> results of analyses in public databases.
>
> Complexity is to be expected and we should not be afraid of it. The draft
> we provided prior to the meeting included feature information. While one
> can always simplify the schema (as a matter of fact, we worked on that
> on the OpenMS retreat last week), I would argue that this part of the
> schema is rather mature. For this draft we merged the schemas of three
> quantification formats from TPP and OpenMS (APML, featureXML,
> consensusXML) and both the OpenMS team and the ISB people are happy with
> this draft. I cannot perceive any undue complexity.
>
> As to whether one should store this information, I believe that not
> storing it is absolutely detrimental and would disqualify mzQuantML
> as a format for archiving published quantitative proteomics data. The
> reason for that (and more on it below) is that we can no longer trace
> back where the information for a specific peptide/protein comes from.
> Apart from some particularly stupid quantification methods (e.g. spectral
> counting), feature information allows one to really go back and check the
> original data for each peptide/protein and validate that the quantification
> algorithms really did what they were supposed to do. By storing the
> original data as mzML and then the final results only, we are back
> to the current state: believe the table of differentially expressed
> proteins or don't. Checking every piece of evidence as the data is being
> processed is made impossible by this.
>
> Another - and to us equally important point - is the fact that most
> quantification packages do not process the data in a single step.
> It is not just 'mzML in and mzQuantML out'. Even for more monolithic
> applications, the quantification involves multiple stages (feature
> identification, peptide ID, protein inference, filtering for fold
> changes, various statistical analyses).
> For most of these steps, Features are essential and required information,
> so basically mzQuantML would be useless in data processing pipelines.
> All we would have gained is yet another format, because we would have to
> keep the old ones. Doesn't convince me.
>
> Data Processing:
>
> Arguing along the same lines as above, I would like to see the
> DataProcessing portion of the mzML schema included in mzQuantML,
> in order to provide complete and traceable information on anything
> that was done to the data. In principle, the 'Materials & Methods'
> section on data processing should be reconstructible from this
> portion of the mzQuantML file. The advantages are obvious: if you
> process your mzML file with various software packages, you can
> transfer that information to the mzQuantML file that contains the
> results based on these files. In my opinion, this is an elegant solution
> that would enable us to document the whole workflow. It could
> also enable us to make the data processing reproducible, transparent,
> and more consistent with common scientific standards.
> Perhaps the protocolList part of mzQuantML could be simply an
> extension of the dataProcessing part of mzML.
>
> Alright, just my thoughts on this. I am sorry I couldn't attend the
> meeting, but perhaps a few of you will be attending HUPO 2010 in
> Sydney. In case you are, and are interested in discussing
> this over a beer, please don't hesitate to send me an email.
>
> Cheers from Sydney,
>  Oliver

-- 
Mi-Youn Brusniak, Ph.D.
Computational Biology
Seattle Proteome Center
mbr...@sy...
Tel: (206) 732-1327