Re: [Psidev-qc-dev] qcML format evolution


Hi Mathias,

Thank you for the detailed state of affairs.

Concerning your iMonDB question: I don't have any notion of a 'set' in there; I only store data at the level of individual runs. The user can, however, specify custom metadata through the GUI, which is stored as key-value pairs. The data can then be filtered for visualization based on this metadata and combinations thereof.
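To make the key-value filtering concrete, here is a minimal sketch in Python. It is not the iMonDB implementation: the in-memory list, the `filter_runs` helper, and the metadata keys and values are all hypothetical and only illustrate the idea of selecting runs by combinations of metadata.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for runs annotated with user-defined key-value metadata.
# The run names, keys, and values are illustrative; they are not the iMonDB schema.
runs: List[Dict[str, Any]] = [
    {"run": "run_01", "metadata": {"operator": "A", "column": "C18"}},
    {"run": "run_02", "metadata": {"operator": "B", "column": "C18"}},
    {"run": "run_03", "metadata": {"operator": "A", "column": "HILIC"}},
]


def filter_runs(runs: List[Dict[str, Any]], **criteria: str) -> List[Dict[str, Any]]:
    """Return the runs whose metadata matches every given key-value pair."""
    return [
        run
        for run in runs
        if all(run["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Combine two metadata criteria to select the runs to visualize.
selected = filter_runs(runs, operator="A", column="C18")
print([run["run"] for run in selected])
```

With no criteria, the helper returns every run, so a single function covers both the unfiltered and the filtered view.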

Also, I currently represent all metrics as summary statistics over the entire run, consisting of mean, median, standard deviation, quartiles, ... The discussion on how to represent single and multiple values is therefore very relevant, but I'm not sure what the ideal solution would be. It could also be interesting to keep the full data rather than only summary statistics, for example to investigate and optimize the performance of individual scans. But this is only an interesting idea at the moment, not something I'm actively working on.
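As a rough illustration of the difference between the two representations, the sketch below reduces per-scan values of one metric to summary statistics using Python's standard `statistics` module. The `summarize` helper and the example values are assumptions made for illustration only; they do not reflect how iMonDB computes or stores its statistics.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Reduce the per-scan values of one metric to run-level summary statistics."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    # (default 'exclusive' method; requires Python 3.8 or later).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
    }


# Hypothetical per-scan values for a single metric in a single run.
per_scan_values = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = summarize(per_scan_values)
print(summary)
```

Storing only `summary` is compact, whereas keeping `per_scan_values` alongside it is what would allow the performance of individual scans to be examined later.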

You can find the current iMonDB database schema here <https://bitbucket.org/proteinspector/imondb/wiki/Database>, although this will be transformed to match the new qcML format once it has been specified.

Regards,
Wout

> On 08 Nov 2016, at 18:14, Mathias Walzer <wa...@in...> wrote:
> 
> Dear all,
> 
> as promised, here is how I'd propose a schema working plan, along with first discussion points.
> 
> But first, the main points of the current design, from the top:
> Right now, we compile the results of calculated metrics as QualityParameters, structured under either a particular run or a set of runs. The software that calculated the metrics, or other metadata, can also be given, all cv-controlled. Each parameter is under cv control as well, describing the metric. The results are stored in a subelement of the parameter.
> 
> Now what I'd consider lead questions for our schema development (in suggested order of discussion and also from schema core outwards):
> 
> What do our metrics look like? ('XML-any' seems like a hard way to maintain tool compatibility.) Is our list on GitHub complete enough to start with? Would it be beneficial to also add examples to each metric?
> 
> How do we want our metrics to be represented? All defined by one cv term (hence creating an elaborate hierarchy of cv terms) or by a combination of several (can we design cv terms that are atomic enough to allow building-block-like features, or does this hamper understanding the implications of a metric)?
> 
> Do we need to change the nomenclature for intelligibility?
> 
> Where to put additional data? A plot for instance. Or a distribution. For visualization, we probably do not want it to be necessary to re-read the whole source data again.
> 
> Do we ditch the possibility to flag results in a report? (The schema right now gives the qcML producing software the possibility to reflect in the file if a flag was raised - on whichever thresholds the user has set)
> 
> Should we change (flatten) the hierarchy? (i.e. the way metrics are categorized in per run or per set)
> 
> How to link source data? Leave it as a quality parameter of a run/set or build in dedicated elements.
> 
> What do we need to concisely present the metric (for visualization or further automated processing)? This circles back to the first question, 2nd iteration :)
> 
> These questions should take up the feedback points from Ghent and guide us in improving the schema to a version we can start building software around. But did I miss any particulars that are only common in metabolomics? And shall we have a regular telco/hangouts/... (bi- or tri-weekly)?
> 
> best,
> mths
> 
> ----- Original Message -----
> From: "Mathias Walzer" <wa...@in...>
> To: psi...@li...
> Sent: Tuesday, 8 November, 2016 1:41:04 PM
> Subject: Re: [Psidev-qc-dev] qcML format evolution
> 
> Dear all,
> 
> Yes, as Dave pointed out there was quite some discussion and I agree this would be a good point to start with planning for the next parts of development. Though one thing I'd like to point out is that these were on-the-fly changes to explore the feasibility.
> 
> In the following, I will summarize the feedback, including changes, without taking sides (hopefully).
> To keep the length reasonable, I will sketch a proposed schema working plan and further discussion points in a follow-up mail.
> As a side note, I've been in contact with George and Hannes about QC for openSWATH previously. Let me see if I can dig up our correspondence and give you a summary.
> 
> A major point of feedback concerned the attachment construct.
> A quality parameter whose calculation results in more than a single value would have worked as follows: the XML element of the parameter would have contained only the cv term classifying the parameter, plus one or more attachment elements, such as a table or a plot, referring to the quality parameter element.
> The feedback preferred leaving it all with the parameter. This is now reflected in the development draft as a nested content element, which can contain 'any'.
> This leads to the second part of the feedback about the attachment element: that base64 or lists of values (i.e. tables) would be insufficient. That is why there is now 'any'.
> 
> A similar point of feedback concerned the structure, namely whether a separate structure for quality parameters of sets of files would be necessary. Here, no changes have been made to the development draft yet. Which brings me to an interesting side question @Wout: how do you handle input in iMonDB such as the type of run, or any experiment structure that might influence how to calculate the statistics for quality assessment over the course of measurement acquisitions?
> 
> Another point of the feedback was the reluctance to accept that the information on how to display the qcML may be contained in the file (i.e. a stylesheet, removing the implicit necessity for a dedicated viewer program). This, however, remained in the development draft as a non-mandatory element tucked away at the outermost end of the schema.
> 
> The next and last point of discussion in Ghent was to explore the different types of quality parameters, or rather their result types, to get a better grasp on how to adequately model the schema. This resulted in the collection on GitHub (https://github.com/HUPO-PSI/qcML-development/blob/master/cv/qc-metric-collection.md).
> 
> One point we had no time to discuss, though I wish we had, concerns the source file(s) denomination of a calculated metric. Right now, there is no explicit element denominating the source of the data.
> 
> And a very last point would be the naming of the elements, which was not addressed in particular, but I had a feeling this was a source of confusion at some points in time. Maybe we should put this under scrutiny as well. The overall structure of the schema is simple enough, maybe there is a better nomenclature to follow.
> 
> 
> best,
> mths
> 
> ----- Original Message -----
> From: "Stefan Tenzer" <te...@un...>
> To: "David Tabb, Prof <dt...@su...>" <dt...@su...>
> Cc: "Joerg Kuharev" <ku...@un...>, psi...@li...
> Sent: Wednesday, 2 November, 2016 9:17:22 AM
> Subject: Re: [Psidev-qc-dev] qcML format evolution
> 
> 
> Hi everybody,
> 
> 
> it could be that you spoke with Pedro Navarro (he was in my group at that time, but has unfortunately since moved on and joined Thermo as a programmer).
> 
> 
> For QC for SWATH-MS, in my opinion, it would be great to get in contact with George Rosenberger ( ros...@im... ) or Hannes Röst ( hr...@st... ); both are bioinformaticians and experts in SWATH-MS.
> If you agree, I could of course invite them to contribute.
> 
> 
> Jörg Kuharev (Postdoc in my group) and I would be happy to contribute metrics for HDMSE/UDMSE (ion-mobility based DIA approaches).
> 
> 
> 
> 
> Best wishes,
> 
> 
> Stefan
> 
> 
> 
> 
> 
> 
> Univ.-Prof. Dr. rer. nat. Stefan Tenzer
> ______________________________________________
> 
> UNIVERSITÄTSMEDIZIN
> der Johannes Gutenberg-Universität
> Institut für Immunologie
> Core Facility für Massenspektrometrie
> Gebäude 708
> Langenbeckstr. 1, 55131 Mainz
> www.immunologie-mainz.de
> 
> Telefon: +49 (0) 6131 17-6199
> Telefax: +49 (0) 6131 17-6202
> E-mail: te...@un...
> 
> 
> On 01.11.2016 at 13:57, Tabb, David, Prof < dt...@su... > < dt...@su... > wrote:
> 
> 
> 
> 
> 
> Hi, all!
> 
> We had such a lot of interesting topics that arose during our meeting in April. Now that we are rapidly approaching the submission of our group announcement whitepaper (courtesy of consistent labors by Wout Bittremieux), I wanted to check in on various other questions that arose during Ghent:
> 
> 1) Mathias, you received a lot of feedback on the qcML format at our meeting. Could you describe the nature of changes that you have already made to the format along with the set of topics that still need addressing in the months to come?
> 2) Reza, we have talked about various partners in the metabolomics community who may be assisting us as we move forward with our work. Could you identify the primary people who will be most closely connected with the quality control effort?
> 3) All, I spoke with a researcher who was tightly associated with the development of SWATH technologies (a type of DIA). He was interested in seeing QC tools for the SWATH platform. Could someone help me remember his name? I’d like to see if that group has already roughed out some metrics in the DIA space.
> 4) Wout, thank you for your persistence in the announcement white paper ( https://www.overleaf.com/5671898jfvtyd ). In the month and a half together at Cape Town that we have remaining, I hope to get a better grip on the iMonDB ( https://bitbucket.org/proteinspector/imondb/wiki/Home ) framework.
> 
> Thanks!
> Dave Tabb
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Psidev-qc-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-qc-dev
> 