Re: [Psidev-qc-dev] Quality control in Mass-Up (and other topics)

Hi, all.

A couple of weeks ago, I gave a talk in Semmering, Austria, on some ideas for extending quality control beyond the usual run of LC-MS/MS shotgun workflows (attached).  I am pleased to report that one of the attendees of the meeting had already made substantial inroads toward putting MALDI-TOF profiling on a firmer QC footing.  Hugo López Fernández published his Mass-Up framework (http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0752-4) with a module intended explicitly for QC of MALDI-TOF data.  It can recognize when a particular profile matches relatively few of the m/z values found in the other replicates of a sample, or when a sample has low overlap with other samples.  We probably want to cite this in Wout's manuscript (now due on Valentine's Day) as an example of a framework that we hope will communicate with qcML, once qcML is standardized.

Wout, congratulations on passing your defense!

Mathias Walzer, how do matters sit with qcML updates?  What steps stand between now and document submission?  We have a HUPO-PSI meeting in Beijing in less than three months!

Thanks,
Dave

-----Original Message-----
From: Hugo López Fernández [mailto:hlf...@uv...] 
Sent: Friday, 20 January 2017 13:32
To: Tabb, David, Prof <dt...@su...>
Subject: Quality control in Mass-Up (and other topics)

Hello David,

I am Hugo; we met last week in Semmering. I hope this email finds you well and that you had a good trip back home.

As we discussed at EuBIC, I am writing to tell you more about the quality control analysis that we have included in Mass-Up (http://sing-group.org/mass-up/). This quality control is intended to work with peak lists. We would also like to incorporate quality control for raw data, especially to detect batch effects, as I mentioned to you.

Basically, the quality control (which is explained in more detail in the paper, http://doi.org/10.1186/s12859-015-0752-4) can be done at two levels: at the replicates level and at the samples level. The samples level includes additional information from the intra-sample m/z matching process and consensus spectrum creation (this is because our collaborators usually want to reduce the replicate spectra to a single "consensus" spectrum per sample). You can find attached the quality control image included in the paper.

At the replicates level, the user can check basic information about each individual spectrum (e.g., peak count, m/z range, and intensity range) and compare all spectra in the dataset. At the samples level, the user can check the performance of the intra-sample peak matching process by comparing the percentage of presence (POP) counts (i.e., the number of peaks that are present in, for example, 60%, 80%, or 100% of replicates) and the POPs of each sample.
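
The POP counts described above can be illustrated with a minimal sketch. It is not Mass-Up's code: the greedy m/z matching, the 0.5 tolerance, and the function names `match_peaks` and `pop_counts` are all assumptions made for the example, and Mass-Up's actual intra-sample matching may work differently.

```python
def match_peaks(replicates, tolerance=0.5):
    """Group m/z values across replicates (hypothetical greedy matching).

    A new group starts whenever an m/z value lies more than `tolerance`
    above the first m/z of the current group. Returns one set of
    replicate indices per matched peak.
    """
    tagged = sorted(
        (mz, idx) for idx, peaks in enumerate(replicates) for mz in peaks
    )
    groups = []
    anchor = None
    for mz, idx in tagged:
        if anchor is None or mz - anchor > tolerance:
            groups.append(set())
            anchor = mz
        groups[-1].add(idx)
    return groups


def pop_counts(replicates, thresholds=(60, 80, 100), tolerance=0.5):
    """Count matched peaks present in at least each percentage of replicates."""
    n = len(replicates)
    groups = match_peaks(replicates, tolerance)
    # Integer arithmetic avoids floating-point surprises at the thresholds.
    return {
        pct: sum(1 for g in groups if len(g) * 100 >= pct * n)
        for pct in thresholds
    }


# Five made-up replicate peak lists (m/z values only) for one sample.
replicates = [
    [1000.0, 1500.2, 2000.1],
    [1000.1, 1500.0, 2000.0],
    [1000.2, 1500.1, 2000.2],
    [1000.0, 1500.3, 2500.0],
    [1000.1, 3000.0],
]
print(pop_counts(replicates))  # {60: 3, 80: 2, 100: 1}
```

A sample whose counts drop sharply between the 60% and 100% thresholds, compared with the other samples, would be one whose replicates agree poorly.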

Despite being a very simple quality control, it allowed us to detect some problems with datasets, and we encourage our collaborators to have a quick look at these quality control metrics before any other analysis. Unfortunately they usually don't, but we must encourage good practices, which is the reason why I am developing this other software (http://www.sing-group.org/s2p/), also presented in another poster at EuBIC. Basically, it is software to manage, process, and integrate different data sources (Mascot identifications, MALDI plates, 2D-gel spots). It probably will not revolutionize bioinformatics, but it is allowing the research group to process data efficiently and in a reproducible way, a totally different scenario from when I came here six months ago.

As I mentioned previously, we would also like to include quality control metrics for MALDI-TOF raw data, with a special focus on batch effect detection (which seems to be a common problem here). Regarding batch effects, I would like to apply this statistic (http://dx.doi.org/10.1093/bioinformatics/btt480), based on guided principal component analysis, to detect batch effects in MALDI-TOF data (some people have applied it to LC-MS metabolomic data [http://dx.doi.org/10.1016/j.talanta.2014.07.031]). I would like to carry out this work this year if I can get public MALDI-TOF datasets where the presence of a batch effect has been publicly reported (I found a few reports, but I have not yet been able to get the data to analyze).
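
For readers unfamiliar with the guided PCA statistic, here is a rough sketch under stated assumptions. It takes the statistic to be the ratio of the variance along the first batch-guided component to the variance along the first ordinary principal component, with a permutation test for significance. The function names, the centering-only preprocessing, and the simulated matrix are illustrative and do not come from the cited paper's software; a real analysis would use a samples-by-features intensity matrix from the MALDI-TOF data.

```python
import numpy as np


def gpca_delta(X, batch):
    """Ratio of guided to unguided first-component variance (between 0 and 1)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center each feature (assumed preprocessing)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    # Samples-by-batches indicator matrix.
    Y = (batch[:, None] == labels[None, :]).astype(float)
    # Unguided PCA: first right singular vector of X.
    _, _, vt_unguided = np.linalg.svd(X, full_matrices=False)
    # Guided PCA: first right singular vector of Y'X.
    _, _, vt_guided = np.linalg.svd(Y.T @ X, full_matrices=False)
    var_guided = np.var(X @ vt_guided[0], ddof=1)
    var_unguided = np.var(X @ vt_unguided[0], ddof=1)
    return var_guided / var_unguided


def gpca_pvalue(X, batch, n_perm=1000, seed=0):
    """Permutation p-value: how often shuffled batch labels give a delta
    at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    observed = gpca_delta(X, batch)
    hits = sum(
        gpca_delta(X, rng.permutation(batch)) >= observed
        for _ in range(n_perm)
    )
    return observed, (1 + hits) / (1 + n_perm)


# Simulated example: 20 spectra, 50 features, second batch shifted upward.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 50))
batch = [0] * 10 + [1] * 10
X[10:] += 3.0
delta, p = gpca_pvalue(X, batch, n_perm=200)
print(f"delta = {delta:.3f}, p = {p:.3f}")
```

A delta close to 1 with a small p-value suggests that the dominant direction of variation coincides with the batch labels.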

I will be happy to answer any questions you may have or to receive any feedback from you. I look forward to seeing you again, at another conference or wherever.

Best regards,

     Hugo.

--
----------------------------------------------------------------
Hugo López-Fernández, PhD
Email:hlf...@uv...
----------------------------------------------------------------
ESEI: Escuela Superior de Ingeniería Informática "Politécnico" Building, Room 306 "As Lagoas" Campus
32004 - Ourense - Spain
Web:http://www.sing-group.org/~hlfernandez/
----------------------------------------------------------------