From: Juan A. V. <ju...@eb...> - 2015-04-21 10:21:25
Dear all,

I am providing here a summary of the sessions in which I was directly involved, mainly related to the activities of the PSI-PI working group and ProteomeXchange. I am not covering the developments related to compression and mzML, or to PEFF. Of course, additions and edits are welcome.

1) mzIdentML 1.2

There was no session devoted to mzIdentML in the meeting. Juan Antonio presented the current status using slides from Andy. During the meeting, it was suggested that the new version 1.2 should include guidelines on how to report:
- Identification results coming from chimeric spectra.
- ProteoGenomics searches using (incrementally larger) databases, ideally reporting in more detail how the FDR calculation was performed.
The format itself does not need to change; only extra instructions need to be added describing how to do it.

2) mzQuantML

There was no session devoted to mzQuantML in the meeting. Simon Perkins (Liverpool University) presented the current status. The guidelines for reporting SRM results in mzQuantML have recently been finished and published (http://www.ncbi.nlm.nih.gov/pubmed/25884107).

Parag asked whether mzQuantML could support the combination of SILAC and iTRAQ in the same dataset. People are now starting to do this, but it is currently not supported.

The need to trace back to the original features was mentioned several times in the ProteomeXchange session (e.g. by Brendan), so that researchers can make informed decisions about the reliability of the reported quantitative results. This is supported in mzQuantML.

3) mzTab

Extensions for metabolomics (COSMOS Consortium) and glycomics (MIRAGE Consortium) data are ongoing. There was a remote meeting and there is agreement on how to proceed. The specification will need to be modified slightly to have:
- A core part: metadata information that is common to all techniques (a common framework).
- Extensions for the different techniques: at the moment proteomics, metabolomics and glycomics. There could be more in the future.
- Juan Antonio and Timo to help and advise, but not to lead.
- Proposed face-to-face meeting between people from MIRAGE and S. Neumann in June in Potsdam.
- Juan Antonio to send the MIRAGE document to everyone involved in the session.

4) ProteoGenomics approaches

On the first day there were talks by X. Wang (Vanderbilt University, “proBAM”) and Gerben Menschaert (Ghent University, “Proteogenomics in need for integration standards”).

The proBAM format, presented by X. Wang, was very well received. She is interested in taking the format through the PSI document process. The file format:
- Includes the exact genomic positions from which each peptide identification originates.
- Includes detailed information about the PSM (such as scores).
- Serves as a well-defined interface between PSM identification and downstream analyses.
- Has the big advantage that current software that can read BAM files can also read proBAM files.
- Encodes junction peptides in the CIGAR string section of proBAM.

In addition:
- A much simpler standard format for reporting peptides in BED format was also presented by Juan Antonio (“pepBED”). It is complementary to the proBAM format and is being developed by Andy Jones/EBI in the context of a BBSRC grant.
- Inter-conversion between those formats (proBAM, pepBED) and mzTab should be possible.
- Do a MIAPE MSI extension for ProteoGenomics approaches? Eric to contact Alexey Nesvizhskii to see if he would be interested (following up on what he published in the Proteogenomics review in Nature Methods, PMID: 25357241).

Gerben stressed several times the need to report in detail the analysis workflow used in ProteoGenomics pipelines.

5) New format for spectral library files?
Simon Perkins (Liverpool University) summarized the currently available formats:
- splib/spectLib
- NIST formats
- BiblioSpec/blib
- hlf
- ms2/ssl
For small molecules: MassBank and mzCloud, which are not peptide-specific collections.

The main piece of information missing from these formats is metadata, e.g. about the origin of the spectra that constitute the spectral library. Since it does not make sense to develop a format from scratch, it was proposed to develop a format for the metadata associated with spectral libraries, which could be used together with any of the existing formats.
- Eric and Nuno to discuss and draft the first list of requirements.

Other points mentioned in the session:
- Brendan: it is important to differentiate between primary information and derived information.
- Nuno -> Secretomes -> mixed peptides and metabolites in the same sample.

6) ProteomeXchange session

There were talks by Henning, Juan Antonio, Nuno and Eric about the overall status and the current resources: PRIDE, MassIVE and PeptideAtlas/PASSEL.
- The grant we applied for (BBSRC/NSF joint funding) was well reviewed but not successful.
- We should try again this year (the deadline for the initial letter of intent is May 31st).
- The number of citations to PRIDE and the reuse of data increased a lot in 2014.
- Ongoing: development of PROXI and extension of the ProteomeCentral concept to include other omics data types (e.g. metabolomics) in the context of BD2K initiatives.

There were also talks from Shin (JPOST) and Brendan (Panorama).
- The Japanese database JPOST will definitely go forward, since their grant application was successful. It will contain original results and reprocessed data.
- Panorama:
  - Repository for Skyline documents.
  - Does not store raw data at present (one of the requirements of ProteomeXchange).
  - Raw data should be in other resources: PASSEL and/or Chorus.
- Robert Chalkley reported the status of things in MCP:
  - MCP guidelines for authors are being updated.
    At the moment, the quantification-related guidelines are not going to change significantly. Not many details are provided in those.
  - The need to have these guidelines much more “tightened up” was discussed.
  - Mandatory deposition of raw data is happening in practice, although the journal guidelines have not been changed yet. This is supposed to happen soon.

Discussion:
- Need to have a unique identifier for spectra. This could be transformed into a unique URL for spectra, which would be very useful for journals.
  - This is only possible for open peak list files. PRIDE has developed an approach that it is happy to share with others.
  - However, this is not possible for raw files (e.g. “Partial” submissions), since conversion using ProteoWizard will result in different files depending on the version of the PW software and/or the vendor libraries.
- Nuno highlighted during the session the need for better sample-related metadata annotation.
  - Initially we put the emphasis on making the submission process easy.
  - Maybe now is the time to tighten things up, especially in the case of “partial” submissions.

7) PROXI (ProteomeXchange query interface)

- Remote presentation by Yasset describing the current status. Consensus with MassIVE on how to proceed:
  - Focus on identification first (v1.0).
  - A REST API to be developed.
  - Concentrate on the interface and “forget” for now about general limitations on reporting (e.g. mapping of protein identifiers, updated protein sequences), since these are independent problems. Guidelines will need to be drafted about this.
- Skype call to be scheduled to continue with this work.

8) GitHub migration

- Mathias has created an organisation in GitHub called “HUPO-PSI”.
- He will migrate the current PSI-DEV Google SVN repository under that organisation. He will send an e-mail to the mailing list once this is done.
- Owners and other roles need to be decided.
- The PRIDE team will migrate the mzTab Google SVN asap.
- For now only the formats, example files, etc. will be included, not the libraries (e.g. jmzIdentML, jmzML, etc.), although this can be changed if there is consensus.
- At the moment the PSI-MI group has created a different GitHub repository.

Best regards,

Juan Antonio