From: Jones, A. <And...@li...> - 2008-05-21 13:00:15
This structure looks good to me. I’ve forwarded to the PSI list.

Andy

From: phi...@go... [mailto:phi...@go...] On Behalf Of Phil Jones @ EBI
Sent: 21 May 2008 12:29
To: Jones, Andy; David Creasy; Martin Eisenacher; Angel Pizarro; Sean L Seymour
Subject: Re: Minutes?

Hi,

OK, so the example in the minutes was a little off the top of my head. Having thought a little more carefully about it, how about something like this (the example is silly, but perhaps it scales well and is easy to understand?):

<filters>
  <filter>
    <filterType>
      <cvParam accession="PI:000NNN" name="taxonomy-filter" cvRef="PSI-PI" value=""/>
    </filterType>
    <include>
      <cvParam accession="9606" name="Homo sapiens" cvRef="NEWT"/>
      <cvParam accession="9031" name="Gallus gallus" cvRef="NEWT"/>
    </include>
    <exclude>
      <cvParam accession="10088" name="Mus" cvRef="NEWT"/>
    </exclude>
  </filter>
</filters>

best regards,

Phil.

PS - We seem to be in the habit of ignoring the mailing list for these discussions?

2008/5/21 Jones, Andy <And...@li...>:

Something like this I think

<Filters>
  <Filter>
    <_role>
      <pf:cvParam accession="PI:00015" name="inclusive" cvRef="PSI-PI" value=""/>
    </_role>
    <_filterValue>
      <pf:cvParam accession="PI:00015" name="database taxonomy filter" cvRef="PSI-PI" value="All entries"/>
    </_filterValue>
  </Filter>
</Filters>

From: David Creasy [mailto:dc...@ma...]
Sent: 21 May 2008 12:13
To: Phil Jones @ EBI
Cc: Jones, Andy; Martin Eisenacher; Angel Pizarro; Sean L Seymour
Subject: Re: Minutes?
Phil,

From the minutes (and now in the schema), I don't understand how to use

<Filters>
  <Filter>
    <Role>
      <CvParam....>
    </Role>
    <Values / Options??>
      <CvParam/>
      <CvParam/>
    </Values>
  </Filter>
</Filters>

In the example file I've put up, I probably don't have what you intended:

<Filters>
  <Filter>
    <_role>
      <pf:cvParam accession="PI:00015" name="database taxonomy filter" cvRef="PSI-PI" value="All entries"/>
    </_role>
    <_filterValue>
      <pf:cvParam accession="" name="" cvRef=""/>
    </_filterValue>
  </Filter>
</Filters>

Can you give an example? (I recollect the requirement for include and exclude lists.)

Thanks,
David

Jones, Andy wrote:

Okay, done, it's posted here:
http://code.google.com/p/psi-pi/source/browse/trunk/schema/InheritedFromFuGE.xls

And I've added a link in the related issue (19) in the issues list.

Cheers
Andy

From: phi...@go... [mailto:phi...@go...] On Behalf Of Phil Jones @ EBI
Sent: 21 May 2008 11:57
To: Jones, Andy
Cc: Martin Eisenacher; David Creasy; Angel Pizarro; Sean L Seymour
Subject: Re: Minutes?

Hi Andy,

I would suggest sticking it into the SVN repository and linking to the head version from the PSI-PI pages so that the link will always point to the latest version.

Best regards,

Phil.

2008/5/21 Jones, Andy <And...@li...>:

Hi all,

I've created an Excel file showing, for each of the global elements in the current schema, what is inherited from FuGE (attached). It would be useful to spend a few minutes of the call discussing it. It's not obvious to me where I can post this on the PSI-PI site; I don't seem to be able to attach a file to a specific issue - any ideas?

Cheers
Andy

From: Martin Eisenacher [mailto:mar...@ru...]
Sent: 21 May 2008 08:40
To: Jones, Andy; 'David Creasy'
Cc: 'Angel Pizarro'; 'Phil Jones @ EBI'; 'Sean L Seymour'
Subject: RE: Minutes?

I vote for dropping the "_something" style. Every time I saw a PSI newbie (but one who knew XML) looking at an AnalysisXML use case, he/she was confused about whether the "_" encodes something.
Bye
Martin

From: Jones, Andy [mailto:And...@li...]
Sent: Monday, May 19, 2008 5:57 PM
To: David Creasy
Cc: Angel Pizarro; Phil Jones @ EBI; Martin Eisenacher; Sean L Seymour
Subject: RE: Minutes?

I think I've made all the changes (and one or two related changes on cardinalities etc.). I haven't had time to go through the whole schema checking all cardinalities, so you'll likely find a few more that are not right. Feel free to mail me any more changes - it's very easy to do!

> Also, <_analysisSoftwares> can only have one software package, and maybe it should be AnalysisSoftwareList?

I've made this change, although we should take a systematic decision about naming. There is a FuGE-related reason for having the "_something" style names: in the object model these were derived from associations between UML classes, whereas all other XML elements were derived from a UML class itself. Given that the model is getting further away from being a true FuGE extension (and is unlikely to be able to make use of any FuGE-related software), we can probably get rid of all the "_something" style names.

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 19 May 2008 16:00
To: Jones, Andy
Cc: Angel Pizarro; Phil Jones @ EBI; Martin Eisenacher; Sean L Seymour
Subject: Re: Minutes?

Also, <_analysisSoftwares> can only have one software package, and maybe it should be AnalysisSoftwareList?

<SpectrumIdentificationProtocol Software_ref
should be:
<SpectrumIdentificationProtocol AnalysisSoftware_ref

(Sorry, I'll shut up soon!)

David Creasy wrote:

Thanks. Also, can only have one cvparam inside each PeptideEvidence. Not sure if this is intentional, and I can't currently think why you might want more...

Jones, Andy wrote:

I'll try to get a new version out by the end of the day to fix these...

cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 19 May 2008 14:34
To: Jones, Andy
Cc: Angel Pizarro; Phil Jones @ EBI; Martin Eisenacher; Sean L Seymour
Subject: Re: Minutes?
Thanks Andy,

I no longer seem to be able to have a ProteinDetectionList and a SpectrumIdentificationList in the same AnalysisXML document. Have I misunderstood something? This is no good:

<AnalysisData>
  <SpectrumIdentificationList identifier="Peptides">
    ...
  </SpectrumIdentificationList>
  <ProteinDetectionList ...>
    ...
  </ProteinDetectionList>
</AnalysisData>

And how do I add two ontologies: psi-pi and mod?

And a couple of minor points:

ProtocolDetectionResultSet_ref should be ProteinDetectionResultSet_ref ?

- Changed "Sets" to "Lists"
But missed a couple of references:
SpectrumIdentificationSet_ref
ProtocolDetectionResultSet_ref

Thanks,
David

Jones, Andy wrote:

Ok, new version of the schema attached with changes from yesterday's call.

Main changes:

* Some changes to the collection classes reachable from the AnalysisXML root element
* "DataCollection" -> "_dataCollection", which has specific references to types of data e.g. InputFile, SpectralData and AnalysisResults; this prevents general FuGE classes for Internal and ExternalData from being used and makes it simpler to understand where particular data files should be specified
* Added _analysisSampleCollection with reference to FuGE GenericMaterial. We should test if this works for representing samples; otherwise we can build our own extension of Material
* Added a new extension of Software which allows Customizations to be specified, reachable from _analysisSoftwares (can we change this to _analysisSoftware?)
* We should discuss the order of the collection classes - we can shift these around if a more logical order is preferred?
* Extended SearchDatabase with the attributes / elements discussed on the call
* Adapted Protein grouping as discussed
* Username / email etc. has not been added. This can be done with the FuGE Provider type, which is already referenced. We can change this to the mzML specification if preferred?
* Changed "Sets" to "Lists"
* FeatureSet is still in the schema for now but not reachable from the root, i.e. it can't make it into an instance document. I can get rid of it completely if we want.

I've attached a (mainly) auto-generated XML annotated with a few notes to demonstrate the main changes. Let me know if there are any problems. Likely to be a few mistakes with cardinalities etc.

Cheers
Andy

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 16 May 2008 14:39
To: Phil Jones @ EBI
Cc: Jones, Andy; David Creasy; Martin Eisenacher; Sean L Seymour
Subject: Re: Minutes?

This is the structure (copied from openms) for identifying feature sets for quantification experiments. Not needed for v1 if indeed there will not be quantification there.

-angel

On Fri, May 16, 2008 at 9:35 AM, Phil Jones @ EBI <pj...@eb...> wrote:

Hi Andy,

This was an attempt to implement a general solution for quantitation, based upon the existing schema used by the openMS team. I suspect that it can be removed for the moment while we work on a robust solution. (Feature / convex hull is the concept that Andreas Bertsch described in his presentation at Toledo).

best regards,

Phil.

2008/5/16 Jones, Andy <And...@li...>:

I'm doing a bit more of a thorough job tidying up the schema and re-arranging a few things to make it a bit simpler to understand and navigate. Anyone have any ideas what FeatureSet is?

<FeatureSet>
  <Feature>
    <ConvexHull>
      <CoordinatePoint>

Is any of this required?!

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 15 May 2008 14:28
To: Jones, Andy
Cc: Martin Eisenacher; Phil Jones @ EBI; Sean L Seymour
Subject: Re: Minutes?

Hi All,

Jones, Andy wrote:

Hi all

The mandatory association _types was an error in the FuGE light schema (it should have been optional), now fixed (new version to come).
<pf:_types>
  <pf:cvParam accession="DP:4" name="" cvRef="PSI-PI"/>

The purpose of the _types association was to give the type of Protocol/Software/Equipment; however, this may be confusing and used incorrectly, as in the example, so perhaps it should be removed altogether from FuGE light.

These parameters should be _searchParams I think? I'm confused...

<ProteinDetectionResult identifier="1" Sequence_ref="HSP7D_MANSE">
  <ProteinDetectionHypothesis identifier="pgroup1_pdir1_pep1" Peptide_ref="put reference to peptide ID here" start="160" end="171" post="K" pre="I">
    <_peptideEvidence>

I agree that the current schema may be incorrect, but I think it should be:

<ProteinDetectionResult identifier="1" Sequence_ref="HSP7D_MANSE">
  <ProteinDetectionHypothesis identifier="pgroup1_pdir1_pep1" start="160" end="171" post="K" pre="I">
    <_peptideEvidence Peptide_ref="3_1"/>
    <_peptideEvidence Peptide_ref="4_1"/>

So...

ProteinDetectionResult is a group of related proteins that cannot be unambiguously differentiated
ProteinDetectionHypothesis is an individual protein
_peptideEvidence is the list of associations to individual peptides on which this hypothesis is based. It cannot be an attribute of ProteinDetectionHypothesis because multiple are required.

Yes, I think this is correct. Certainly agrees with Sean's proposal:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/Use_case_Toledo_grouping_alt1_Sean.xml

Are we happy with these names, they are not very clear about the purpose of the element?

I'm not so sure about the names either... maybe discuss at the conference call.

If we agree on this, I'll send out a new schema shortly?

Yes please.

Cheers
Andy

From: Martin Eisenacher [mailto:mar...@ru...]
Sent: 15 May 2008 10:35
To: 'Andy Jones'; 'David Creasy'; 'Phil Jones @ EBI'; 'Sean L Seymour'
Subject: RE: Minutes?

Hi Andy, hi all!

Thanks again for your schema work! Attached the Toledo use case corrected for the 14May working schema.
Some remarks:

pf:cvParam directly under <ProteinDetectionResultSet>: make it optional?

Directly under <SpectrumIdentificationProtocol>: pf:ContactRole and pf:_types look a little bit funny. Replace by pf:cvParam? Or make optional?

<pf:ContactRole Contact_ref="">
  <pf:_role cvParam_ref=""></pf:_role>
</pf:ContactRole>
<pf:_types>
  <pf:cvParam accession="DP:4" name="" cvRef="PSI-PI"/>

Andy:
> As I understand it, the protein grouping works as is, with no change, correct?

I think there is one level lost from Sean's original idea. The current schema lists the proteins (ProteinDetectionResult) and the peptides (ProteinDetectionHypothesis). As I understood, Sean wanted protein groups (ProteinDetectionResult) with concrete accessions (isoforms) (ProteinDetectionHypothesis) with their corresponding peptide evidences (PeptideEvidence). For that we need a good use case as discussed earlier.

Independent of what we decide to do, the current schema is somewhat funny, because it enforces having _peptideEvidence elements with SpectrumIdentificationRef elements under the ProteinDetectionHypothesis element:

<ProteinDetectionResultSet identifier="Proteins1">
  <ProteinDetectionResult identifier="1" Sequence_ref="HSP7D_MANSE">
    <ProteinDetectionHypothesis identifier="pgroup1_pdir1_pep1" Peptide_ref="put reference to peptide ID here" start="160" end="171" post="K" pre="I">
      <_peptideEvidence>
        <!-- DELETE? reference to peptide done with attribute! From THERE ref to spectrum-->
        <SpectrumIdentificationItem_ref/>
      </_peptideEvidence>

I think the Peptide_ref attribute is sufficient; from there the spectrum is referenced.

Bye from Bochum
Martin

From: Jones, Andy [mailto:And...@li...]
Sent: Wednesday, May 14, 2008 6:17 PM
To: David Creasy; Sean L Seymour; Martin Eisenacher
Cc: Phil Jones @ EBI
Subject: RE: Minutes?

Hi all,

New version of the schema attached. Can you have a quick sanity check before posting on the list for discussion tomorrow?
A few minor changes to FuGElight as well, so a new schema for that is attached as well.

Cheers
Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 14 May 2008 15:17
To: Jones, Andy; Sean L Seymour; Martin Eisenacher
Cc: Phil Jones @ EBI
Subject: Re: Minutes?

Andy,

I'm not sure whether the minutes are sufficient for you to produce another rev. of the schema? For the protein grouping, I think that assuming we go with Sean's proposal we don't need to make any further changes?

David

Phil Jones @ EBI wrote:

Hi David,

Sorry about the delay - please find them pasted below. I will get these on to the web site this afternoon.

Best regards,

Phil.

Attendees:
Phil Jones
Andy Jones
Andreas Bertsch
David Creasy
Martin Eisenacher
Julian Selley
Jennifer Siepen
Pierre-Alain Binz

Angel sends his apologies.

Minutes:

1. Going through schema changes

Martin has completed an updated example document. Some comments:

This new document fits the use cases described in Toledo. Mostly changes of element names. Currently missing is the new level required for protein ambiguity.

* Do we still want DeterminationResultSet?
* Some CV params were added
* Still need a reference from the DeterminationResultSet to the Molecule.

Andy - to follow FuGE rules, need to follow a convention to know what a ref is referring to. The convention in FuGE is Name-of-element_ref. It was agreed that this should be used in the existing schema.

Do we need a convention for the actual identifier? In FuGE - can be anything, as difficult to enforce. Will stick with this policy for analysisXML.

How is ambiguity encoded? Move ProteinDetectionResult one level deeper? Explanation of requirement: Two proteins, three peptides in common, but one different. While the grouping may be achievable based upon references, this grouping captures associations determined by a specific algorithm. Andy suggested that the grouping could be separated out into another data structure, that was referenced from the individual protein identifications.
The final decision on this will be deferred until Sean is available - does the peptide evidence change if the grouping is done differently?

Action: David to check with Sean.

Example XML file has an error - the ProteinDetectionHypothesis should be PolyPeptideEvidence in file Use_case_Toledo_wo_grouping_working7May.xml.

Andy: Extra classes. Renamed elements, and already abstract classes present, e.g. AnalysisResult - changed to AnalyteDetection. This would then be the extension point. Andy suggested the additional abstract layer should remain as an explicit extension point for future analyses.

Going through issue list:

Issue of attributes vs. elements. Keep as current. Protein Sequence should perhaps be an element as it can be very long. Element name 'Seq' to allow for both nucleic acid and protein sequences.

Question - moving values from the CV to the schema? David reminded the group of the original suggestion that anything that is present in two or more search engines should be in the schema.

Specific terms:
* Taxonomy filter
* Number of sequences searched - move into the schema, make optional.
* Mass tolerance?

MIAPE requirements as candidates? Many of these things still need CV to name them. Pierre-Alain to send around the current version of the MIAPE documents to look for candidate data items that should be explicitly in the schema.

Spectrum Reference - DONE. Martin / David to look at the current changes to mzML to ensure that spectrum refs are consistent with the latest version.

Issues of cardinality. (optional or not?) Action: Andy will reflect any agreed changes in the schema.

Quality assessment analysis is missing from the list of use cases. Need to be able to see the quality assessment e.g. false discovery rate, so this can be changed to modify the result list. (i.e. search of randomised database.)
Input parameters are the pattern for the false proteins and the false discovery rate threshold, then report the rank of the proteins and their local false discovery estimations. However, this could go into the normal parameters for the protein identification.

Next PSI meeting at 16:00 (London time) 15 May, 2008.

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898

--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.
Work phone: +44 1223 492610
Skype: philip-jones

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
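
A rough illustration of the include/exclude filter structure Phil proposed at the top of this thread: the Python sketch below reads that XML and applies it to a taxonomy accession. The function names read_filters and passes are hypothetical, and the matching rule (an excluded accession always fails; an empty include list admits everything else) is an assumption, since the thread does not define the semantics. It also compares accessions literally, whereas a real taxonomy filter would presumably need to handle descendant taxa (e.g. species under the genus Mus).

```python
import xml.etree.ElementTree as ET

# Phil's proposed filter structure from the thread, written with a
# well-formed closing </filters> tag so that it parses.
FILTER_XML = """
<filters>
  <filter>
    <filterType>
      <cvParam accession="PI:000NNN" name="taxonomy-filter" cvRef="PSI-PI" value=""/>
    </filterType>
    <include>
      <cvParam accession="9606" name="Homo sapiens" cvRef="NEWT"/>
      <cvParam accession="9031" name="Gallus gallus" cvRef="NEWT"/>
    </include>
    <exclude>
      <cvParam accession="10088" name="Mus" cvRef="NEWT"/>
    </exclude>
  </filter>
</filters>
"""


def read_filters(xml_text):
    """Return one dict per <filter>: type name plus include/exclude accession sets."""
    root = ET.fromstring(xml_text)
    result = []
    for f in root.findall("filter"):
        type_param = f.find("filterType/cvParam")
        result.append({
            "type": type_param.get("name") if type_param is not None else None,
            "include": {p.get("accession") for p in f.findall("include/cvParam")},
            "exclude": {p.get("accession") for p in f.findall("exclude/cvParam")},
        })
    return result


def passes(flt, accession):
    """Assumed rule (not from the thread): exclusions win; empty include admits all."""
    if accession in flt["exclude"]:
        return False
    return not flt["include"] or accession in flt["include"]


if __name__ == "__main__":
    for flt in read_filters(FILTER_XML):
        print(flt["type"], sorted(flt["include"]), sorted(flt["exclude"]))
        print(passes(flt, "9606"), passes(flt, "10088"))
```

Keeping the filter type in its own cvParam and the values in separate include and exclude lists means a reader needs no knowledge of the specific filter to collect its terms, which seems to be the scaling property Phil was after.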