From: Philip J. <pj...@eb...> - 2006-07-11 10:12:10
Hi,

Based upon the draft analysisXML UML model that Angel circulated a month or so ago, here are a few requirements that may need to be addressed. It is very likely that many of these requirements are already addressed in the draft model or by the FuGE model that it extends, especially given that the majority of the classes in the diagram do not have fields included, but hopefully this list may provoke some discussion...

The current draft UML model can be retrieved from:
http://psidev.sourceforge.net/proteomics-informatics/documents/analysisUML.ppt

Information that may need to be captured:

1. Polypeptide (mostly obvious stuff, probably just fields that are present but have not been included in the PowerPoint diagram):

   * Sequence (esp. for peptides) [0..1]
   * Database name / version [0..1]
     - /Appears to be addressed by the SequenceDatabase class. The current model suggests that all proteins identified in one XML need to be identified from the same database - presumably this is a safe assumption?/
   * Accession / accession version [0..1], mandatory for proteins?
   * Database cross references [0..*]
   * Start and end coordinates of peptides in relation to any proteins that they identify [0..1] or [0..*] ??
   * Upstream / downstream flanking regions for peptides

   Will Polypeptide be sub-classed to allow (for example) accession to be enforced for protein identifications?

2. Presumably the 'SearchProtocol' class will handle details of:

   * Search engine identity
   * Search engine version
   * Search input parameters / settings (Is this CV-parameterised stuff, with CVs from individual search engine vendors?)

3. Protein modifications - CustomModification class:

   * Are the monoisotopicMass & averageMass values expected or observed? Does analysisXML need to store both? Are expected values [1..*]?

4. Protein modifications - Modification class:

   * The 'position' field is declared as mandatory, however the position of the PTM is often unknown. Ability to handle 'fuzzy' or approximate location? (Does this last requirement even exist?)

5. SearchHypothesis / Search result:

   * As mentioned at the last PSI-PI teleconference, just to flag up the need for a clear plan of how to (for example) connect to a gel spot recorded in GelML format etc.

Best regards,

Phil.

--
_______________________________________
Phil Jones
Software Engineer
BioSapiens / PRIDE Project Team
Sequence Database Group
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge
CB10 1SD
UK
Tel +44 (0)1223 492 610 (Direct Line)
mailto:pj...@eb...
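
For discussion only, the cardinalities listed under item 1 could be sketched roughly as below. This is not taken from the draft UML: every class and field name here is hypothetical, the peptide coordinates are modelled as [0..*] although the list leaves that open, and the `Protein` subclass is just one possible answer to the sub-classing question.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Polypeptide:
    # Hypothetical field names; cardinalities follow item 1 of the list.
    sequence: Optional[str] = None            # [0..1]
    database_name: Optional[str] = None       # [0..1]
    database_version: Optional[str] = None    # [0..1]
    accession: Optional[str] = None           # [0..1]
    accession_version: Optional[str] = None   # [0..1]
    cross_references: List[str] = field(default_factory=list)  # [0..*]
    # (start, end) coordinates within identified proteins; assumed [0..*].
    locations: List[Tuple[int, int]] = field(default_factory=list)
    upstream_flank: Optional[str] = None
    downstream_flank: Optional[str] = None


@dataclass
class Protein(Polypeptide):
    # One possible way to enforce an accession for protein identifications.
    def __post_init__(self) -> None:
        if not self.accession:
            raise ValueError("accession is mandatory for a protein")


# A peptide may omit the accession; a protein may not.
# The sequence and accession values are made-up examples.
peptide = Polypeptide(sequence="PEPTIDEK", locations=[(10, 17)])
protein = Protein(accession="P12345")
```

Enforcing the rule in a subclass keeps the base Polypeptide permissive for peptides; the same constraint could equally be expressed in the schema rather than in the class hierarchy.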