From: Martin E. <mar...@ru...> - 2008-05-28 15:06:36
OK, this discussion is difficult to track. Summary:

1. Longer chain of analyses (as discussed by Andy): not modeled, all okay.
2. Data filtering issue (my use case: quality assessment): its parameters and results are currently modeled "next to" peptide and protein detection. (I will start a discussion on that later, when my use case is assembled ;-) )
3. Tree-like structure of (protein) analyses: currently possible; makes sense (Angel) / no sense (Andy).

My opinion: I (of course) do not insist on an "actual analysis" attribute. I can imagine two possibilities:

i) The problem could be ignored in the schema and judged later by a "semantic validator" as "wrong".
ii) If we generally want to forbid that, we could allow zero or one ProteinDetect under AnalysisCollection.

With ii) we can have EITHER one spectrum ident OR many spectrum idents OR one spectrum ident and one protein detect OR many spectrum idents and one protein detect. That is "a bit" of workflow, but without tree-like protein analyses. I would prefer that to the file solution suggested by Angel, because ontologies, databases, samples, ... are not duplicated and we have fewer problems with moving results (partial file copy or uploading into a database).

My intuition is that quantification fits into that suggestion, but that is no argument at the moment. ;-)

Bye
Martin

From: psi...@li...urceforge.net [mailto:psi...@li...urceforge.net] On behalf of Angel Pizarro
Sent: Wednesday, May 28, 2008 2:43 PM
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] analysis tree

On Wed, May 28, 2008 at 7:33 AM, Jones, Andy <And...@li...> wrote:

No I mean that there should be a generic DataFiltering protocol to define arbitrary data filtering operations. And yes, we need some examples of this for the CV and schema.

But then this gets back to the problem that Martin highlighted of having multiple peptide and protein sets within one file...

PeptideSet1 --> DataFiltering --> PeptideSet2
PeptideSet2 --> ProteinDetection --> ProteinSet1
ProteinSet1 --> DataFiltering --> ProteinSet2

Where ProteinSet2 is the "final" results... Simply reconstructing this graph shows that what most would call "final" is ProteinSet2. Martin's example was more ambiguous:

PeptideSet1 -> ProteinDetection1 -> ProteinSet1
PeptideSet1 -> ProteinDetection2 -> ProteinSet2

Reconstructing this graph would in no way tell you what the author of the file meant to be their canonical result. There is no amount of schema or CV encoding that will automatically disambiguate this for all cases, other than a simple label, which Martin proposes. Frankly I think that while it would "work", it is not such a good idea to create a set schema element to essentially encourage bad encoding of results.

BTW, FuGE has the same issue. All the results are valid results and can stand on their own. Ultimately it is the consumer that will determine what to label as "final".

A way to get rid of all of these issues from axml is to move it even closer to mzML and not encode any workflow. A file would restrict itself to just a single node of a workflow that references into some input file. E.g. (nodes are individual files):

mzML1 -> axml1 (peptide ids : mascot)
mzML1 -> axml2 (peptide ids : sequest)
axml1 -> axml3 (protein determination : mascot)
[axml1, axml2] -> axml5 (peptide determination : peptideprophet)
axml5 -> axml6 (protein determination : proteinprophet)
etc. etc.

In this case the final result is a file and as such unambiguously encoded.

-angel
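
The graph-reconstruction argument in the message above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `terminal_nodes` and the (input, process, output) triple representation are hypothetical and not part of the schema, the CV, or any tool discussed in this thread. A result set counts as "terminal" here if some step produces it and no step consumes it.

```python
def terminal_nodes(edges):
    """Return the result sets that are produced but never used as input.

    `edges` is a list of (input_set, process, output_set) triples.
    This representation is an assumption made for illustration only.
    """
    produced = set()
    consumed = set()
    for source, _process, target in edges:
        consumed.add(source)
        produced.add(target)
    # Sorted so the result is deterministic.
    return sorted(produced - consumed)


# Linear chain: filtering, protein detection, filtering again.
linear = [
    ("PeptideSet1", "DataFiltering", "PeptideSet2"),
    ("PeptideSet2", "ProteinDetection", "ProteinSet1"),
    ("ProteinSet1", "DataFiltering", "ProteinSet2"),
]

# Martin's example: two protein detections on the same peptide set.
branching = [
    ("PeptideSet1", "ProteinDetection1", "ProteinSet1"),
    ("PeptideSet1", "ProteinDetection2", "ProteinSet2"),
]

print(terminal_nodes(linear))     # one terminal set
print(terminal_nodes(branching))  # two terminal sets
```

For the linear chain exactly one terminal set remains, so the graph alone identifies a candidate "final" result. For the branching case two terminal sets remain, and nothing in the graph says which one the author meant as canonical; that choice needs either an explicit label or a decision by the consumer.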