From: Angel P. <an...@ma...> - 2008-05-28 12:43:16
On Wed, May 28, 2008 at 7:33 AM, Jones, Andy <And...@li...> wrote:
> No I mean that there should be a generic DataFiltering protocol to define
> arbitrary data filtering operations. And yes, we need some examples of this
> for the CV and schema.
>
> But then this gets back to the problem that Martin highlighted of having
> multiple peptide and protein sets within one file...
>
> PeptideSet1 -> DataFiltering -> PeptideSet2
> PeptideSet2 -> ProteinDetection -> ProteinSet1
> ProteinSet1 -> DataFiltering -> ProteinSet2
>
> Where ProteinSet2 is the "final" results...

Simply reconstructing this graph shows that what most would call "final" is ProteinSet2. Martin's example was more ambiguous:

PeptideSet1 -> ProteinDetection1 -> ProteinSet1
PeptideSet1 -> ProteinDetection2 -> ProteinSet2

Reconstructing this graph would in no way tell you what the author of the file meant to be their canonical result. There is no amount of schema or CV encoding that will automatically disambiguate this for all cases, other than a simple label, which is what Martin proposes. Frankly, I think that while it would "work", it is not such a good idea to create a set schema element that essentially encourages bad encoding of results.

BTW, FuGE has the same issue. All the results are valid results and can stand on their own. Ultimately it is the consumer that will determine what to label as "final".

A way to get rid of all of these issues from axml is to move it even closer to mzML and not encode any workflow. A file would restrict itself to just a single node of a workflow that references some input file. E.g. (nodes are individual files):

mzML1 -> axml1 (peptide ids : mascot)
mzML1 -> axml2 (peptide ids : sequest)
axml1 -> axml3 (protein determination : mascot)
[axml1, axml2] -> axml5 (peptide determination : peptideprophet)
axml5 -> axml6 (protein determination : proteinprophet)
etc. etc.

In this case the final result is a file and as such unambiguously encoded.

-angel
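To make the ambiguity concrete, here is a minimal sketch (not part of any schema or tool) of what "reconstructing the graph" could look like. It assumes each workflow step is written as a hypothetical (input set, protocol, output set) triple, and it treats a set as a candidate "final" result when it is produced but never consumed by another step. The function name terminal_sets is made up for illustration.

```python
def terminal_sets(steps):
    """Return the sets that are produced but never consumed, in first-seen order.

    Each step is an (input_set, protocol, output_set) triple. This layout is
    an assumption for illustration, not something defined by the schema.
    """
    consumed = {src for src, _protocol, _out in steps}
    terminals = []
    for _src, _protocol, out in steps:
        if out not in consumed and out not in terminals:
            terminals.append(out)
    return terminals


# Andy's linear chain: one terminal set, so "final" can be inferred.
andy_chain = [
    ("PeptideSet1", "DataFiltering", "PeptideSet2"),
    ("PeptideSet2", "ProteinDetection", "ProteinSet1"),
    ("ProteinSet1", "DataFiltering", "ProteinSet2"),
]

# Martin's branching example: two terminal sets, so the graph alone
# cannot say which one the author considers canonical.
martin_example = [
    ("PeptideSet1", "ProteinDetection1", "ProteinSet1"),
    ("PeptideSet1", "ProteinDetection2", "ProteinSet2"),
]

print(terminal_sets(andy_chain))      # ['ProteinSet2']
print(terminal_sets(martin_example))  # ['ProteinSet1', 'ProteinSet2']
```

With the linear chain the graph yields a single candidate, while the branching case yields two equally valid ones. That is the situation where only an explicit label, or the consumer's own choice, can settle what counts as "final".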