Re: [Psidev-pi-dev] analysis tree


On Wed, May 28, 2008 at 7:33 AM, Jones, Andy <And...@li...>
wrote:

> No, I mean that there should be a generic DataFiltering protocol to define
> arbitrary data filtering operations. And yes, we need some examples of this
> for the CV and schema.
>
> But then this gets back to the problem that Martin highlighted of having
> multiple peptide and protein sets within one file...
>
> PeptideSet1 -> DataFiltering -> PeptideSet2
> PeptideSet2 -> ProteinDetection -> ProteinSet1
> ProteinSet1 -> DataFiltering -> ProteinSet2
>
> Where ProteinSet2 is the "final" results...

Simply reconstructing this graph shows that what most would call "final" is
ProteinSet2. Martin's example was more ambiguous:

PeptideSet1 -> ProteinDetection1 -> ProteinSet1
PeptideSet1 -> ProteinDetection2 -> ProteinSet2

Reconstructing this graph would in no way tell you which result the author
of the file meant to be canonical. No amount of schema or CV encoding will
automatically disambiguate this for all cases, other than the simple label
that Martin proposes. Frankly, while that would "work", I don't think it is a
good idea to create a dedicated schema element that essentially encourages
bad encoding of results.

BTW, FuGE has the same issue. All the results are valid and can stand on
their own; ultimately it is the consumer who determines what to label as
"final".

One way to get rid of all of these issues in axml is to move it even closer
to mzML and not encode any workflow at all. A file would then restrict itself
to a single node of the workflow and reference its input file(s). E.g. (each
node is an individual file):

mzML1 -> axml1 (peptide ids : mascot)
mzML1 -> axml2 (peptide ids : sequest)
axml1 -> axml3 (protein determination : mascot)
[axml1, axml2] -> axml5 (peptide determination : peptideprophet)
axml5 -> axml6 (protein determination : proteinprophet)
etc.

In this case the final result is a file, and as such it is unambiguously
encoded.
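
A minimal sketch of that model, assuming each file records nothing but the
names of the file(s) it was derived from. The `INPUTS` mapping and the
`provenance` helper are hypothetical illustrations, not part of any axml
draft. A consumer handed `axml6` can still walk back its full lineage without
any "final" flag.

```python
from collections import deque

# Hypothetical model: each file records only the file(s) it was derived from.
INPUTS = {
    "mzML1": [],
    "axml1": ["mzML1"],           # peptide ids : mascot
    "axml2": ["mzML1"],           # peptide ids : sequest
    "axml3": ["axml1"],           # protein determination : mascot
    "axml5": ["axml1", "axml2"],  # peptide determination : peptideprophet
    "axml6": ["axml5"],           # protein determination : proteinprophet
}


def provenance(name, inputs):
    """Return every upstream file that `name` was derived from, nearest first (breadth-first)."""
    order, visited = [], set()
    queue = deque(inputs[name])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue  # shared ancestor, e.g. mzML1 reached via both axml1 and axml2
        visited.add(current)
        order.append(current)
        queue.extend(inputs[current])
    return order


print(provenance("axml6", INPUTS))  # ['axml5', 'axml1', 'axml2', 'mzML1']
```

Whichever file is handed over is the result; its history comes from
following input references rather than from a label inside the file.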

-angel