Re: [Psidev-pi-dev] analysis tree

Hi Martin,

Thanks for your slides clarifying this issue. I tend to agree with Angel that the final result (on slide 4) is just ProteinDetection 2. What other purpose does ProteinDetection 1 serve in the file?

The only use case I can come up with is if ProteinDetection 1 is an intermediate set on which some additional processing happens. However, I don’t see that we actually have a protocol structure defined in which this could be described, i.e. ProteinSet1 -> ProtocolApp -> ProteinSet2.

If we take this to the logical conclusion, perhaps we should restrict an AnalysisXML file to containing only 1 (or 0) ProteinDetectionLists...?
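
As a rough illustration of what such a restriction would mean for a validator, the sketch below counts ProteinDetectionList elements in a document and accepts it only if there are 0 or 1. This is a sketch under assumptions: the sample XML is made up and heavily simplified, the function names are hypothetical, and the real AnalysisXML schema may nest or namespace the element differently. Only the element name ProteinDetectionList is taken from the discussion.

```python
import xml.etree.ElementTree as ET


def count_protein_detection_lists(xml_text):
    """Count ProteinDetectionList elements, ignoring any XML namespace."""
    root = ET.fromstring(xml_text)
    count = 0
    for elem in root.iter():
        # Namespaced tags look like "{namespace}LocalName"; keep the local part.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "ProteinDetectionList":
            count += 1
    return count


def has_single_result(xml_text):
    """True if the document holds 0 or 1 ProteinDetectionList elements."""
    return count_protein_detection_lists(xml_text) <= 1


# Hypothetical, simplified documents (not the real schema).
ONE_LIST = "<AnalysisXML><ProteinDetectionList id='PDL_2'/></AnalysisXML>"
TWO_LISTS = (
    "<AnalysisXML>"
    "<ProteinDetectionList id='PDL_1'/>"
    "<ProteinDetectionList id='PDL_2'/>"
    "</AnalysisXML>"
)

print(has_single_result(ONE_LIST))   # True
print(has_single_result(TWO_LISTS))  # False
```

Under this rule, a file like the one on slide 4 (two protein detections) would have to be split into two files.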

There is a related point that I don’t think we have addressed completely yet, which is:

“What set of peptides / proteins does the creator of the file consider to be acceptable, i.e. correct according to their threshold?”

As an example, given a protein hypothesis, it might be acceptable to report all the peptides that can be matched to this protein, even if some of the peptide identifications are very weak. Similarly, only by including a long list of proteins (e.g. including decoy proteins), can certain protein scores be calculated. 

So are we suggesting:

1)      The file contains the peptides and proteins that the file producer considers correct i.e. above a threshold

2)      Or do we need to provide a mechanism for reporting as much as they like (e.g. all peptides and proteins that could be correct), with a flag saying: I consider these to be correct and, for example, you should only load these into a database...

I think supporting use case 2 is actually very difficult because communicating a threshold is not simple: thresholds can be set on peptides, proteins and protein ambiguity groups, with interrelations between these, e.g. ProteinProphet alters peptide probabilities based on protein-level evidence.

Therefore, my vote is that we only support use case 1. This means that if someone wants to communicate a complete set of possible peptides / proteins (including decoys / likely false positives), it can be done, but this would be a separate file from their “final result set”.
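
A minimal sketch of how a file producer might follow use case 1 is shown below. Everything in it is hypothetical: the record fields, the accessions, the scores and the acceptance rule are invented for illustration. The point is that the acceptance rule stays inside the producer's own software as an opaque function, so the file format never has to describe the threshold itself.

```python
def split_by_acceptance(identifications, is_accepted):
    """Split identifications into (accepted, not accepted) using the
    producer's own rule, which the output files never need to encode."""
    accepted, rejected = [], []
    for ident in identifications:
        (accepted if is_accepted(ident) else rejected).append(ident)
    return accepted, rejected


# Hypothetical candidate proteins, including a decoy hit.
candidates = [
    {"accession": "PROT_A", "score": 0.99, "is_decoy": False},
    {"accession": "PROT_B", "score": 0.42, "is_decoy": False},
    {"accession": "DECOY_PROT_C", "score": 0.95, "is_decoy": True},
]


def producer_rule(ident):
    # Hypothetical rule; a real one may combine peptide- and protein-level evidence.
    return ident["score"] >= 0.9 and not ident["is_decoy"]


final_set, remainder = split_by_acceptance(candidates, producer_rule)

# File 1 ("final result set") would hold only final_set.
print([i["accession"] for i in final_set])  # ['PROT_A']
# File 2 (optional, separate) would hold the complete candidate list.
print(len(candidates))  # 3
```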

Does everyone see what I’m getting at...?

Cheers

Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: 27 May 2008 16:53
To: Martin Eisenacher
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] analysis tree

Still don't agree with this. In your last slide, both of the protein determinations are the result IMHO. 

If you want to highlight one or the other as "The Result" , create a file w/ just that analysis.

-angel

On Tue, May 27, 2008 at 11:42 AM, Martin Eisenacher <mar...@ru...> wrote:

Dear colleagues!

In the last Telecon I tried(!) to
argue for an attribute marking the
"actual" analysis of an AnalysisXML
file.

We agreed to let me assemble some
descriptive slides (attached).

I hope they make my point
clearer. I think it is not
only a philosophical question, because
it has consequences for programming,
tools and databases...

 Bye
  Martin

_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

-- 
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736