From: Jones, A. <And...@li...> - 2008-05-28 11:11:13
Hi Martin,

Thanks for your slides clarifying this issue. I tend to agree with Angel that the final result (on slide 4) is just ProteinDetection 2; what other purpose does ProteinDetection 1 serve in the file? The only use case I can come up with is if ProteinDetection 1 is an intermediate set on which some additional processing happens. However, I don’t think we actually have a protocol structure defined in which this could be described, i.e. ProteinSet1 -> ProtocolApp -> ProteinSet2. If we take this to its logical conclusion, perhaps we should restrict an AnalysisXML file to containing only 1 (or 0) ProteinDetectionLists...?

There is a related point that I don’t think we have addressed completely yet: “What set of peptides/proteins does the creator of the file consider to be acceptable, i.e. correct according to their threshold?”

For example, given a protein hypothesis, it might be acceptable to report all the peptides that can be matched to this protein, even if some of the peptide identifications are very weak. Similarly, certain protein scores can only be calculated by including a long list of proteins (e.g. decoy proteins). So which of these are we suggesting?

1) The file contains only the peptides and proteins that the file producer considers correct, i.e. above a threshold.
2) A mechanism for reporting as much as the producer likes (e.g. all peptides and proteins that could be correct), with a flag saying: "I consider these to be correct and, for example, you should only load these into a database..."

I think supporting use case 2 is actually very difficult, because communicating a threshold is not simple: thresholds can be set on peptides, proteins and protein ambiguity groups, with interrelations between these (e.g. ProteinProphet alters peptide probabilities based on protein-level evidence). Therefore, my vote is that we only support use case 1.
This means that if someone wants to communicate a complete set of possible peptides/proteins (including decoys and likely false positives), it can still be done, but in a separate file from their “final result set”.

Does everyone see what I’m getting at...?

Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Angel Pizarro
Sent: 27 May 2008 16:53
To: Martin Eisenacher
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] analysis tree

Still don't agree with this. In your last slide, both of the protein determinations are the result, IMHO. If you want to highlight one or the other as "The Result", create a file with just that analysis.

-angel

On Tue, May 27, 2008 at 11:42 AM, Martin Eisenacher <mar...@ru...> wrote:

Dear colleagues!

In the last telecon I tried(!) to argue for an attribute marking the "actual" analysis of an AnalysisXML file. We agreed that I would assemble some descriptive slides (attached). I hope they make my point clearer.

I think it is not only a philosophical question, because it has consequences for programming, tools and databases...

Bye
Martin

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736