From: Angel P. <an...@ma...> - 2008-05-27 15:53:08
Still don't agree with this. In your last slide, both of the protein determinations are the result, IMHO.

If you want to highlight one or the other as "The Result", create a file with just that analysis.

-angel

On Tue, May 27, 2008 at 11:42 AM, Martin Eisenacher <mar...@ru...> wrote:
> Dear colleagues!
>
> In the last telecon I tried(!) to argue for an attribute marking the
> "actual" analysis of an AnalysisXML file.
>
> We agreed to let me assemble some descriptive slides (attached).
>
> I hope they make my point clearer. I think it is not only a
> philosophical question, because it has consequences for programming,
> tools and databases...
>
> Bye
> Martin
>
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
From: Jones, A. <And...@li...> - 2008-05-28 11:11:13
Hi Martin,

Thanks for your slides clarifying this issue. I think I tend to agree with Angel that the final result (on slide 4) is just ProteinDetection 2; what other purpose does ProteinDetection 1 serve in the file? The only use case I can come up with is that ProteinDetection 1 is an intermediate set on which some additional processing happens. However, I don't see that we actually have a protocol structure defined in which this could be described, i.e.

ProteinSet1 -> ProtocolApp -> ProteinSet2

If we take this to its logical conclusion, perhaps we should restrict an AnalysisXML file to containing only 1 (or 0) ProteinDetectionLists...?

There is a related point that I don't think we have addressed completely yet, which is:

"What set of peptides / proteins does the creator of the file consider to be acceptable, i.e. correct according to their threshold?"

As an example, given a protein hypothesis, it might be acceptable to report all the peptides that can be matched to this protein, even if some of the peptide identifications are very weak. Similarly, certain protein scores can only be calculated by including a long list of proteins (e.g. including decoy proteins).

So are we suggesting:

1) The file contains the peptides and proteins that the file producer considers correct, i.e. above a threshold.

2) We give a mechanism for reporting as much as they like (e.g. all peptides and proteins that could be correct), with a flag for saying: I consider these to be correct and, for example, you should only load these into a database...

I think supporting use case 2 is actually very difficult because communicating a threshold is not simple: thresholds can be set on peptides, proteins and protein ambiguity groups, with interrelations between these, e.g. ProteinProphet alters peptide probabilities based on protein-level evidence.

Therefore, my vote is that we only support use case 1. This means that if someone wants to communicate a complete set of possible peptides / proteins (including decoys / likely false positives), it can be done, but this would be a separate file from their "final result set".

Does everyone see what I'm getting at...?

Cheers
Andy
From: Angel P. <an...@ma...> - 2008-05-28 11:16:50
On Wed, May 28, 2008 at 7:11 AM, Jones, Andy <And...@li...> wrote:
> Therefore, my vote is that we only support use case 1. This means that if
> someone wants to communicate a complete set of possible peptide / proteins
> (including decoys / likely false positives), it can be done, but this would
> be a separate file from their "final result set".
>
> Does everyone see what I'm getting at...?

I do ;-) and am right there with you. Although to get at the question of user-defined thresholding, that in itself is an analysis, and should be communicated as such in the protocol + params.

-angel
From: Jones, A. <And...@li...> - 2008-05-28 11:25:28
> I do ;-) and am right there with you. Although to get at the question of
> user-defined thresholding, that in itself is an analysis, and should be
> communicated as such in the protocol + params.

You mean peptide thresholding in the SpectrumIdentificationProtocol and protein thresholding in the ProteinDetectionProtocol? If so, we need examples of how this can be done using cvParams...
From: Angel P. <an...@ma...> - 2008-05-28 11:29:47
On Wed, May 28, 2008 at 7:25 AM, Jones, Andy <And...@li...> wrote:
> You mean peptide thresholding in the SpectrumIdentificationProtocol and
> protein thresholding in the ProteinDetectionProtocol? If so, we need
> examples of how this can be done using cvParams...

No, I mean that there should be a generic DataFiltering protocol to define arbitrary data filtering operations. And yes, we need some examples of this for the CV and schema.

-angel
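A minimal sketch of what a generic data filtering step could look like, written in Python rather than in schema terms. Everything here is hypothetical: `FilterParam`, `apply_filter` and the score name `"score"` are made up for illustration and stand in for whatever cvParams and schema elements the group ends up defining.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class FilterParam:
    """One threshold of a filtering protocol (hypothetical stand-in for a cvParam)."""
    score_name: str   # name of the score being thresholded
    comparison: str   # ">=" or "<="
    threshold: float


_OPS = {">=": operator.ge, "<=": operator.le}


def apply_filter(items, params):
    """Keep the items (dicts of score name -> value) that pass every parameter."""
    return [
        item for item in items
        if all(_OPS[p.comparison](item[p.score_name], p.threshold) for p in params)
    ]


# Made-up peptide identifications, for illustration only.
peptides = [
    {"id": "pep1", "score": 45.0},
    {"id": "pep2", "score": 12.5},
]
kept = apply_filter(peptides, [FilterParam("score", ">=", 30.0)])
print([p["id"] for p in kept])  # ['pep1']
```

The same function works on protein-level records, which is the sense in which the filtering step is generic: the parameters, not the step, say what is being thresholded.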
From: Jones, A. <And...@li...> - 2008-05-28 11:33:36
> No I mean that there should be a generic DataFiltering protocol to define
> arbitrary data filtering operations. And yes, we need some examples of this
> for the CV and schema.

But then this gets back to the problem that Martin highlighted of having multiple peptide and protein sets within one file...

PeptideSet1 -> DataFiltering -> PeptideSet2
PeptideSet2 -> ProteinDetection -> ProteinSet1
ProteinSet1 -> DataFiltering -> ProteinSet2

Where ProteinSet2 is the "final" results...
From: Angel P. <an...@ma...> - 2008-05-28 12:43:16
On Wed, May 28, 2008 at 7:33 AM, Jones, Andy <And...@li...> wrote:
> But then this gets back to the problem that Martin highlighted of having
> multiple peptide and protein sets within one file...
>
> PeptideSet1 -> DataFiltering -> PeptideSet2
> PeptideSet2 -> ProteinDetection -> ProteinSet1
> ProteinSet1 -> DataFiltering -> ProteinSet2
>
> Where ProteinSet2 is the "final" results...

Simply reconstructing this graph shows that what most would call "final" is ProteinSet2. Martin's example was more ambiguous:

PeptideSet1 -> ProteinDetection1 -> ProteinSet1
PeptideSet1 -> ProteinDetection2 -> ProteinSet2

Reconstructing this graph would in no way tell you what the author of the file meant to be their canonical result. There is no amount of schema or CV encoding that will automatically disambiguate this for all cases, other than a simple label, which is what Martin proposes. Frankly I think that while it would "work", it is not such a good idea to create a set schema element to essentially encourage bad encoding of results.

BTW, FuGE has the same issue. All the results are valid results and can stand on their own. Ultimately it is the consumer that will determine what to label as "final".

A way to get rid of all of these issues from axml is to move it even closer to mzML and not encode any workflow. A file would restrict itself to just a single node of a workflow that references into some input file. E.g. (nodes are individual files):

mzML1 -> axml1 (peptide ids : mascot)
mzML1 -> axml2 (peptide ids : sequest)
axml1 -> axml3 (protein determination : mascot)
[axml1, axml2] -> axml5 (peptide determination : peptideprophet)
axml5 -> axml6 (protein determination : proteinprophet)
etc. etc.

In this case the final result is a file and as such unambiguously encoded.

-angel
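The graph argument above can be sketched in a few lines of Python. This is only an illustration under one assumption: that "final" means a data set that is produced by some step and never consumed by another. The function name `terminal_outputs` and the tuple layout are made up for this sketch; the set and step names are the ones used in the two examples.

```python
def terminal_outputs(edges):
    """Return the data sets that are produced but never consumed, sorted by name.

    Each edge is (list of input sets, step name, output set).
    """
    produced, consumed = set(), set()
    for inputs, _step, output in edges:
        consumed.update(inputs)
        produced.add(output)
    return sorted(produced - consumed)


# Andy's linear chain: filtering, protein detection, filtering.
chain = [
    (["PeptideSet1"], "DataFiltering", "PeptideSet2"),
    (["PeptideSet2"], "ProteinDetection", "ProteinSet1"),
    (["ProteinSet1"], "DataFiltering", "ProteinSet2"),
]

# Martin's tree: two protein detections on the same peptide set.
tree = [
    (["PeptideSet1"], "ProteinDetection1", "ProteinSet1"),
    (["PeptideSet1"], "ProteinDetection2", "ProteinSet2"),
]

print(terminal_outputs(chain))  # ['ProteinSet2']
print(terminal_outputs(tree))   # ['ProteinSet1', 'ProteinSet2']
```

The chain has exactly one terminal set, so the graph alone identifies it. The tree has two, and nothing in the graph ranks one above the other, which is the ambiguity being discussed.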
From: Martin E. <mar...@ru...> - 2008-05-28 15:06:36
OK, difficult to track this discussion.

Summary:

1. longer chain of analyses (as discussed by Andy): not modeled, all okay
2. data filtering issue (my use case: quality assessment): its parameters and results are currently modeled "next to" peptide and protein detection (I will start a discussion on that later, when my use case is assembled ;-) )
3. tree-like structure of (protein) analyses: is currently possible, makes sense (Angel) / no sense (Andy)

My opinion:

I (of course) do not insist on an "actual analysis" attribute; I can imagine two possibilities:

i) the problem could be ignored in the schema and judged later by a "semantic validator" as "wrong";
ii) if we generally want to forbid that, we could allow zero or one ProteinDetect under AnalysisCollection.

With ii) we can have
EITHER one spectrum ident
OR many spectrum idents
OR one spectrum ident and one protein detect
OR many spectrum idents and one protein detect.

That is "a bit" of workflow, but without tree-like protein analyses.

I would prefer that to the file solution suggested by Angel, because ontologies, databases, samples, ... are not duplicated and we have fewer problems with moving results (partial file copy or uploading into a database).

My intuition is that quantification fits into that suggestion, but that is no argument at the moment. ;-)

Bye
Martin
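A minimal sketch of the check that option ii) implies, in the spirit of the "semantic validator" mentioned under i). The function name and messages are hypothetical, and the lower bound of one spectrum identification is an assumption read off the four allowed combinations listed in the message, not something the schema is known to state.

```python
def check_analysis_collection(n_spectrum_idents, n_protein_detects):
    """Return a list of rule violations; an empty list means the combination is allowed."""
    problems = []
    # Assumption: every allowed combination listed has at least one spectrum ident.
    if n_spectrum_idents < 1:
        problems.append("at least one spectrum identification is required")
    # Option ii): zero or one protein detection under AnalysisCollection.
    if n_protein_detects > 1:
        problems.append("at most one protein detection is allowed")
    return problems


print(check_analysis_collection(1, 0))  # []
print(check_analysis_collection(3, 1))  # []
print(check_analysis_collection(1, 2))  # ['at most one protein detection is allowed']
```

With this rule a tree of protein analyses over one peptide set is rejected, while any number of spectrum identifications feeding a single protein detection is accepted.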