From: Jennifer S. <jen...@ma...> - 2008-06-26 12:28:00
Hi,

I am in the process of trying to put together an example instance document for OMSSA and have a few questions. To make things more complicated, I have gone for an example where I run the search on a concatenated forward/reverse database.

At the moment I have all the results in the analysisXML file, i.e. in the ConceptualMoleculeCollection I am listing all proteins and peptides identified, including the reverse sequences. I am unsure (a) whether I am supposed to be listing all results and (b), if all results are supposed to be listed, how I should mark the reverse ones as decoys, or whether it does not matter. I am also listing all results (forward and reverse) in DataCollection.

The next step for me would be to calculate false discovery rates based upon the OMSSA results and select 'good' peptides, but I am not sure where these results would be reported.

A quick question relates to the 'PeptideEvidence'. One of the attributes is "pre", as in the previous flanking sequence. If my peptide is the N-terminal peptide, what would pre be: pre="" or pre="-"? Or does it not matter?

Finally, the database searched was a custom database. Is there anywhere to report how a database was generated? Sometimes we also search peptide databases, i.e. the database would have the same number of 'protein' entries as the original, but there would only be one peptide per protein. Would I be able to report how many peptides are in the underlying database searched? Would it be a cvParam?

Thanks,
Jenny
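For peptide databases like the ones Jenny describes, where each 'protein' entry holds a single peptide, the number of peptides in the underlying database equals the number of FASTA entries. A minimal Python sketch of counting entries and residues, assuming a plain FASTA file (header lines start with ">", sequence lines may wrap); `count_fasta` is a hypothetical helper, and how such a count should be recorded in analysisXML (cvParam or otherwise) is exactly the open question here, so the sketch does not attempt it.

```python
def count_fasta(text):
    """Return (number_of_entries, number_of_residues) for FASTA-formatted text.

    Hypothetical helper: assumes header lines start with '>' and that
    sequence lines may wrap, so residues are summed across lines.
    """
    n_entries = 0
    n_residues = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_entries += 1  # every header line starts a new entry
        else:
            n_residues += len(line)
    return n_entries, n_residues


# A toy peptide database: one peptide per 'protein' entry.
peptide_db = """>entry1
PEPTIDEK
>entry2
SAMPLER
"""
print(count_fasta(peptide_db))  # (2, 15)
```

For a real file, pass the contents in, e.g. `with open(path) as fh: count_fasta(fh.read())`.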
From: David C. <dc...@ma...> - 2008-06-26 14:07:21
Hi Jenny,

Jennifer Siepen wrote:
> Hi,
>
> I am in the process of trying to put together an example instance
> document for OMSSA and have a few questions. To make things more
> complicated I have gone for an example where I run the search on a
> concatenated forward/reverse database.

Nothing like jumping in at the deep end!

> At the moment I have all the
> results in the analysisXML file i.e. in the ConceptualMoleculeCollection
> I am listing all proteins and peptides identified including the reverse
> sequences. I am unsure if (a) I am supposed to be listing all results
> and (b) if all results are supposed to be listed how I mark the reverse
> ones as decoy or does it not matter?

In some ways it doesn't matter, because they are just lists of
proteins/peptides.

However, you might like to look at Martin's example which contains
results from Mascot and Sequest, and model forward/reverse on this:

http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working12June/MPC_use_case_working12June.axml

(See also: http://code.google.com/p/psi-pi/issues/detail?id=32)

For the proteins, if the reverse entries don't have different
accessions, you could use a different Database_ref.
For the peptides, to make it more human readable, you could encode
'Reverse' into the id?

>
> I am also listing all results (forward and reverse) in DataCollection.

I'd recommend two sets of results:

<SpectrumIdentificationList id="OMSSA_forward">
...
<SpectrumIdentificationList id="OMSSA_reverse">
...

> The next step for me would be to calculate false discovery rates based
> upon the OMSSA results and select 'good' peptides, I am not sure where
> these results would be reported?

And nor am I yet. One issue is that this is a 'dynamic' sort of thing.
For a particular cutoff expect value (or some rule), you might get x
hits from the forward database, and y hits from the reverse database.
For a different cutoff expect value, you would get x' and y' results.
AnalysisXML is (currently) expected to report for just one 'cutoff' -
i.e. a consumer of the analysisXML document couldn't recalculate the
value. So, the proteins reported (from the forward / and reverse
database) are the list for the cutoff decided by the producer of the
file. We will discuss this in a conference call

>
> A quick question relates to the 'PeptideEvidence'. One of the attributes
> is "pre" as in the previous flanking sequence. If my peptide is the
> N-terminal peptide what would pre be? pre="" or pre="-"? or does it not
> matter?

We just need to decide and document - maybe at the conference call later
today.

>
> Finally the database searched was a custom database, is there anywhere
> to report how a database was generated?

Possibly outside the scope of AnalysisXML.

> Sometimes we also search
> peptide databases i.e. the database would have the same number of
> 'protein' entries as the original but there would only be one peptide
> per protein would I be able to report how many peptides are in the
> underlying database searched - would it be a cvParam?

This was something we discussed briefly on 2008-05-15:
http://www.psidev.info/index.php?q=node/325

We need the number of residues and sequences, although we don't
currently have a record of the number of peptides in the database.
Discussion of how to specify databases at:

http://code.google.com/p/psi-pi/issues/detail?id=31

So, maybe you could add some notes there?

David

>
> Thanks,
>
> Jenny
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
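David's point that the forward (x) and reverse (y) hit counts change with the cutoff can be made concrete. The following is a minimal Python sketch, assuming each hit is an (expect value, is_decoy) pair from a concatenated forward/reverse search, with lower expect values being better. It uses the simple decoy/target ratio as the FDR estimate; that is one common convention among several (another is 2 * decoy / (target + decoy)), not something analysisXML prescribes, and the function name and data are made up.

```python
def fdr_at_cutoff(hits, cutoff):
    """Estimate the FDR of all hits with expect value <= cutoff.

    hits: iterable of (expect_value, is_decoy) pairs (hypothetical format).
    Returns decoy_count / target_count, or None if no target hit passes.
    """
    hits = list(hits)  # allow generators; we iterate twice
    targets = sum(1 for e, is_decoy in hits if e <= cutoff and not is_decoy)
    decoys = sum(1 for e, is_decoy in hits if e <= cutoff and is_decoy)
    if targets == 0:
        return None
    return decoys / targets


# Made-up hits: (expect value, came from the reverse database?)
example_hits = [
    (0.001, False), (0.01, False), (0.02, True),
    (0.05, False), (0.5, True), (1.0, False),
]
for cutoff in (0.01, 0.05, 1.0):
    print(cutoff, fdr_at_cutoff(example_hits, cutoff))
```

Here the three cutoffs give 0 decoys / 2 targets, 1 / 3 and 2 / 4 respectively, i.e. a different (x, y) pair and FDR for each. This is why a file listing only the hits that pass one cutoff fixes the FDR chosen by its producer.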
From: Pierre-Alain B. <pie...@is...> - 2008-06-27 07:54:02
Hi all,

my 2cents there:

David Creasy wrote:
> Hi Jenny,
>
> Jennifer Siepen wrote:
>
>> Hi,
>>
>> I am in the process of trying to put together an example instance
>> document for OMSSA and have a few questions. To make things more
>> complicated I have gone for an example where I run the search on a
>> concatenated forward/reverse database.
>>
> Nothing like jumping in at the deep end!
>
Definitely welcome in the game!
>
>> At the moment I have all the
>> results in the analysisXML file i.e. in the ConceptualMoleculeCollection
>> I am listing all proteins and peptides identified including the reverse
>> sequences. I am unsure if (a) I am supposed to be listing all results
>> and (b) if all results are supposed to be listed how I mark the reverse
>> ones as decoy or does it not matter?
>>
> In some ways it doesn't matter, because they are just lists of
> proteins/peptides.
>
> However, you might like to look at Martin's example which contains
> results from Mascot and Sequest, and model forward/reverse on this:
>
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working12June/MPC_use_case_working12June.axml
>
> (See also: http://code.google.com/p/psi-pi/issues/detail?id=32)
> For the proteins, if the reverse entries don't have different
> accessions, you could use a different Database_ref.
> For the peptides, to make it more human readable, you could encode
> 'Reverse' into the id?
>
Maybe we can already think of the way the PEFF format (the new proposed PSI common database format) deals with it: each entry has a DBprefix followed by a ":" and the accession number. In one DB file, you can concatenate as many DBs as you want. A forward and a reverse entry might have different prefixes, and the UniqueDBIdentifiers would look like:
sp:P00721
sp_rev:P00721
>
>> I am also listing all results (forward and reverse) in DataCollection.
>>
> I'd recommend two sets of results:
> <SpectrumIdentificationList id="OMSSA_forward">
> ...
>
> <SpectrumIdentificationList id="OMSSA_reverse">
> ...
>
Why not, but is that a problem if a spectrum interpretation is both in fw and in rev? Does it have to be duplicated?
>
>> The next step for me would be to calculate false discovery rates based
>> upon the OMSSA results and select 'good' peptides, I am not sure where
>> these results would be reported?
>>
> And nor am I yet. One issue is that this is a 'dynamic' sort of thing.
> For a particular cutoff expect value (or some rule), you might get x
> hits from the forward database, and y hits from the reverse database.
> For a different cutoff expect value, you would get x' and y' results.
> AnalysisXML is (currently) expected to report for just one 'cutoff' -
> i.e. a consumer of the analysisXML document couldn't recalculate the
> value. So, the proteins reported (from the forward / and reverse
> database) are the list for the cutoff decided by the producer of the
> file. We will discuss this in a conference call
>
If the search params set the cutoffs, you can calculate a single value. If you calculate the FDR as post-processing, it would look like an additional analysis, and therefore formally a new Analysis. But I believe you could set more than one Analysis set of params and generate the AnalysisXML from the end result.
>> A quick question relates to the 'PeptideEvidence'. One of the attributes
>> is "pre" as in the previous flanking sequence. If my peptide is the
>> N-terminal peptide what would pre be? pre="" or pre="-"? or does it not
>> matter?
>>
> We just need to decide and document - maybe at the conference call later
> today.
>
>
>> Finally the database searched was a custom database, is there anywhere
>> to report how a database was generated?
>>
> Possibly outside the scope of AnalysisXML.
>
Don't we have source information for the searched database?

Cheers,
Pierre-Alain
>
>> Sometimes we also search
>> peptide databases i.e. the database would have the same number of
>> 'protein' entries as the original but there would only be one peptide
>> per protein would I be able to report how many peptides are in the
>> underlying database searched - would it be a cvParam?
>>
> This was something we discussed briefly on 2008-05-15:
> http://www.psidev.info/index.php?q=node/325
>
> We need the number of residues and sequences, although we don't
> currently have a record of the number of peptides in the database.
> Discussion of how to specify databases at:
>
> http://code.google.com/p/psi-pi/issues/detail?id=31
>
> So, maybe you could add some notes there?
>
>
> David
>
>
>> Thanks,
>>
>> Jenny
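Pierre-Alain's PEFF-style identifiers keep forward and reverse entries distinct even when they share an accession, because the database prefix differs. A minimal Python sketch of splitting such an identifier and flagging decoys, assuming the producer tells the consumer which prefixes denote decoy databases (here only "sp_rev", taken from the example above). The function names are hypothetical, and this is not an implementation of the PEFF specification.

```python
def split_unique_id(unique_id):
    """Split a PEFF-style identifier 'prefix:accession' into its two parts."""
    prefix, sep, accession = unique_id.partition(":")
    if not sep:
        raise ValueError(f"no database prefix in {unique_id!r}")
    return prefix, accession


def is_decoy(unique_id, decoy_prefixes=frozenset({"sp_rev"})):
    """True if the identifier's prefix is one of the (assumed) decoy prefixes."""
    prefix, _ = split_unique_id(unique_id)
    return prefix in decoy_prefixes


for uid in ("sp:P00721", "sp_rev:P00721"):
    print(uid, split_unique_id(uid), is_decoy(uid))
```

Both identifiers carry the accession P00721; only the prefix tells them apart, so no artificial accessions need to be invented for the reverse entries.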
From: Martin E. <mar...@ru...> - 2008-07-03 09:12:36
Hi!

> > I am in the process of trying to put together an example instance
> > document for OMSSA and have a few questions. To make things more
> > complicated I have gone for an example where I run the search on a
> > concatenated forward/reverse database.

Great! We need real use cases / example docs, otherwise our discussions are quite academic.

> > At the moment I have all the
> > results in the analysisXML file i.e. in the ConceptualMoleculeCollection
> > I am listing all proteins and peptides identified including the reverse
> > sequences. I am unsure if (a) I am supposed to be listing all results
> > and (b) if all results are supposed to be listed how I mark the reverse
> > ones as decoy or does it not matter?
> In some ways it doesn't matter, because they are just lists of
> proteins/peptides.

I agree: what you list is your decision, but it would be helpful to report this decision, e.g. a CVParam stating that it was a reverse search, and an FDR threshold if you list only the forward proteins below this threshold.

> However, you might like to look at Martin's example which contains

Originally I called for a separate Analysis type "Quality Assurance", but I was convinced that it is not necessary. In our (MPC) use case I decided to list ALL identified proteins, both forward and decoy; to mark the decoys, I reported a "decoy pattern" CVParam. I have no FDR threshold (or I could have set it to "1.0"). The decoy pattern belongs to the ProteinDetermination, because in a SpectrumIdentification it has no meaning.

> > I am also listing all results (forward and reverse) in DataCollection.
> I'd recommend two sets of results:
> <SpectrumIdentificationList id="OMSSA_forward"> ...
> <SpectrumIdentificationList id="OMSSA_reverse">

But you cannot specify two result sets for ONE SpectrumIdentification. So with this suggestion you would need one SpectrumIdentification for the forward search and one for the reverse. I used one and reported a decoy pattern.
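Martin's single-list approach relies on a consumer being able to re-derive the decoys from the recorded pattern. A minimal Python sketch, assuming the "decoy pattern" value is a regular expression matched against protein accessions; the CVParam's actual value syntax is not given in this thread, and both the pattern `^REV_` and the accessions below are made up.

```python
import re


def split_by_decoy_pattern(accessions, decoy_pattern):
    """Partition accessions into (forward, decoy) lists using a regex.

    Assumes decoy_pattern is a regular expression; any accession it
    matches (anywhere, via re.search) is treated as a decoy.
    """
    regex = re.compile(decoy_pattern)
    forward, decoy = [], []
    for acc in accessions:
        (decoy if regex.search(acc) else forward).append(acc)
    return forward, decoy


# Made-up accessions and pattern, for illustration only.
proteins = ["PROT_A", "REV_PROT_A", "PROT_B", "REV_PROT_B", "PROT_C"]
forward, decoy = split_by_decoy_pattern(proteins, r"^REV_")
print(forward)  # ['PROT_A', 'PROT_B', 'PROT_C']
print(decoy)    # ['REV_PROT_A', 'REV_PROT_B']
```

With the two lists in hand, a consumer can count forward and decoy identifications itself, which is the information an FDR threshold would otherwise summarize.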
> AnalysisXML is (currently) expected to report for just one 'cutoff' -
> i.e. a consumer of the analysisXML document couldn't recalculate the
> value.

Yes, we agreed to have another AnalysisXML file for another cut-off. I should put that into the wiki page ;-)

> > N-terminal peptide what would pre be? pre="" or pre="-"?
> We just need to decide and document -
> maybe at the conference call later
> today.

New issue 34; in SEQUEST it is "-". Oh, I see that David finished this issue just in time, because we decided on that in the TeleCon on 26th June. ;-) It is in the wiki now...

> > Finally the database searched was a custom database, is there anywhere
> > to report how a database was generated?
> Possibly outside the scope of AnalysisXML.

In Inputs we have SearchDatabase. Then follows DatabaseName. We could add DatabaseProperties... That would be quite useful to describe the type of decoy DB.

Bye
Martin
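For the pre/post question, the flanking residues follow directly from where the peptide sits in the protein sequence. A minimal Python sketch, assuming the "-" convention Martin mentions for SEQUEST; the `terminus` argument lets you substitute "" if that turns out to be what issue 34 settled on (check the wiki). `flanking_residues` is a hypothetical helper, not part of any analysisXML tooling, and the sequence is a toy example.

```python
def flanking_residues(protein, peptide, terminus="-"):
    """Return (pre, post) for every occurrence of peptide in protein.

    Overlapping occurrences are included. Where the peptide touches a
    protein terminus, the missing residue is reported as `terminus`.
    """
    if not peptide:
        raise ValueError("peptide must be non-empty")
    results = []
    start = protein.find(peptide)
    while start != -1:
        end = start + len(peptide)
        pre = protein[start - 1] if start > 0 else terminus
        post = protein[end] if end < len(protein) else terminus
        results.append((pre, post))
        start = protein.find(peptide, start + 1)
    return results


protein = "MKTAYIAKQRQISFVK"
print(flanking_residues(protein, "MKTAYIAK"))  # [('-', 'Q')]  N-terminal peptide
print(flanking_residues(protein, "QISFVK"))    # [('R', '-')]  C-terminal peptide
```

Returning one (pre, post) pair per occurrence matches the idea that each PeptideEvidence describes one placement of the peptide in one protein.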
From: Martin E. <mar...@ru...> - 2008-07-03 09:16:43
Forget my following comments:

> In Inputs we have SearchDatabase. Then follows DatabaseName.
> We could add DatabaseProperties...
> That would be quite useful to describe the type of decoy DB.

That is discussed in issue 31; I would be happy with Phil's suggestion.

Bye
Martin