From: Pierre-Alain B. <pie...@is...> - 2008-06-27 07:54:02
Hi all,

my 2 cents:

David Creasy wrote:
> Hi Jenny,
>
> Jennifer Siepen wrote:
>
>> Hi,
>>
>> I am in the process of trying to put together an example instance
>> document for OMSSA and have a few questions. To make things more
>> complicated I have gone for an example where I run the search on a
>> concatenated forward/reverse database.
>>
> Nothing like jumping in at the deep end!
>

Definitely welcome in the game!

>
>> At the moment I have all the
>> results in the analysisXML file i.e. in the ConceptualMoleculeCollection
>> I am listing all proteins and peptides identified including the reverse
>> sequences. I am unsure if (a) I am supposed to be listing all results
>> and (b) if all results are supposed to be listed how I mark the reverse
>> ones as decoy or does it not matter?
>>
> In some ways it doesn't matter, because they are just lists of
> proteins/peptides.
>
> However, you might like to look at Martin's example which contains
> results from Mascot and Sequest, and model forward/reverse on this:
>
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working12June/MPC_use_case_working12June.axml
>
> (See also: http://code.google.com/p/psi-pi/issues/detail?id=32)
> For the proteins, if the reverse entries don't have different
> accessions, you could use a different Database_ref.
> For the peptides, to make it more human readable, you could encode
> 'Reverse' into the id?
>

Maybe we can already think about the way the PEFF format (the newly
proposed PSI common database format) deals with this: each entry has a
DBprefix followed by a ":" and the accession number. In one DB file, you
can concatenate as many DBs as you want. A forward and a reverse entry
might have different prefixes, and the UniqueDBIdentifiers would look
like:

sp:P00721
sp_rev:P00721

>
>> I am also listing all results (forward and reverse) in DataCollection.
>>
> I'd recommend two sets of results:
> <SpectrumIdentificationList id="OMSSA_forward">
> ...
>
> <SpectrumIdentificationList id="OMSSA_reverse">
> ...
>

Why not, but is it a problem if a spectrum interpretation appears in both
fw and rev? Does it have to be duplicated?

>
>> The next step for me would be to calculate false discovery rates based
>> upon the OMSSA results and select 'good' peptides, I am not sure where
>> these results would be reported?
>>
> And nor am I yet. One issue is that this is a 'dynamic' sort of thing.
> For a particular cutoff expect value (or some rule), you might get x
> hits from the forward database, and y hits from the reverse database.
> For a different cutoff expect value, you would get x' and y' results.
> AnalysisXML is (currently) expected to report for just one 'cutoff' -
> i.e. a consumer of the analysisXML document couldn't recalculate the
> value. So, the proteins reported (from the forward / and reverse
> database) are the list for the cutoff decided by the producer of the
> file. We will discuss this in a conference call.
>

If the search parameters set the cutoffs, you can calculate a single
value. If you calculate the FDR as post-processing, it looks like an
additional analysis, and is therefore formally a new Analysis. But I
believe you could define more than one set of Analysis parameters and
generate the AnalysisXML from the end result.

>> A quick question relates to the 'PeptideEvidence'. One of the attributes
>> is "pre" as in the previous flanking sequence. If my peptide is the
>> N-terminal peptide what would pre be? pre="" or pre="-"? or does it not
>> matter?
>>
> We just need to decide and document - maybe at the conference call later
> today.
>
>> Finally the database searched was a custom database, is there anywhere
>> to report how a database was generated?
>>
> Possibly outside the scope of AnalysisXML.
>

Don't we have source information for the searched database?

Cheers,
Pierre-Alain

>
>> Sometimes we also search
>> peptide databases i.e. the database would have the same number of
>> 'protein' entries as the original but there would only be one peptide
>> per protein would I be able to report how many peptides are in the
>> underlying database searched - would it be a cvParam?
>>
> This was something we discussed briefly on 2008-05-15:
> http://www.psidev.info/index.php?q=node/325
>
> We need the number of residues and sequences, although we don't
> currently have a record of the number of peptides in the database.
> Discussion of how to specify databases at:
>
> http://code.google.com/p/psi-pi/issues/detail?id=31
>
> So, maybe you could add some notes there?
>
> David
>
>> Thanks,
>>
>> Jenny
>>
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
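
To make the forward/reverse counting discussed above concrete, here is a minimal Python sketch, assuming hypothetical PSM records of (identifier, expect value) with PEFF-style identifiers such as sp:P00721 and sp_rev:P00721. It flags reverse hits by their prefix and, for a given expect-value cutoff, counts forward hits x and reverse hits y and reports the simple ratio y/x as an FDR estimate. The function names, the record layout and the sp_rev decoy convention are illustrative assumptions, not part of analysisXML, PEFF or OMSSA.

```python
# A minimal sketch: estimate a decoy-based FDR at a given expect-value cutoff.
# Assumptions (hypothetical, not defined by analysisXML, PEFF or OMSSA):
#   * each PSM is a (UniqueDBIdentifier, expect value) pair,
#   * identifiers follow the PEFF-style "prefix:accession" form,
#   * reverse (decoy) entries carry the "sp_rev" prefix, as in the example above.
DECOY_PREFIXES = {"sp_rev"}


def split_identifier(uid: str) -> tuple[str, str]:
    """Split 'prefix:accession' into (prefix, accession)."""
    prefix, _, accession = uid.partition(":")
    return prefix, accession


def is_decoy(uid: str) -> bool:
    """True if the identifier's prefix marks a reverse (decoy) entry."""
    prefix, _ = split_identifier(uid)
    return prefix in DECOY_PREFIXES


def fdr_at_cutoff(psms, cutoff: float) -> tuple[int, int, float]:
    """Return (x, y, y / x): forward hits x and reverse hits y with
    expect value <= cutoff, and the simple ratio estimate of the FDR."""
    accepted = [uid for uid, evalue in psms if evalue <= cutoff]
    y = sum(1 for uid in accepted if is_decoy(uid))
    x = len(accepted) - y
    # The ratio is undefined when no forward hits pass the cutoff.
    fdr = y / x if x else float("nan")
    return x, y, fdr


# Made-up example PSMs; only the accession P00721 comes from the thread.
psms = [
    ("sp:P00721", 0.001),
    ("sp:EXAMPLE1", 0.02),
    ("sp_rev:P00721", 0.03),
    ("sp:EXAMPLE2", 0.04),
    ("sp_rev:EXAMPLE3", 0.5),
]

for cutoff in (0.01, 0.05):
    x, y, fdr = fdr_at_cutoff(psms, cutoff)
    print(f"cutoff={cutoff}: x={x}, y={y}, FDR~{fdr:.2f}")
```

With these made-up records the loop reports x=1, y=0 at cutoff 0.01 and x=3, y=1 (FDR about 0.33) at cutoff 0.05, which illustrates the point that every cutoff yields its own x and y, so a file written for one cutoff fixes a single value. The y/x ratio is only one common estimator for concatenated forward/reverse searches; others exist.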