Re: [Psidev-pi-dev] OMSSA example doc - a few questions

Hi all,
my 2 cents on this:

David Creasy wrote:
> Hi Jenny,
>
> Jennifer Siepen wrote:
>   
>> Hi,
>>
>> I am in the process of trying to put together an example instance 
>> document for OMSSA and have a few questions.  To make things more 
>> complicated I have gone for an example where I run the search on a 
>> concatenated forward/reverse database.  
>>     
> Nothing like jumping in at the deep end!
>   
Definitely, welcome to the game!
>   
>> At the moment I have all the 
>> results in the analysisXML file i.e. in the ConceptualMoleculeCollection 
>> I am listing all proteins and peptides identified including the reverse 
>> sequences.  I am unsure if (a) I am supposed to be listing all results 
>> and (b) if all results are supposed to be listed how I mark the reverse 
>> ones as decoy or does it not matter?
>>     
> In some ways it doesn't matter, because they are just lists of 
> proteins/peptides.
>
> However, you might like to look at Martin's example which contains 
> results from Mascot and Sequest, and model forward/reverse on this:
>
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working12June/MPC_use_case_working12June.axml
>
> (See also: http://code.google.com/p/psi-pi/issues/detail?id=32)
> For the proteins, if the reverse entries don't have different 
> accessions, you could use a different Database_ref.
> For the peptides, to make it more human readable, you could encode 
> 'Reverse' into the id?
>   
Maybe we can already consider how the PEFF format (the newly proposed PSI 
common database format) deals with it:
each entry has a DBprefix followed by a ":" and the accession number. In 
one DB file, you can concatenate as many DBs as you want. A forward and 
a reverse entry might have different prefixes, and the 
UniqueDBIdentifiers would look like:

sp:P00721
sp_rev:P00721
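
A minimal sketch of how a consumer might split such identifiers. It assumes
that a "_rev" suffix on the prefix marks a reverse (decoy) entry; that
convention is taken only from the two examples above, not from the PEFF
proposal itself, and parse_db_identifier is a hypothetical helper name.

```python
from typing import NamedTuple


class DbIdentifier(NamedTuple):
    prefix: str      # database prefix, e.g. "sp" or "sp_rev"
    accession: str   # accession number, e.g. "P00721"
    is_decoy: bool   # True when the prefix ends with the assumed decoy suffix


def parse_db_identifier(identifier: str, decoy_suffix: str = "_rev") -> DbIdentifier:
    """Split 'prefix:accession' and flag decoys by an ASSUMED prefix suffix."""
    prefix, sep, accession = identifier.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"expected 'prefix:accession', got {identifier!r}")
    return DbIdentifier(prefix, accession, prefix.endswith(decoy_suffix))


for raw in ("sp:P00721", "sp_rev:P00721"):
    print(parse_db_identifier(raw))
```

With this, the forward and reverse entries share the accession P00721 and
differ only in the prefix, so no separate accession scheme is needed.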

>   
>> I am also listing all results (forward and reverse) in DataCollection.  
>>     
> I'd recommend two sets of results:
> <SpectrumIdentificationList id="OMSSA_forward">
> ...
>
> <SpectrumIdentificationList id="OMSSA_reverse">
> ...
>
>   
Why not, but is it a problem if a spectrum interpretation is in both 
fw and rev? Does it have to be duplicated?
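
To make the question concrete, here is a rough sketch of splitting hits into
the two lists David suggests. All hit data are made up for illustration, the
"_rev" prefix suffix is an assumption carried over from the PEFF-style
examples, and split_by_direction is a hypothetical helper. It only shows that
a spectrum with hits in both databases ends up referenced from both lists; it
does not say how AnalysisXML should handle that.

```python
# Hypothetical hits: (spectrum id, peptide sequence, database identifier).
hits = [
    ("spec_1", "PEPTIDEK", "sp:P00721"),
    ("spec_1", "KEDITPEP", "sp_rev:P00721"),
    ("spec_2", "LVNELTEK", "sp:P00721"),
]


def split_by_direction(hits, decoy_suffix="_rev"):
    """Group hits under the two list ids used in the thread."""
    lists = {"OMSSA_forward": [], "OMSSA_reverse": []}
    for spectrum_id, peptide, db_id in hits:
        prefix = db_id.partition(":")[0]
        # ASSUMPTION: a prefix ending in "_rev" marks a reverse entry.
        key = "OMSSA_reverse" if prefix.endswith(decoy_suffix) else "OMSSA_forward"
        lists[key].append((spectrum_id, peptide, db_id))
    return lists


lists = split_by_direction(hits)
forward_spectra = {spectrum_id for spectrum_id, _, _ in lists["OMSSA_forward"]}
reverse_spectra = {spectrum_id for spectrum_id, _, _ in lists["OMSSA_reverse"]}

# Spectra that would have to be referenced from both lists.
shared = forward_spectra & reverse_spectra
print(sorted(shared))
```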
>   
>> The next step for me would be to calculate false discovery rates based 
>> upon the OMSSA results and select 'good' peptides, I am not sure where 
>> these results would be reported? 
>>     
> And nor am I yet. One issue is that this is a 'dynamic' sort of thing. 
> For a particular cutoff expect value (or some rule), you might get x 
> hits from the forward database, and y hits from the reverse database. 
> For a different cutoff expect value, you would get x' and y' results. 
> AnalysisXML is (currently) expected to report for just one 'cutoff' - 
> i.e. a consumer of the analysisXML document couldn't recalculate the 
> value. So, the proteins reported (from the forward / and reverse 
> database) are the list for the cutoff decided by the producer of the 
> file. We will discuss this in a conference call.
>
>   
If the search params set the cutoffs, you can calculate a single value.
If you calculate the FDR as post-processing, it would look like an 
additional analysis, and therefore formally a new Analysis. But I believe 
you could define more than one set of Analysis params and generate the 
AnalysisXML from the end result.
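
A small sketch of the 'dynamic' issue David describes: the forward and reverse
counts (x and y) change with the cutoff, so the FDR does too. The hit list is
made up, and reverse/forward is only one commonly used estimate for
target-decoy searches; neither the thread nor AnalysisXML prescribes a
formula. count_hits and estimate_fdr are hypothetical names.

```python
# Hypothetical hits: (expect value, is_reverse_hit).
hits = [
    (0.001, False),
    (0.01, False),
    (0.05, True),
    (0.1, False),
    (0.5, True),
    (1.0, False),
]


def count_hits(hits, cutoff):
    """Return (forward, reverse) counts of hits with expect value <= cutoff."""
    forward = sum(1 for expect, is_reverse in hits if expect <= cutoff and not is_reverse)
    reverse = sum(1 for expect, is_reverse in hits if expect <= cutoff and is_reverse)
    return forward, reverse


def estimate_fdr(forward, reverse):
    """ASSUMED estimate: reverse / forward. Returns None with no forward hits."""
    if forward == 0:
        return None
    return reverse / forward


for cutoff in (0.01, 0.1, 1.0):
    x, y = count_hits(hits, cutoff)
    print(cutoff, x, y, estimate_fdr(x, y))
```

A file that reports results for only one cutoff carries just one of these
(x, y) pairs, which is why a consumer could not recalculate the value.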

>> A quick question relates to the 'PeptideEvidence'. One of the attributes 
>> is "pre" as in the previous flanking sequence.  If my peptide is the 
>> N-terminal peptide what would pre be? pre="" or  pre="-"? or does it not 
>> matter?
>>     
> We just need to decide and document - maybe at the conference call later 
> today.
>
>   
>> Finally the database searched was a custom database, is there anywhere 
>> to report how a database was generated?  
>>     
> Possibly outside the scope of AnalysisXML.
>   
Don't we have source information for the searched database?

Cheers,
Pierre-Alain
>   
>> Sometimes we also search 
>> peptide databases, i.e. the database would have the same number of 
>> 'protein' entries as the original but there would only be one peptide 
>> per protein. Would I be able to report how many peptides are in the 
>> underlying database searched? Would it be a cvParam?
>>     
> This was something we discussed briefly on 2008-05-15:
> http://www.psidev.info/index.php?q=node/325
>
> We need the number of residues and sequences, although we don't 
> currently have a record of the number of peptides in the database.
> Discussion of how to specify databases at:
>
> http://code.google.com/p/psi-pi/issues/detail?id=31
>
> So, maybe you could add some notes there?
>
>
> David
>
>   
>> Thanks,
>>
>> Jenny
>>
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li...
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>     
>
>