The article describing this approach is attached. The key is Equation (10).
Basically, we should have two decoy databases, each the same size as the target database. In one of the decoys, both peptides are shuffled (D-D). In the other, only a single peptide is shuffled. Then if we set a score threshold, the estimated number of false positives is given by Equation (10): it's the total number of decoys above the threshold minus twice the number of D-D decoys above the threshold. Dividing this value by the total number of targets above the threshold gives the FDR. Importantly, though, this calculation has to be done separately for each kind of identification. For this purpose, I would be inclined to lump together the linear, selfloop and dead-end matches, and then treat the inters as one category and inter+inter-intra as another category.
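To make the arithmetic concrete, here is a minimal sketch of the calculation described above. The labels "T", "TD" and "DD", the function names, and the input layout are all made up for illustration and are not search-for-xlinks output. It also assumes that higher scores are better, and it clamps a negative false-positive estimate to zero, which is my assumption rather than something stated above.

```python
from collections import Counter


def estimate_fdr(matches, threshold):
    """Estimate the FDR at a score threshold.

    matches: iterable of (score, label) pairs, where label is "T" for a
    target, "TD" for a decoy with one shuffled peptide, and "DD" for a
    decoy with both peptides shuffled (illustrative labels only).
    Assumes higher scores are better.
    """
    counts = Counter(label for score, label in matches if score >= threshold)
    n_targets = counts["T"]
    n_decoys = counts["TD"] + counts["DD"]
    # Total decoys above the threshold minus twice the D-D decoys.
    est_false_pos = n_decoys - 2 * counts["DD"]
    if n_targets == 0:
        return 0.0  # assumption: report 0 when no targets pass
    # Assumption: clamp a negative estimate to zero.
    return max(est_false_pos, 0) / n_targets


def estimate_fdr_by_category(matches, threshold):
    """Run the estimate separately per identification category.

    matches: iterable of (category, score, label) triples.
    """
    by_category = {}
    for category, score, label in matches:
        by_category.setdefault(category, []).append((score, label))
    return {cat: estimate_fdr(m, threshold) for cat, m in by_category.items()}
```

With 10 targets, 3 single-shuffled decoys and 1 D-D decoy above the threshold, this gives (4 - 2) / 10 = 0.2.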
I don't know if you want to try to implement this confidence estimation procedure within search-for-xlinks or put it into the assign-confidence command. I can see an argument for either approach.
Sean, can you clarify for me how search-for-xlinks currently generates decoys? For cross-links, I think we want the decoy database to contain three decoys per target. If the target links peptide A to B, then we want a decoy with (1) A shuffled and B not shuffled, (2) A not shuffled and B shuffled, and (3) both shuffled. If we do this, then I think we can use the Walzthoeni procedure.
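As a rough sketch of the three-decoys-per-target idea, something like the following would do it. The shuffler here is a stand-in that permutes every residue except the last one. That rule, the function names, and the FT/TF/FF labels (T for an unshuffled peptide, F for a shuffled one) are assumptions for illustration, not how search-for-xlinks actually builds decoys. It also ignores what happens to the link sites when a peptide is shuffled.

```python
import random


def shuffle_peptide(seq, rng):
    """Hypothetical shuffler: permute all residues except the last one."""
    body = list(seq[:-1])
    rng.shuffle(body)
    return "".join(body) + seq[-1:]


def cross_link_decoys(pep_a, pep_b, rng):
    """Return the three decoys for a target cross-link A-B."""
    return [
        (shuffle_peptide(pep_a, rng), pep_b, "FT"),  # (1) A shuffled, B intact
        (pep_a, shuffle_peptide(pep_b, rng), "TF"),  # (2) A intact, B shuffled
        (shuffle_peptide(pep_a, rng), shuffle_peptide(pep_b, rng), "FF"),  # (3) both
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so decoys are reproducible
    for a, b, label in cross_link_decoys("ACDEFGK", "LMNPQR", rng):
        print(label, a, b)
```

Note that a shuffle can occasionally reproduce the original sequence, so a real implementation would need to check for that.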
The old sfx shuffled both peptides on the fly to generate FF. The new sfx
instead runs the search separately with a target peptide database and a
decoy peptide database to generate lists for TT and FF. To implement the
Walzthoeni procedure, I think it would be more time- and space-efficient to
use the peptide decoy database to generate the TF, FT, and FF for each TT
by keeping track of each target peptide's associated pre-generated decoy(s).
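A minimal sketch of what I mean by keeping track of the decoys, assuming a plain dictionary from each target peptide to its pre-generated decoy(s). The names `decoys_of` and `decoys_from_lookup` and the example sequences are made up for illustration and do not correspond to anything in the current code.

```python
from itertools import product


def decoys_from_lookup(pep_a, pep_b, decoys_of):
    """Build TF, FT and FF candidates for a TT pair from stored decoys.

    decoys_of: dict mapping a target peptide to a list of its
    pre-generated decoy sequences (hypothetical structure).
    """
    decoys_a = decoys_of[pep_a]
    decoys_b = decoys_of[pep_b]
    candidates = [(pep_a, d_b, "TF") for d_b in decoys_b]
    candidates += [(d_a, pep_b, "FT") for d_a in decoys_a]
    candidates += [(d_a, d_b, "FF") for d_a, d_b in product(decoys_a, decoys_b)]
    return candidates


if __name__ == "__main__":
    # Made-up sequences; each target has one pre-generated decoy here.
    decoys_of = {"ACDEFGK": ["FDCAGEK"], "LMNPQR": ["PNMLQR"]}
    for a, b, label in decoys_from_lookup("ACDEFGK", "LMNPQR", decoys_of):
        print(label, a, b)
```

Nothing is shuffled at search time this way; the decoy peptides are looked up, so each one is generated only once.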
Thanks,
Sean
On Fri, Jun 24, 2016 at 2:34 PM, William S Noble <wsnoble@users.sf.net> wrote:
Yes, I think this makes sense. Do you have a sense for how difficult that would be to get working?