#339 Implement Walzthoeni approach to cross-link FDR estimation

Milestone: post v2.0

Status: open

Owner: sjoemac

Labels: High priority

Updated: 2016-06-24

Created: 2015-10-01

Creator: William S Noble

Private: No

The article describing this approach is attached. The key is Equation (10).

Basically, we should have two decoy databases, each the same size as the target database. In one of the decoys, both peptides are shuffled (D-D). In the other, only a single peptide is shuffled. Then, if we set a score threshold, the estimated number of false positives is given by Equation (10): it's the total number of decoys above the threshold minus twice the number of D-D decoys above the threshold. Equivalently, if T-D denotes decoys in which only one peptide is shuffled, the estimate is T-D minus D-D. If you take this value and divide by the total number of targets above the threshold, then you get the FDR.

Importantly, though, this calculation has to be done separately for each kind of identification. For this purpose, I would be inclined to lump together the linear, self-loop and dead-end matches, and then treat the inters as one category and inter+inter-intra as another category.
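
The calculation described above can be sketched roughly as follows. This is only an illustration, not Crux code: the `Match` record, its `kind` and `category` labels, the inclusive threshold, the "higher score is better" convention, and flooring the estimate at zero are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Match(NamedTuple):
    score: float    # assumed: higher scores are better
    kind: str       # "TT" target, "TD" one peptide shuffled, "DD" both shuffled
    category: str   # hypothetical label for the kind of identification


def estimate_fdr(matches: Iterable[Match], threshold: float) -> Dict[str, float]:
    """Estimate the FDR separately per category at a score threshold."""
    counts: Dict[str, Counter] = {}
    for m in matches:
        if m.score >= threshold:
            counts.setdefault(m.category, Counter())[m.kind] += 1

    fdr = {}
    for category, c in counts.items():
        if c["TT"] == 0:
            continue  # no targets pass the threshold, so the FDR is undefined
        total_decoys = c["TD"] + c["DD"]
        # Total decoys minus twice the D-D decoys; flooring at zero is an
        # assumption of this sketch, not something stated in the issue.
        false_positives = max(total_decoys - 2 * c["DD"], 0)
        fdr[category] = false_positives / c["TT"]
    return fdr


example = [
    Match(10.0, "TT", "inter"), Match(9.0, "TT", "inter"),
    Match(8.0, "TT", "inter"), Match(7.0, "TT", "inter"),
    Match(8.5, "TD", "inter"), Match(7.5, "TD", "inter"),
    Match(7.2, "DD", "inter"), Match(3.0, "TD", "inter"),
]
print(estimate_fdr(example, threshold=7.0))  # {'inter': 0.25}
```

In the example, four targets, two single-shuffled decoys and one D-D decoy pass the threshold, so the estimate is (3 - 2 * 1) / 4 = 0.25. Keeping the counts keyed by category means each kind of identification gets its own estimate from a single pass over the matches.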

I don't know if you want to try to implement this confidence estimation procedure within search-for-xlinks or put it into the assign-confidence command. I can see an argument for either approach.

1 Attachment

nmeth.2103.pdf

Discussion

William S Noble - 2016-06-24

labels: --> High priority

William S Noble - 2016-06-24

Sean, can you clarify for me how search-for-xlinks currently generates decoys? For cross-links, I think we want the decoy database to contain three decoys per target. If the target links peptide A to B, then we want a decoy with (1) A shuffled and B not shuffled, (2) A not shuffled and B shuffled, and (3) both shuffled. If we do this, then I think we can use the Walzthoeni procedure.
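
To make the three-decoys-per-target idea concrete, here is a minimal sketch. The function names are hypothetical, and shuffling only the interior residues while keeping both termini fixed is an assumption of the example, not a statement about how search-for-xlinks shuffles.

```python
import random
from typing import Dict, Tuple


def shuffle_peptide(seq: str, rng: random.Random) -> str:
    """Shuffle interior residues, keeping both termini fixed (an assumption)."""
    if len(seq) < 3:
        return seq  # nothing to shuffle between the termini
    interior = list(seq[1:-1])
    rng.shuffle(interior)
    # Note: a shuffle can occasionally reproduce the original sequence.
    return seq[0] + "".join(interior) + seq[-1]


def three_decoys(a: str, b: str, rng: random.Random) -> Dict[str, Tuple[str, str]]:
    """Build the three decoys for a target that links peptide A to peptide B."""
    a_shuf = shuffle_peptide(a, rng)
    b_shuf = shuffle_peptide(b, rng)
    return {
        "A_shuffled": (a_shuf, b),          # (1) A shuffled, B not shuffled
        "B_shuffled": (a, b_shuf),          # (2) A not shuffled, B shuffled
        "both_shuffled": (a_shuf, b_shuf),  # (3) both shuffled
    }


# Hypothetical peptide sequences, seeded for reproducibility.
decoys = three_decoys("ACDEFGHK", "LMNPQR", random.Random(0))
```

Each peptide is shuffled once and the same shuffled sequence is reused in the single-shuffled and double-shuffled decoys, so the three decoys for a target stay consistent with one another.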

- sjoemac - 2016-06-24
  
  The old sfx (search-for-xlinks) shuffled both peptides on the fly to
  generate FF. Currently, the new sfx runs the search separately against a
  target peptide database and a decoy peptide database to generate lists of
  TT and FF. To implement the Walzthoeni procedure, I think it would be more
  time- and space-efficient to use the peptide decoy database to generate the
  TF, FT, and FF for each TT by keeping track of each target peptide's
  associated pre-generated decoy(s).
  
  Thanks,
  Sean
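
The bookkeeping proposed in this comment might look something like the sketch below. It assumes a hypothetical `decoy_of` lookup from each target peptide to a single pre-generated decoy, and it reads "TF" as target A paired with decoy B. Neither is confirmed by the thread, and the sequences are made up.

```python
from typing import Dict, Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def decoy_pairs(
    tt_pairs: Iterable[Pair], decoy_of: Dict[str, str]
) -> Iterator[Tuple[str, Pair]]:
    """Yield labelled TF, FT and FF pairs for each TT pair."""
    for a, b in tt_pairs:
        # Reuse the pre-generated decoys instead of shuffling on the fly.
        a_decoy, b_decoy = decoy_of[a], decoy_of[b]
        yield "TF", (a, b_decoy)
        yield "FT", (a_decoy, b)
        yield "FF", (a_decoy, b_decoy)


# Hypothetical target peptides and their pre-generated decoys.
decoy_of = {"PEPTIDEK": "PDITEPEK", "ACDEFGR": "AFEDCGR"}
pairs = list(decoy_pairs([("PEPTIDEK", "ACDEFGR")], decoy_of))
```

Because the function is a generator over a lookup table, only the per-peptide decoys need to be stored. The TF, FT and FF combinations are produced on demand for each TT pair.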
  

William S Noble - 2016-06-24

Yes, I think this makes sense. Do you have a sense for how difficult that would be to get working?
