
#339 Implement Walzthoeni approach to cross-link FDR estimation

Milestone: post v2.0
Status: open
Owner: sjoemac
Updated: 2016-06-24
Created: 2015-10-01
Creator: William S Noble
Private: No

The article describing this approach is attached. The key is Equation (10).

Basically, we need two decoy databases, each the same size as the target database. In one, both peptides of each cross-link are shuffled (D-D); in the other, only a single peptide is shuffled. For a given score threshold, Equation (10) gives the estimated number of false positives: the total number of decoys above the threshold minus twice the number of D-D decoys above the threshold. Dividing this value by the total number of targets above the threshold gives the FDR. Importantly, this calculation has to be done separately for each kind of identification. For this purpose, I would be inclined to lump together the linear, self-loop and dead-end matches, and then treat the inters as one category and inter+inter-intra as another category.
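
As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes each match has already been reduced to a (score, label, category) tuple, where the label is "TT" for a target, "TD" for a decoy with one shuffled peptide, and "DD" for a decoy with both peptides shuffled. The function name, the tuple layout, the "higher score is better" convention, and the clamping of a negative estimate to zero are all assumptions made for the example, not part of Crux.

```python
from collections import Counter


def estimate_fdr(matches, threshold):
    """Estimate the FDR per category at a score threshold.

    matches: iterable of (score, label, category), label in {"TT", "TD", "DD"}.
    Returns {category: fdr}, with None where no targets pass the threshold.
    """
    counts = {}
    for score, label, category in matches:
        if score >= threshold:  # assumes higher scores are better
            counts.setdefault(category, Counter())[label] += 1

    fdr = {}
    for category, c in counts.items():
        total_decoys = c["TD"] + c["DD"]
        # Total decoys minus twice the D-D decoys (equivalently TD - DD).
        # Clamping at zero is a choice made for this sketch.
        false_positives = max(total_decoys - 2 * c["DD"], 0)
        fdr[category] = false_positives / c["TT"] if c["TT"] else None
    return fdr
```

Keeping one counter per category means the estimate for, say, the linear/self-loop/dead-end group is never mixed with the estimate for the cross-linked groups.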

I don't know if you want to try to implement this confidence estimation procedure within search-for-xlinks or put it into the assign-confidence command. I can see an argument for either approach.

1 Attachments

Discussion

  • William S Noble

    William S Noble - 2016-06-24
    • labels: --> High priority
     
  • William S Noble

    William S Noble - 2016-06-24

    Sean, can you clarify for me how search-for-xlinks currently generates decoys? For cross-links, I think we want the decoy database to contain three decoys per target. If the target links peptide A to B, then we want a decoy with (1) A shuffled and B not shuffled, (2) A not shuffled and B shuffled, and (3) both shuffled. If we do this, then I think we can use the Walzthoeni procedure.
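
The three decoys per target could be sketched as follows. This is only an illustration: the shuffling rule (keep both terminal residues, permute the interior), the function names, and the string labels are assumptions, and the sketch ignores where the link site ends up after shuffling.

```python
import random


def shuffle_peptide(seq, rng):
    """Permute the interior residues, keeping both termini (assumed rule)."""
    if len(seq) <= 3:
        return seq  # nothing meaningful to shuffle
    middle = list(seq[1:-1])
    rng.shuffle(middle)
    return seq[0] + "".join(middle) + seq[-1]


def decoys_for_crosslink(pep_a, pep_b, seed=0):
    """Return the three decoys for a target cross-link A-B."""
    rng = random.Random(seed)
    a_shuffled = shuffle_peptide(pep_a, rng)
    b_shuffled = shuffle_peptide(pep_b, rng)
    return [
        ("A_shuffled", a_shuffled, pep_b),        # (1) A shuffled, B intact
        ("B_shuffled", pep_a, b_shuffled),        # (2) A intact, B shuffled
        ("both_shuffled", a_shuffled, b_shuffled),  # (3) both shuffled
    ]
```

Note that a random permutation can occasionally reproduce the original sequence; a real implementation would need to handle that case.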

     
    • sjoemac

      sjoemac - 2016-06-24

      The old search-for-xlinks (sfx) shuffled both peptides on the fly to
      generate the decoy-decoy (FF) matches. The new sfx instead runs the search
      separately against a target peptide database and a decoy peptide database
      to generate the TT and FF lists. To implement the Walzthoeni procedure, I
      think it would be more time- and space-efficient to use the peptide decoy
      database to generate the TF, FT, and FF for each TT by keeping track of
      each target peptide's associated pre-generated decoy(s).

      Thanks,
      Sean
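
One way to picture the bookkeeping proposed here is a lookup from each target peptide to its pre-generated decoy, from which the TF, FT, and FF candidates follow directly. The sketch below is hypothetical: the name candidate_pairs, the dictionary decoy_of, and the assumption of exactly one decoy per target are made up for illustration and do not reflect the actual sfx data structures.

```python
def candidate_pairs(pep_a, pep_b, decoy_of):
    """Build TT, TF, FT and FF candidates from a target pair.

    decoy_of: hypothetical dict mapping each target peptide to its
    pre-generated decoy sequence (one decoy per target assumed).
    """
    a_decoy = decoy_of[pep_a]
    b_decoy = decoy_of[pep_b]
    return {
        "TT": (pep_a, pep_b),
        "TF": (pep_a, b_decoy),
        "FT": (a_decoy, pep_b),
        "FF": (a_decoy, b_decoy),
    }
```

Because the decoys are looked up rather than regenerated, no shuffling happens at search time and each decoy peptide is stored only once.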

       
  • William S Noble

    William S Noble - 2016-06-24

    Yes, I think this makes sense. Do you have a sense for how difficult that would be to get working?

     
