
#179 MatchCollection method extendTabDelimited corrupts delta_cn value

Milestone: post v2.0
Status: closed
Owner: Kaipo
Labels: None
Updated: 2015-04-01
Created: 2014-07-11
Creator: Alice Cheng
Private: No

In MatchCollection "extendTabDelimited(Database MatchFileReader& result_file, Database decoy_database)", getDeltaCn() and getScore(DELTA_CN) give different results for each match added to the MatchCollection. getDeltaCn() returns the correct value, while getScore(DELTA_CN) returns a very small number, almost 0. As a result, the delta_cn value is corrupted when a tab-delimited file is read in and written out in a different format.

A quick, superficial fix would be to call "match->setScore(DELTA_CN, delta_cn);" before the match is added to the collection, but other optional parameters may have similar problems.
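The report comes down to one value living in two places. Below is a minimal C++ sketch, assuming (hypothetically) that Match keeps generic scores in a map keyed by SCORER_TYPE_T and keeps delta Cn in a separate member. The real Crux class is larger and may store scores differently; in this sketch an unset score reads as exactly 0 rather than the near-zero value reported. readMatch() is a hypothetical stand-in for the per-match code in extendTabDelimited, with a flag for the quick fix above.

```cpp
#include <map>

// Hypothetical, simplified stand-ins for Crux's SCORER_TYPE_T, FLOAT_T and
// Match; only the members needed to show the two storage paths are included.
enum SCORER_TYPE_T { XCORR, SP, DELTA_CN, DELTA_LCN };
typedef double FLOAT_T;

class Match {
 public:
  // Generic per-type score storage, read by getScore().
  void setScore(SCORER_TYPE_T type, FLOAT_T value) { scores_[type] = value; }
  FLOAT_T getScore(SCORER_TYPE_T type) const {
    auto it = scores_.find(type);
    // In this sketch an unset score reads as 0.
    return it == scores_.end() ? 0.0 : it->second;
  }

  // Separate dedicated field, read only by getDeltaCn().
  void setDeltaCn(FLOAT_T value) { delta_cn_ = value; }
  FLOAT_T getDeltaCn() const { return delta_cn_; }

 private:
  std::map<SCORER_TYPE_T, FLOAT_T> scores_;
  FLOAT_T delta_cn_ = 0.0;
};

// Mimics the reader: only setDeltaCn() is called, unless apply_quick_fix is
// true, in which case the value is also copied into the score storage.
Match readMatch(FLOAT_T delta_cn, bool apply_quick_fix) {
  Match match;
  match.setDeltaCn(delta_cn);
  if (apply_quick_fix) {
    match.setScore(DELTA_CN, delta_cn);
  }
  return match;
}
```

Without the fix, readMatch(0.25, false) yields getDeltaCn() == 0.25 but getScore(DELTA_CN) == 0; with the fix the two agree. The fix patches the symptom but still leaves two copies that can drift apart.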

Related

Issues: #179

Discussion

  • William S Noble

    William S Noble - 2014-07-11
    • labels: --> High priority
    • assigned_to: Kaipo
     
  • Kaipo

    Kaipo - 2014-07-14

    Can we just use getDeltaCn() to get the correct value? I'm not sure we should use getScore(DELTA_CN) or setScore(DELTA_CN, delta_cn), because delta Cn is not really a score (though it is derived from one).

     
    • William S Noble

      William S Noble - 2014-07-15

      I haven't looked at the code, but I think one issue is that the deltaCn
      score that we receive as input might not be calculated correctly. I think
      we have to decide whether to fix this problem or just pass the incorrect
      deltaCn through directly. I am inclined to issue a one-time warning,
      something like:

      "Warning: on line %d of file %s the reported deltaCn value of %g is
      incorrect. The correct value should be %g/%g = %g. This value is not
      being corrected."

      The only case where we would actually calculate the score (rather than
      passing through the given score) is when no deltaCn appears in the input
      file but one is required for the output file.

      Does this make sense?

      Bill
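The policy proposed above (pass a reported deltaCn through unchanged, warn once if it disagrees with a recomputed value, and compute it only when the input lacks one) can be sketched in C++ as follows. Everything here is hypothetical: DeltaCnChecker and resolve() are not Crux names, the formula deltaCn = (top XCorr - next XCorr) / top XCorr is an assumption about what the "%g/%g" in the warning refers to, and the 1e-6 tolerance is arbitrary.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper illustrating the proposed policy; none of these names
// exist in Crux. ASSUMES deltaCn = (top_xcorr - next_xcorr) / top_xcorr,
// which may not match Crux's exact definition.
struct DeltaCnChecker {
  bool warned = false;  // ensures the warning is issued only once per run

  // has_reported: whether the input file contained a deltaCn value.
  // Returns the value to store: the reported value if present (passed
  // through unchanged, even if it looks wrong), otherwise a computed one.
  double resolve(bool has_reported, double reported, double top_xcorr,
                 double next_xcorr, int line, const std::string& file) {
    double numerator = top_xcorr - next_xcorr;
    // Guard against division by zero; returning 0 here is an arbitrary choice.
    double computed = top_xcorr != 0.0 ? numerator / top_xcorr : 0.0;
    if (!has_reported) {
      return computed;  // the only case where the value is calculated
    }
    if (!warned && std::fabs(reported - computed) > 1e-6) {
      std::fprintf(stderr,
                   "Warning: on line %d of file %s the reported deltaCn value "
                   "of %g is incorrect. The correct value should be "
                   "%g/%g = %g. This value is not being corrected.\n",
                   line, file.c_str(), reported, numerator, top_xcorr,
                   computed);
      warned = true;
    }
    return reported;
  }
};
```

Keeping the "warned" flag on one checker object shared by the whole reader is what makes the warning one-time; a per-line check would flood the log on a file where every deltaCn was computed differently.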


  • Alice Cheng

    Alice Cheng - 2014-07-15

    I thought the input delta_cn was correct? Although I am not sure what the correct value should be, the input delta_cn is what I obtained from running tide-search with --compute-sp T (it is the value returned by getDeltaCn()).

    Using only getDeltaCn() to get the correct value could work, but I think the current output code (at least PepXMLWriter) iterates through all the scores, so every writer would have to be changed to stop relying on scores for output. I also think converting a MatchCollection into a ProteinMatchCollection iterates through all the scores, and there are probably many other places that do the same. So this seems a lot more complicated.

    This is why I am worried that getScore(DELTA_CN) might not be the only incorrect score. In that function, I also see

    match->setZState(zstate_);
    match->setDeltaCn(delta_cn);
    match->setDeltaLCn(ln_delta_cn);
    match->setLnExperimentSize(ln_experiment_size);

    I'm not sure how many of these values are scores, but I know that at least DELTA_CN and DELTA_LCN are, so I suspect getScore(DELTA_LCN) also differs from getDeltaLCn(), since setScore is never called for it either.
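The concern about iterating writers can be illustrated with a small C++ model. This is a hypothetical simplification, not the real PepXMLWriter or Match: it assumes a writer that emits exactly the entries of a score map, while delta Cn and ln delta Cn sit in separate fields. syncDedicatedFields() is an illustrative name that generalizes the quick setScore fix to both values.

```cpp
#include <map>
#include <vector>

// Hypothetical, simplified model: a writer that, like the iterator-based
// output path described above, emits only what is in the score map.
enum SCORER_TYPE_T { XCORR, SP, DELTA_CN, DELTA_LCN };

struct Match {
  std::map<SCORER_TYPE_T, double> scores;  // what score iterators see
  double delta_cn = 0.0;                   // stands in for setDeltaCn()
  double delta_lcn = 0.0;                  // stands in for setDeltaLCn()
};

// Returns the score types a map-iterating writer would output.
std::vector<SCORER_TYPE_T> writtenScores(const Match& match) {
  std::vector<SCORER_TYPE_T> out;
  for (const auto& entry : match.scores) {
    out.push_back(entry.first);
  }
  return out;
}

// Hypothetical bridge mirroring the quick fix: copy the dedicated fields
// into the score map so iterating code sees them too.
void syncDedicatedFields(Match& match) {
  match.scores[DELTA_CN] = match.delta_cn;
  match.scores[DELTA_LCN] = match.delta_lcn;
}
```

A match with only XCORR in its map writes one score even though delta_cn and delta_lcn are set; after syncing, the writer sees all three. Every new dedicated field would need the same bridging, which is the maintenance burden the thread is worried about.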

     
    • William S Noble

      William S Noble - 2014-07-15

      So I am talking with Alice, and I think the issue is clearer. Basically,
      we are wondering why there are two apparently identical functions in
      Match.h:

      /**
       * Must ask for score that has been computed
       * \returns the match_mode score in the match object
       */
      FLOAT_T getScore(
        SCORER_TYPE_T match_mode ///< the working mode (SP, XCORR) -in
        );

      /**
       * gets the match delta_cn
       */
      FLOAT_T getDeltaCn();

      It seems like getScore(DELTA_CN) should yield the same result as
      getDeltaCn(), but it doesn't. Does anyone know why we have two of
      these? It looks like these actually access different places in
      memory. Can anyone tell us the history of these two parallel ways of
      accessing scores, so we can try to decide which one to eliminate?

      Thanks

      Bill


  • William S Noble

    William S Noble - 2014-07-16

    Thanks, Sean.

    Alice, can you investigate the code and see which programs still use the
    old getDeltaCn and setDeltaCn functions? Maybe we can eliminate them
    entirely.

    Bill

    On Tue, Jul 15, 2014 at 8:20 PM, Sean McIlwain sjoemac@gmail.com wrote:

    Historically, I think getDeltaCn and setDeltaCn existed before setScore
    and getScore. I don't understand why it was done this way, but I'd vote to
    move delta Cn into setScore, since it is a score and doing so would reduce
    the number of methods in the class. It also seems cleaner to me.

    Thanks
    Sean
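Sean's suggestion amounts to keeping a single source of truth. A minimal C++ sketch, assuming (hypothetically) a map-backed score store: delta Cn is stored only as a score, and if the old accessors are kept for a transition period they forward to the map instead of owning a separate field. hasScore() and the forwarding wrappers are illustrative, not Crux's API; removing the old accessors outright avoids the wrappers altogether.

```cpp
#include <map>

// Hypothetical sketch of the consolidation: delta Cn is stored only as a
// score, so getScore(DELTA_CN) and any legacy accessor cannot disagree.
enum SCORER_TYPE_T { XCORR, SP, DELTA_CN, DELTA_LCN };
typedef double FLOAT_T;

class Match {
 public:
  void setScore(SCORER_TYPE_T type, FLOAT_T value) { scores_[type] = value; }
  FLOAT_T getScore(SCORER_TYPE_T type) const {
    auto it = scores_.find(type);
    return it == scores_.end() ? 0.0 : it->second;  // unset reads as 0 here
  }
  // Hypothetical helper so callers can tell "unset" from "zero".
  bool hasScore(SCORER_TYPE_T type) const { return scores_.count(type) > 0; }

  // Optional transitional wrappers: they forward to the score map rather
  // than owning a separate field, so both paths see the same value.
  void setDeltaCn(FLOAT_T value) { setScore(DELTA_CN, value); }
  FLOAT_T getDeltaCn() const { return getScore(DELTA_CN); }

 private:
  std::map<SCORER_TYPE_T, FLOAT_T> scores_;
};
```

Because both accessors touch the same map entry, a value written through either path is visible through the other, and any code that iterates over the scores picks it up as well.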


  • Alice Cheng

    Alice Cheng - 2014-07-22

    Grepping for 'getDeltaCn' and 'setDeltaCn' shows that MatchCollection, Match, MzIdentMLWriter, PinWriter, ProteinMatchCollection, MzIdentMLReader, PepXMLReader, and SQTReader still use these functions.

     
  • Kaipo

    Kaipo - 2015-03-12

    Here is a patch that removes getDeltaCn/getDeltaLCn and setDeltaCn/setDeltaLCn. Could someone review it?

     
  • Kaipo

    Kaipo - 2015-04-01
    • labels: High priority -->
    • status: open --> closed
     
  • Kaipo

    Kaipo - 2015-04-01

    fixed in r16681

     
