#80 Remove or explain differences in XCorr between Tide and search-for-matches

Crux v2.0
closed
Kaipo
None
2014-01-16
2013-12-12
No

Discussion

  • William S Noble

    William S Noble - 2013-12-12

    Places to look

    • bin width and offset
    • 10-bin normalization
    • flanking peaks
     
    • Jeff Howbert

      Jeff Howbert - 2013-12-17

      Regarding removal of fragment peaks in the vicinity of the precursor m/z ...

      In Crux, createIntensityArrayObserved() in Scorer.cpp removes those peaks with this code:

          // skip all peaks within precursor ion mz +/- 15
          if(peak_location < precursor_mz + 15 &&  peak_location > precursor_mz - 15){
            continue;
          }
      

      This has existed in Crux since at least Nov. 2011.

      I can find no comparable processing in the function ObservedPeakSet::PreprocessSpectrum in tide/spectrum_preprocess2.cc. However, it does exist in the same function in tide/spectrum_preprocess_new.cc. spectrum_preprocess2.cc is currently included in the Crux build (per the CMakeLists.txt in the tide directory) while spectrum_preprocess_new.cc is not.

      The preprocessing code in spectrum_preprocess_new.cc looks a lot more like the code in Scorer.cpp than does that in spectrum_preprocess2.cc. It seems likely that someone (Ben?) had a Tide/Crux code harmonization underway, but never finished it.
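For anyone comparing the two preprocessors, the precursor-window filter quoted from Scorer.cpp can be isolated as a small standalone function. This is only a sketch: the `Peak` struct and the function name `removePrecursorPeaks` are hypothetical and do not come from Crux or Tide; only the +/- 15 window and the strict inequalities follow the quoted snippet.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal peak type; Crux's own peak class is not shown in this thread.
struct Peak {
  double mz;
  double intensity;
};

// Keep only peaks outside the open interval
// (precursor_mz - window, precursor_mz + window).
// The default window of 15 matches the constant in the quoted Scorer.cpp snippet.
std::vector<Peak> removePrecursorPeaks(const std::vector<Peak>& peaks,
                                       double precursor_mz,
                                       double window = 15.0) {
  std::vector<Peak> kept;
  for (const Peak& p : peaks) {
    if (p.mz < precursor_mz + window && p.mz > precursor_mz - window) {
      continue;  // inside the precursor window: skip this peak
    }
    kept.push_back(p);
  }
  return kept;
}
```

Because both comparisons are strict, a peak exactly 15 m/z units from the precursor is kept. Running the same peak list through a filter like this before Tide's preprocessing would be one way to test whether this step accounts for part of the XCorr difference.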

       
    • Jeff Howbert

      Jeff Howbert - 2013-12-17

      In tide/spectrum_preprocess2.cc, function ObservedPeakSet::PreprocessSpectrum, fragment masses are binned with the transformation:

          int mz = (int)(peak_location / bin_width + 0.5);
      

      bin_width is set from a global constant bin_width_mono = 1.0005079, and bin offset is baked into the above equation with value 0.50.

      For reference, this is algebraically equivalent to the binning formula in the current Scorer.h if BIN_SIZE = 1.0005079 and BIN_OFFSET = 0.50, since (VALUE / BIN_SIZE + 1.0) - 0.50 reduces to VALUE / BIN_SIZE + 0.5:

      #define INTEGERIZE(VALUE,BIN_SIZE,BIN_OFFSET) \
        ((int)( ( ( VALUE / BIN_SIZE ) + 1.0 ) - BIN_OFFSET ) )
      

      It would be understandable if these values were baked into SEQUEST once upon a time, as they give optimal results for charge 2+ precursors, and suboptimal but still decent results for charge 3+ precursors.
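The equivalence of the two binning formulas can be checked directly. The sketch below wraps each formula in a function; the names `tideBin` and `cruxBin` are hypothetical, and `cruxBin` is a function version of the INTEGERIZE macro rather than the macro itself. The equality is algebraic, so in floating point the two forms could in principle disagree for a value that lands exactly on a bin edge.

```cpp
#include <cassert>

// Tide form, as quoted from tide/spectrum_preprocess2.cc.
inline int tideBin(double peak_location, double bin_width) {
  return (int)(peak_location / bin_width + 0.5);
}

// Crux form: the INTEGERIZE macro from Scorer.h rewritten as a function.
inline int cruxBin(double value, double bin_size, double bin_offset) {
  return (int)(((value / bin_size) + 1.0) - bin_offset);
}
```

With bin width 1.0005079 and offset 0.50, a peak at m/z 500.0 falls in bin 500 under both forms, and a peak at m/z 1000.0 falls in bin 999.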

       
  • Jeff Howbert

    Jeff Howbert - 2013-12-17

    Some history on 10-bin normalization of the observed spectrum in createIntensityArrayObserved() in Scorer.cpp. Since at least Oct. 2011 the initial calculation of the region for a peak has been:

        // map peak location to bin
        mz = INTEGERIZE(peak_location, bin_width, bin_offset);
        region = mz / region_selector;
    

    In Oct. 2011 this was followed by:

        // don't let index beyond array
        if(region >= NUM_REGIONS){
          continue;
        }
    

    This was incorrect, as it caused the highest mass peak to always be discarded.

    In Nov. 2011, Barbara resolved the bug by modifying the code to the following; it has not changed since:

        // don't let index beyond array
        if(region >= NUM_REGIONS) {
          if (region == NUM_REGIONS && mz < experimental_mass_cut_off) {
            // Force peak into lower bin
            region = NUM_REGIONS - 1;
          }
          else {
            // Skip peak altogether
            continue;
          }
        }
    

    The current code in tide/spectrum_preprocess2.cc contains the following:

        // Obvious bug from SEQUEST: highest peaks are ignored:
        for (int i = NUM_SPECTRUM_REGIONS * region_size; i <= largest_mz; ++i)
          peaks_[i] = 0;
    

    which deliberately reproduces the bug that existed in Crux until Nov. 2011.
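The difference between the Oct. 2011 and Nov. 2011 region assignments can be seen side by side in a small sketch. The function names and the constant `kNumRegions` are hypothetical; the value 10 is an assumption based on the "10-bin" name, since the thread does not state NUM_REGIONS. Each function returns the region index, or -1 when the original code would `continue` past the peak.

```cpp
#include <cassert>

// Assumed value: the thread calls this "10-bin normalization"
// but does not quote NUM_REGIONS itself.
const int kNumRegions = 10;

// Oct. 2011 behavior: any peak whose region index falls past the array is skipped.
int regionOct2011(int mz, int region_selector) {
  int region = mz / region_selector;
  if (region >= kNumRegions) {
    return -1;  // peak skipped
  }
  return region;
}

// Nov. 2011 behavior: a peak exactly one region past the end is forced into
// the last region, provided its bin is below the mass cutoff.
int regionNov2011(int mz, int region_selector, int experimental_mass_cut_off) {
  int region = mz / region_selector;
  if (region >= kNumRegions) {
    if (region == kNumRegions && mz < experimental_mass_cut_off) {
      return kNumRegions - 1;  // force peak into lower bin
    }
    return -1;  // peak skipped altogether
  }
  return region;
}
```

For example, with a region_selector of 100, a peak in bin 1000 maps to region 10. The Oct. 2011 logic skips it, while the Nov. 2011 logic places it in region 9 as long as the cutoff is above 1000. The Tide loop quoted above zeroes such bins instead, which has the same effect as the Oct. 2011 skip.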

     
    • William S Noble

      William S Noble - 2013-12-17

      Thanks for documenting these differences, Jeff. Can either you or Kaipo
      fix the 10-bin normalization bug and then see if we can get XCorr to agree
      between search-for-matches and Tide for at least one PSM, when we set
      parameters appropriately?

      Bill

  • Kaipo

    Kaipo - 2014-01-16

    Differences are minimal as of r16286.

     
  • Kaipo

    Kaipo - 2014-01-16
    • status: open --> closed
     
