Regarding removal of fragment peaks in the vicinity of the precursor m/z ...
The function createIntensityArrayObserved() in Crux's Scorer.cpp removes those peaks with this code:

    // skip all peaks within precursor ion mz +/- 15
    if (peak_location < precursor_mz + 15 && peak_location > precursor_mz - 15) {
      continue;
    }

This has existed in Crux since at least Nov. 2011.
I can find no comparable processing in the function ObservedPeakSet::PreprocessSpectrum in tide/spectrum_preprocess2.cc. However, it does exist in the same function in tide/spectrum_preprocess_new.cc. spectrum_preprocess2.cc is currently included in the Crux build (per the CMakeLists.txt in the tide directory) while spectrum_preprocess_new.cc is not.
The preprocessing code in spectrum_preprocess_new.cc looks a lot more like the code in Scorer.cpp than does that in spectrum_preprocess2.cc. It seems likely that someone (Ben?) had a Tide/Crux code harmonization underway, but never finished it.
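For anyone who wants to try the effect of this filter outside of Crux, here is a minimal standalone sketch of the same +/- 15 window test. The `Peak` struct, the function name `removePrecursorPeaks`, and the `tolerance` parameter are hypothetical names made up for illustration; only the comparison itself mirrors the Scorer.cpp snippet quoted above.

```cpp
#include <vector>

// Hypothetical peak type for illustration; not a Crux or Tide class.
struct Peak {
  double mz;
  double intensity;
};

// Return a copy of `peaks` without any peak strictly inside
// (precursor_mz - tolerance, precursor_mz + tolerance).
// The strict inequalities match the Scorer.cpp condition quoted above,
// so a peak exactly 15 away from the precursor is kept.
std::vector<Peak> removePrecursorPeaks(const std::vector<Peak>& peaks,
                                       double precursor_mz,
                                       double tolerance = 15.0) {
  std::vector<Peak> kept;
  for (const Peak& p : peaks) {
    if (p.mz < precursor_mz + tolerance && p.mz > precursor_mz - tolerance) {
      continue;  // skip peak near the precursor
    }
    kept.push_back(p);
  }
  return kept;
}
```

Note that the window is open at both ends, which is one small detail to match if the same step is ever added to spectrum_preprocess2.cc.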
Some history on 10-bin normalization of the observed spectrum in createIntensityArrayObserved() in Scorer.cpp. Since at least Oct. 2011 the initial calculation of the region for a peak has been:

    // map peak location to bin
    mz = INTEGERIZE(peak_location, bin_width, bin_offset);
    region = mz / region_selector;

In Oct. 2011 this was followed by:

    // don't let index beyond array
    if (region >= NUM_REGIONS) {
      continue;
    }

This was incorrect, as it caused the highest mass peak to always be discarded.
In Nov. 2011, Barbara resolved the bug by modifying the code to the following; it has not changed since:

    // don't let index beyond array
    if (region >= NUM_REGIONS) {
      if (region == NUM_REGIONS && mz < experimental_mass_cut_off) {
        // Force peak into lower bin
        region = NUM_REGIONS - 1;
      } else {
        // Skip peak altogether
        continue;
      }
    }

The current code in tide/spectrum_preprocess2.cc contains the following:

    // Obvious bug from SEQUEST: highest peaks are ignored:
    for (int i = NUM_SPECTRUM_REGIONS * region_size; i <= largest_mz; ++i)
      peaks_[i] = 0;

which deliberately reproduces the bug that existed in Crux until Nov. 2011.
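To make the difference between the two Crux variants concrete, here is a small sketch that isolates just the region assignment. It is not the Crux code: the function names, the -1 "skip" sentinel, and the parameter names are hypothetical, and it assumes `region_selector` is obtained by integer division of the highest bin by NUM_REGIONS, which should be checked against Scorer.cpp.

```cpp
const int NUM_REGIONS = 10;
const int SKIP_PEAK = -1;  // hypothetical sentinel meaning "peak is discarded"

// Oct. 2011 behaviour: any peak mapping to region >= NUM_REGIONS is dropped.
int regionOld(int mz_bin, int region_selector) {
  int region = mz_bin / region_selector;
  if (region >= NUM_REGIONS) {
    return SKIP_PEAK;
  }
  return region;
}

// Nov. 2011 behaviour: a peak that lands exactly one region past the end,
// and is still below the mass cut-off, is forced into the last region.
int regionFixed(int mz_bin, int region_selector, int experimental_mass_cut_off) {
  int region = mz_bin / region_selector;
  if (region >= NUM_REGIONS) {
    if (region == NUM_REGIONS && mz_bin < experimental_mass_cut_off) {
      return NUM_REGIONS - 1;  // force peak into lower bin
    }
    return SKIP_PEAK;  // skip peak altogether
  }
  return region;
}
```

Under that assumption, a spectrum whose highest bin is 1005 gets `region_selector = 1005 / 10 = 100`, so bins 1000 through 1005 map to region 10. The old variant drops them, while the fixed variant places them in region 9 as long as they are below the cut-off. Integer division always makes `highest_bin / region_selector` at least NUM_REGIONS, which is why the old variant lost the highest peak every time.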
Places to look
In tide/spectrum_preprocess2.cc, function ObservedPeakSet::PreprocessSpectrum, fragment masses are binned with the transformation:
bin_width is set from the global constant bin_width_mono = 1.0005079, and the bin offset is baked into the above equation with a value of 0.50.
For reference, this is exactly equivalent to the binning formula in the current Scorer.h if BIN_SIZE = 1.0005079 and BIN_OFFSET = 0.50:
It would be understandable if these values were baked into SEQUEST once upon a time, as they give optimal results for charge 2+ precursors, and suboptimal but still decent results for charge 3+ precursors.
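The claimed equivalence is easy to spot-check. The sketch below assumes the Tide transformation has the form (int)(mz / bin_width + 0.5) and that the Scorer.h formula has the form (int)(value / bin_size + 1.0 - bin_offset). Both forms are reconstructed from the description above rather than copied from the source files, and the function names are hypothetical, so treat this as an illustration to verify against the actual code.

```cpp
// Value of the global constant bin_width_mono mentioned above.
const double BIN_WIDTH_MONO = 1.0005079;

// Assumed form of the Tide binning, with the 0.50 offset baked in.
int tideBin(double mz) {
  return static_cast<int>(mz / BIN_WIDTH_MONO + 0.5);
}

// Assumed form of the Scorer.h binning formula, with explicit
// bin size and bin offset parameters.
int scorerBin(double value, double bin_size, double bin_offset) {
  return static_cast<int>(value / bin_size + 1.0 - bin_offset);
}
```

With bin_offset = 0.50 the term 1.0 - bin_offset equals 0.5, so the two functions agree whenever bin_size equals BIN_WIDTH_MONO. Any other offset shifts the bin boundaries in Crux but not in Tide, which is one parameter to pin down when comparing XCorr between the two.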
Thanks for documenting these differences, Jeff. Can either you or Kaipo
fix the 10-bin normalization bug and then see if we can get XCorr to agree
between search-for-matches and Tide for at least one PSM, when we set
parameters appropriately?
Bill
On Wed, Dec 18, 2013 at 3:52 AM, Jeff Howbert <howbert@users.sf.net> wrote:
Related
Issues:
#80 differences are minimal as of r16286