The Oxonium Browser processes shotgun proteomics data through a sequential pipeline, from raw mzML input to an interactive dashboard.
┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ SAGE Database │───▶ │     mzML      │───▶ │  Oxonium Ion  │───▶ │    Results    │───▶ │  Interactive  │
│    Search     │     │  Calibration  │     │   Detection   │     │  Processing   │     │   Dashboard   │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
The pipeline requires three input files:
Purpose: Convert vendor-specific RAW files to open-format mzML.
Recommended tool: ProteoWizard MSConvert. Use centroided mzML with 64-bit encoding (32-bit acceptable for Astral data to reduce file size). The Docker container cannot process RAW files directly due to Windows-native library requirements.
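A conversion with these settings could be scripted roughly as follows. This is a minimal sketch rather than part of Oxonium Browser: the helper name and the file names are hypothetical, and the flags reflect the msconvert command-line options as commonly documented, so confirm them against `msconvert --help` for your ProteoWizard version.

```python
import shlex
from pathlib import Path

def build_msconvert_command(raw_file, out_dir, use_32bit=False):
    """Build an msconvert argument list (hypothetical helper, not part of the tool)."""
    precision = "--32" if use_32bit else "--64"  # binary encoding precision
    return [
        "msconvert", str(raw_file),
        "--mzML",                                      # write open-format mzML
        precision,
        "--filter", "peakPicking vendor msLevel=1-",   # centroid all MS levels
        "-o", str(out_dir),                            # output directory
    ]

# Example: Astral data, where 32-bit encoding keeps files smaller.
command = build_msconvert_command(Path("sample.raw"), Path("mzml_out"), use_32bit=True)
print(shlex.join(command))
```

On a Windows machine with ProteoWizard installed, the list can be passed to `subprocess.run(command, check=True)`.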
SAGE Database Search (pysage_v6_scanner.py)
Purpose: Identify unmodified peptide spectra to exclude from glycopeptide analysis.
The module wraps the SAGE search engine (via sagepy) to perform fast peptide-spectrum matching. It uses a target-decoy approach for FDR control and returns scan numbers of identified peptides, which are then skipped during oxonium ion detection.
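The target-decoy idea can be illustrated with a generic q-value filter. The sketch below does not use the sagepy API. The tuple layout, the function name, and the 1% threshold are all assumptions made for illustration.

```python
def confident_target_scans(psms, fdr_threshold=0.01):
    """Return scan numbers of target PSMs that pass the FDR threshold.

    psms: iterable of (scan_number, score, is_decoy); higher score is better.
    """
    ranked = sorted(psms, key=lambda p: p[1], reverse=True)

    # Running FDR estimate at each score cutoff: decoys / targets.
    fdrs, targets, decoys = [], 0, 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this cutoff or any more permissive one.
    q_values = fdrs[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])

    return {
        scan
        for (scan, _, is_decoy), q in zip(ranked, q_values)
        if not is_decoy and q <= fdr_threshold
    }

# Made-up scores: three strong targets, one decoy, one weak target.
psms = [(101, 9.5, False), (102, 8.7, False), (103, 7.9, False),
        (104, 3.1, True), (105, 2.8, False)]
skip_scans = confident_target_scans(psms)
print(sorted(skip_scans))
```

The returned set plays the role of the scan numbers that are skipped during oxonium ion detection.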
Key settings:
Results are cached: if a SAGE output file already exists for the input mzML, the search is skipped and the cached results are reused.
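The caching behaviour amounts to an existence check before the search runs. A minimal sketch, assuming a hypothetical output naming scheme (`<stem>_sage.tsv` next to the mzML) and a caller-supplied search function:

```python
import tempfile
from pathlib import Path

def cached_or_run(mzml_path, run_search):
    """Return (results_path, was_cached); run the search only if no output exists."""
    mzml_path = Path(mzml_path)
    # Hypothetical naming convention for the cached SAGE output.
    results_path = mzml_path.with_name(mzml_path.stem + "_sage.tsv")
    if results_path.exists():
        return results_path, True
    run_search(mzml_path, results_path)
    return results_path, False

# Demonstration with a stand-in search function that records its calls.
with tempfile.TemporaryDirectory() as tmp:
    mzml = Path(tmp) / "sample.mzML"
    mzml.write_text("")  # placeholder input file
    calls = []

    def fake_search(src, dst):
        calls.append(src.name)
        dst.write_text("scan\n101\n")

    first = cached_or_run(mzml, fake_search)
    second = cached_or_run(mzml, fake_search)
    cache_flags = (first[1], second[1])
    n_calls = len(calls)

print(cache_flags, n_calls)
```

With this scheme, deleting the cached output file is enough to force a fresh search.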
mzML Calibration (mzml_recalibration_v6.py)
Purpose: Read mzML files and improve mass accuracy through two-pass calibration.
Pass 1 — Global calibration:
Matches seven amino acid fragment reference peaks (147.113, 175.119, 201.123, 215.139, 228.134, 258.145, 292.129 m/z) across all spectra at 20 ppm tolerance. Requires a minimum of 500 matched spectra per reference peak. Fits a global linear calibration function using least squares regression and applies it to all spectra.
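The global pass can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in mzml_recalibration_v6.py: it assumes the linear model maps observed m/z to reference m/z, that the closest peak within tolerance counts as the match, and that a reference peak with too few matches is simply left out of the fit.

```python
REFERENCE_MZ = [147.113, 175.119, 201.123, 215.139, 228.134, 258.145, 292.129]

def match_reference(peaks, ref_mz, tol_ppm):
    """Return the observed m/z closest to ref_mz if within tol_ppm, else None."""
    best = min(peaks, key=lambda mz: abs(mz - ref_mz), default=None)
    if best is None or abs(best - ref_mz) / ref_mz * 1e6 > tol_ppm:
        return None
    return best

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def global_calibration(spectra, tol_ppm=20.0, min_spectra=500):
    """Fit one line over all spectra; spectra are lists of m/z values."""
    observed, expected = [], []
    for ref in REFERENCE_MZ:
        matches = (match_reference(peaks, ref, tol_ppm) for peaks in spectra)
        hits = [mz for mz in matches if mz is not None]
        if len(hits) >= min_spectra:  # assumption: skip poorly supported peaks
            observed += hits
            expected += [ref] * len(hits)
    if len(set(expected)) < 2:
        return 1.0, 0.0  # too little data for a line: identity calibration
    return fit_line(observed, expected)

# Synthetic data: every peak is shifted by +10 ppm.
shift = 1 + 10e-6
spectra = [[ref * shift for ref in REFERENCE_MZ] for _ in range(3)]
slope, intercept = global_calibration(spectra, min_spectra=3)
max_error_ppm = max(
    abs(slope * (ref * shift) + intercept - ref) / ref * 1e6 for ref in REFERENCE_MZ
)
print(f"residual error after calibration: {max_error_ppm:.4f} ppm")
```

The demonstration lowers `min_spectra` to 3 only because the synthetic data set is tiny; the default mirrors the 500-spectrum requirement described above.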
Pass 2 — Per-spectrum calibration:
After global calibration, each spectrum is individually recalibrated at a tighter 10 ppm tolerance. At least 3 matched reference peaks per spectrum are required for a stable linear fit. Spectra with fewer matches retain the global calibration.
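The fallback logic of the second pass can be sketched as below. As before, this is an illustrative reconstruction under assumptions (the closest peak within tolerance is the match; the fit maps observed m/z to reference m/z), and the function names are hypothetical.

```python
REFERENCE_MZ = [147.113, 175.119, 201.123, 215.139, 228.134, 258.145, 292.129]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate_spectrum(peaks, tol_ppm=10.0, min_matches=3):
    """Refine one globally calibrated spectrum; return (peaks, refined_flag)."""
    pairs = []
    for ref in REFERENCE_MZ:
        window = ref * tol_ppm * 1e-6
        near = [mz for mz in peaks if abs(mz - ref) <= window]
        if near:
            pairs.append((min(near, key=lambda mz: abs(mz - ref)), ref))
    if len(pairs) < min_matches:
        return list(peaks), False  # too few anchors: keep the global calibration
    slope, intercept = fit_line([obs for obs, _ in pairs], [ref for _, ref in pairs])
    return [slope * mz + intercept for mz in peaks], True

# Synthetic spectra with a residual +5 ppm shift and one analyte peak (204.0867).
shift = 1 + 5e-6
spectrum_a = [ref * shift for ref in REFERENCE_MZ[:4]] + [204.0867 * shift]
spectrum_b = [ref * shift for ref in REFERENCE_MZ[:2]] + [204.0867 * shift]

peaks_a, refined_a = calibrate_spectrum(spectrum_a)  # 4 anchors: refined
peaks_b, refined_b = calibrate_spectrum(spectrum_b)  # 2 anchors: unchanged
error_a_ppm = (peaks_a[-1] - 204.0867) / 204.0867 * 1e6
print(refined_a, refined_b, f"{error_a_ppm:.4f} ppm")
```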
A diagnostic plot (mass_error_two_pass_calibration.png) shows error distributions at each stage: original, after global calibration, and after per-spectrum calibration.
Oxonium Ion Detection (get_oxonium_scans_v5.py)
Purpose: Scan non-peptide MS2 spectra for diagnostic sugar oxonium ion pairs.
For each spectrum not identified by SAGE, the scanner checks whether both diagnostic masses (oxonium ion and its water loss fragment) are present within the defined mass error tolerance. If both are found and the average normalized intensity exceeds the threshold, the detection is recorded.
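The pair check described above might look like the following sketch. The tolerance and intensity threshold are placeholder values, intensities are assumed to be normalized to the base peak of the spectrum, and the function names are hypothetical. The example masses are the commonly cited HexNAc oxonium ion (204.0867) and its water-loss fragment (186.0761).

```python
def normalized_intensity(spectrum, target_mz, tol_ppm):
    """Strongest peak within tol_ppm of target_mz, relative to the base peak.

    spectrum: list of (mz, intensity) pairs. Returns None if nothing matches.
    """
    if not spectrum:
        return None
    base_peak = max(intensity for _, intensity in spectrum)
    window = target_mz * tol_ppm * 1e-6
    hits = [i for mz, i in spectrum if abs(mz - target_mz) <= window]
    return max(hits) / base_peak if hits else None

def detect_pair(spectrum, ion_mz, water_loss_mz, tol_ppm=10.0, min_intensity=0.05):
    """Return the average normalized intensity if both ions pass, else None."""
    ion = normalized_intensity(spectrum, ion_mz, tol_ppm)
    loss = normalized_intensity(spectrum, water_loss_mz, tol_ppm)
    if ion is None or loss is None:
        return None  # both diagnostic masses must be present
    average = (ion + loss) / 2
    return average if average > min_intensity else None

# One spectrum containing both ions, one missing the water-loss fragment.
with_pair = [(186.0762, 300.0), (204.0868, 500.0), (366.1395, 1000.0)]
without_loss = [(204.0868, 500.0), (366.1395, 1000.0)]

hit = detect_pair(with_pair, 204.0867, 186.0761)
miss = detect_pair(without_loss, 204.0867, 186.0761)
print(hit, miss)
```

Requiring both members of the pair makes a chance match on a single mass much less likely to be recorded as a detection.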
Detection metrics computed per oxonium ion:
Results Processing (process_results.py)
Purpose: Organize raw detection results into structured datasets.
Separates test mass controls (names starting with Ox_test_) from real sugar detections, sorts results by normalized presence, and prepares DataFrames for Excel export and dashboard visualization.
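The separation and sorting step can be sketched with plain dictionaries. The field names and the descending sort order are assumptions; only the `Ox_test_` prefix comes from the description above.

```python
def split_and_sort(detections, control_prefix="Ox_test_", sort_key="normalized_presence"):
    """Split detections into (sugars, controls), each sorted by presence, highest first."""
    controls = [d for d in detections if d["name"].startswith(control_prefix)]
    sugars = [d for d in detections if not d["name"].startswith(control_prefix)]

    def presence(detection):
        return detection[sort_key]

    return (
        sorted(sugars, key=presence, reverse=True),
        sorted(controls, key=presence, reverse=True),
    )

# Made-up detection records for illustration.
detections = [
    {"name": "HexNAc", "normalized_presence": 0.42},
    {"name": "Ox_test_1", "normalized_presence": 0.03},
    {"name": "Hex", "normalized_presence": 0.61},
    {"name": "Ox_test_2", "normalized_presence": 0.08},
]
sugars, controls = split_and_sort(detections)
print([d["name"] for d in sugars], [d["name"] for d in controls])
```

Each resulting list of dictionaries can be passed directly to `pandas.DataFrame(...)` when a DataFrame is needed for export.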
Interactive Dashboard (ox_scanner_dash_v24.py)
Purpose: Provide interactive visualization and exploration of results.
Built with Plotly Dash, the dashboard offers real-time filtering, multiple visualization types, and export functionality. See the Dashboard Guide for a full walkthrough.
Key components:
Main script (main_script.py)
Purpose: Coordinate the entire workflow.
Reads environment variables for configuration, discovers input files, optionally merges the chemspace database with the user-provided curated list (deduplicating overlapping masses), runs each pipeline stage in sequence, exports results to Excel, and launches the dashboard.
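The configuration and merge steps could look roughly like this. The environment variable names, the 5 ppm overlap window, and the rule that curated entries take precedence over chemspace entries are all assumptions made for the sketch, as are the example mass values.

```python
import os

def read_config(env=os.environ):
    """Read settings from environment variables (names are hypothetical)."""
    return {
        "merge_chemspace": env.get("MERGE_CHEMSPACE", "false").lower() in ("1", "true", "yes"),
        "input_dir": env.get("INPUT_DIR", "."),
    }

def merge_mass_lists(curated, chemspace, tol_ppm=5.0):
    """Merge (name, mass) lists; drop chemspace masses that overlap a kept mass."""
    merged = list(curated)  # assumption: curated entries always win
    for name, mass in chemspace:
        overlaps = any(abs(mass - kept) / kept * 1e6 <= tol_ppm for _, kept in merged)
        if not overlaps:
            merged.append((name, mass))
    return sorted(merged, key=lambda entry: entry[1])

# Illustrative inputs: one chemspace mass overlaps the curated Hex entry.
curated = [("Hex", 163.0601), ("HexNAc", 204.0867)]
chemspace = [("chemspace_163", 163.0603), ("chemspace_190", 190.0710)]

config = read_config({"MERGE_CHEMSPACE": "true"})
masses = merge_mass_lists(curated, chemspace) if config["merge_chemspace"] else curated
print([name for name, _ in masses])
```

Passing the environment as an argument keeps the function easy to test; calling `read_config()` with no argument reads the real process environment.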
Oxonium Browser identifies sugars based on diagnostic mass, but cannot differentiate between isomeric sugars. When a hexose (Hex) is detected, additional experiments or literature review are needed to determine whether it is glucose, galactose, mannose, or another hexose isomer. The tool provides evidence of glycosylation and sugar mass; structural characterization requires complementary techniques.
The Docker version requires pre-converted mzML files. Direct RAW file analysis is not supported due to vendor library compatibility limitations in Linux containers.
The required memory is approximately equal to the mzML file size. For Astral data, ensure sufficient available RAM and consider using 32-bit encoding and spectral density reduction during conversion.