Sugar Database

glycolab Dinko Soic

Overview

Oxonium Browser uses a sugar oxonium ion database to define which masses to search for in MS2 spectra. Each entry specifies a diagnostic mass pair: the intact oxonium ion ([M+H−H₂O]⁺) and its water loss fragment ([M+H−2H₂O]⁺). Both masses must be present within the mass error tolerance for a positive detection.
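The pair rule can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual code: the function name `has_diagnostic_pair`, the absolute tolerance in Da and its default value, and the example peak list are all assumptions.

```python
def has_diagnostic_pair(peaks, ox_mass1, ox_mass2, tol_da=0.01):
    """Return True only if BOTH diagnostic masses occur in the peak list.

    peaks   -- iterable of m/z values from one MS2 spectrum
    tol_da  -- absolute mass tolerance in Da (hypothetical default)
    """
    def present(target):
        return any(abs(mz - target) <= tol_da for mz in peaks)

    return present(ox_mass1) and present(ox_mass2)


# Hypothetical MS2 peak list that contains the HexNAc pair
spectrum = [138.0550, 186.0763, 204.0865, 366.1395]

print(has_diagnostic_pair(spectrum, 204.0867, 186.0761))    # both masses found
print(has_diagnostic_pair([204.0865], 204.0867, 186.0761))  # water loss missing
```

The second call shows why the pair requirement is stricter than single-mass matching: a peak at 204.0865 alone does not count as a detection.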

Two databases are available:

  • Curated database — user-provided Excel file with named sugars and test masses
  • Chemspace database — optional hardcoded comprehensive chemical space of >3,300 monosaccharide compositions

Curated Database

File Format

The curated database is an Excel file (.xlsx) placed in the Input directory. The current version ships as OX_DB_CURATED_v03.xlsx.

Required Columns

| Column   | Description                            | Example  |
|----------|----------------------------------------|----------|
| Oxonium  | Unique name for the sugar              | HexNAc   |
| ox_mass1 | Primary diagnostic mass (oxonium ion)  | 204.0867 |
| ox_mass2 | Secondary diagnostic mass (water loss) | 186.0761 |

Additional columns (Example sugar, Name, Monoisotopic Mass, M+H+) may be present for reference but are not required by the software.
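Before running an analysis with an edited file, it can help to check that every row carries the three required columns. The sketch below assumes the spreadsheet rows have already been read into plain Python dictionaries (one per row); `validate_rows` and its messages are hypothetical helpers, not part of Oxonium Browser.

```python
REQUIRED_COLUMNS = ("Oxonium", "ox_mass1", "ox_mass2")


def validate_rows(rows):
    """Return a list of human-readable problems found in the row dicts."""
    problems = []
    seen_names = set()
    for i, row in enumerate(rows, start=1):
        # A required cell counts as missing if it is absent or empty
        missing = [c for c in REQUIRED_COLUMNS if row.get(c) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        name = row["Oxonium"]
        if name in seen_names:
            problems.append(f"row {i}: duplicate name {name!r}")
        seen_names.add(name)
        try:
            float(row["ox_mass1"])
            float(row["ox_mass2"])
        except (TypeError, ValueError):
            problems.append(f"row {i}: non-numeric mass")
    return problems


# Hypothetical rows: one valid, one duplicate name, one incomplete
rows = [
    {"Oxonium": "HexNAc", "ox_mass1": 204.0867, "ox_mass2": 186.0761},
    {"Oxonium": "HexNAc", "ox_mass1": 204.0867, "ox_mass2": 186.0761},
    {"Oxonium": "MySugar", "ox_mass1": 163.0601},
]
for problem in validate_rows(rows):
    print(problem)
```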

Included Sugars

The curated database covers common, rare, and derivative monosaccharides compiled from CSDB, KEGG, and published glycan literature, including pentoses, hexoses, heptoses, ulosonic acids, deoxy sugars, amino sugars, acetylated and methylated derivatives, and others.

Diagnostic Mass Pairs

Each sugar is defined by two diagnostic masses rather than a single mass. Requiring both the intact ion and its water loss product significantly reduces false positive detections compared to single-mass matching.

For most sugars, the pair consists of:

  • ox_mass1: the intact oxonium ion, [M+H−H₂O]⁺
  • ox_mass2: its water loss fragment, [M+H−2H₂O]⁺

For some sugars (e.g. uronic acids), ox_mass2 may represent a carboxylic acid loss fragment instead.

Adding Custom Sugars

To add a new sugar to the curated database:

  1. Determine the exact monoisotopic mass of your sugar
  2. Calculate the two diagnostic masses:
       • ox_mass1 = [M+H]⁺ − H₂O = monoisotopic mass + 1.00728 − 18.01056
       • ox_mass2 = ox_mass1 − 18.01056
  3. Add a new row to the Excel file with a unique name in the Oxonium column
  4. Enter the diagnostic masses in ox_mass1 and ox_mass2
  5. Save the file in the same format

The tool will automatically include all entries during the next analysis run. Consider testing with a known positive control sample first.
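The mass calculation in the steps above can be reproduced with a short script. This is a sketch using the constants given in this section; the function name `diagnostic_masses` is hypothetical, and the HexNAc monoisotopic mass (C8H15NO6, 221.08994 Da) serves only as a worked example.

```python
PROTON = 1.00728   # proton mass in Da, as used in the steps above
WATER = 18.01056   # monoisotopic mass of H2O in Da


def diagnostic_masses(monoisotopic_mass):
    """Return (ox_mass1, ox_mass2) for a neutral monoisotopic mass in Da."""
    ox_mass1 = monoisotopic_mass + PROTON - WATER   # [M+H-H2O]+
    ox_mass2 = ox_mass1 - WATER                     # [M+H-2H2O]+
    return round(ox_mass1, 4), round(ox_mass2, 4)


# Worked example: HexNAc
print(diagnostic_masses(221.08994))  # (204.0867, 186.0761)
```

The result matches the HexNAc example values listed under Required Columns. For sugars whose second diagnostic mass is not a water loss (e.g. uronic acids), ox_mass2 has to be entered by hand instead.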

Test Masses (Negative Controls)

Purpose

The database includes entries named Ox_test_1, Ox_test_2, etc. These are random masses serving as built-in negative controls for empirical false discovery assessment. Any detections of test masses represent random chance matches, providing a direct estimate of the false positive rate at the current threshold settings. For guidance on using test masses to optimize thresholds, see Detection Metrics.
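As an illustration of how test mass detections translate into a false positive estimate, the sketch below computes the fraction of test masses that were detected. The function name, the use of the `Ox_test_` prefix to recognize test entries, and the example names are assumptions; the tool's own reporting may differ.

```python
def false_positive_rate(detected_names, database_names, test_prefix="Ox_test_"):
    """Fraction of test masses that were detected by chance."""
    test_names = {n for n in database_names if n.startswith(test_prefix)}
    if not test_names:
        raise ValueError("database contains no test masses")
    test_hits = test_names & set(detected_names)
    return len(test_hits) / len(test_names)


# Hypothetical database and detection result
database = ["Hex", "HexNAc", "dHex",
            "Ox_test_1", "Ox_test_2", "Ox_test_3", "Ox_test_4"]
detected = ["Hex", "HexNAc", "Ox_test_3"]

print(false_positive_rate(detected, database))  # 1 of 4 test masses detected
```

If this fraction is higher than acceptable, tightening the thresholds and re-checking the test mass detections is the natural next step.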

How Test Masses Are Generated

Test masses are generated as random values within the sugar oxonium ion mass range (100–400 Da): a random integer base with a fractional part of 0–0.2 Da (or 0–0.25 Da above 250 Da). Each candidate is checked against all real sugar masses at 0.0075 Da tolerance to avoid accidental overlap. The water loss mass is computed as ox_mass1 − 18.011 Da. The number of test masses is approximately equal to the number of real sugar entries (1:1 ratio).
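The generation procedure described above could look roughly like the following. This is a sketch under stated assumptions, not the generator used to build the shipped databases: the function name, the fixed seed, the exact handling of the 250 Da boundary, and checking only the candidate's ox_mass1 against the real masses are all illustrative choices.

```python
import random


def generate_test_masses(real_masses, n, tol=0.0075, seed=0):
    """Generate n (ox_mass1, ox_mass2) test pairs that avoid real sugar masses."""
    rng = random.Random(seed)          # fixed seed gives reproducible test masses
    tests = []
    while len(tests) < n:
        base = rng.randint(100, 399)                # random integer base in Da
        max_frac = 0.25 if base >= 250 else 0.2     # wider fraction above 250 Da
        ox_mass1 = base + rng.uniform(0.0, max_frac)
        # Reject candidates that fall within tolerance of any real sugar mass
        if any(abs(ox_mass1 - m) <= tol for m in real_masses):
            continue
        tests.append((round(ox_mass1, 4), round(ox_mass1 - 18.011, 4)))
    return tests


# Hypothetical real masses; n equals their number (1:1 ratio)
real = [204.0867, 186.0761, 163.0601, 145.0495]
for pair in generate_test_masses(real, n=len(real)):
    print(pair)
```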

Curated vs. Chemspace ("extensive") Test Masses

The curated database (a small set of empirical sugar masses) and the Chemspace database (the "extensive" chemical sugar space) each have their own independent set of test masses, tagged curated_test and chemspace_test respectively. Do not remove test masses from the provided database.

Chemspace Database (Optional use)

What It Contains

The Chemspace database contains 3,332 chemically plausible monosaccharide compositions systematically enumerated within a defined elemental space (C, H, O, N, S) and mass range (~83–382 Da). Each entry is represented by its molecular formula (e.g. C5H8O2, C8H15NO6) and follows the same diagnostic mass-pair structure as the curated database. The database also includes 3,332 fixed random test masses used as negative controls. These entries are included together with the empirical sugars in the provided OX_DB_COMBINED_v03.xlsx reference file. Empirical sugars are labelled as "curated" in the list column, whereas theoretical Chemspace entries are labelled as "extensive".

How to Use

When CHEMSPACE_SEARCH=True is set, the pipeline automatically searches both the curated and Chemspace databases. Otherwise only the curated database is searched.

Viewing Chemspace-Enhanced Output in the Dashboard

Switching to the combined view displays both curated matches and additional Chemspace matches. This automatically applies stricter default thresholds (counts ≥ 25, intensity ≥ 1.5%, presence ≥ 0.025%) to account for the larger search space. Switching back to curated view restores the original thresholds.
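To make the effect of the stricter defaults concrete, the sketch below applies the three combined-view thresholds to a list of hits. The dictionary keys, the hit records, and the assumption that all three thresholds must be met at once are illustrative and not taken from the Dashboard code.

```python
# Default thresholds quoted above for the combined view
COMBINED_THRESHOLDS = {"counts": 25, "intensity_pct": 1.5, "presence_pct": 0.025}


def passes_thresholds(hit, thresholds=COMBINED_THRESHOLDS):
    """True if the hit meets or exceeds every threshold (assumed AND logic)."""
    return all(hit[key] >= minimum for key, minimum in thresholds.items())


# Hypothetical hits with made-up metrics
hits = [
    {"name": "C8H15NO6", "counts": 310, "intensity_pct": 4.2, "presence_pct": 0.8},
    {"name": "C5H8O2", "counts": 12, "intensity_pct": 2.0, "presence_pct": 0.03},
]
kept = [h["name"] for h in hits if passes_thresholds(h)]
print(kept)  # only the first hit has counts >= 25
```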

Interpreting Chemspace Results

Genuine sugar signals stand out even with >3,000 candidates. Key indicators:

  • High counts and intensity — well above the test mass metrics
  • Water loss partners — the match table groups related masses; a group containing both the intact ion and its water loss is stronger evidence
  • Co-occurrence — genuine fragments from the same glycan should cluster together in the co-occurrence plot

Some molecular formulas may consistently appear as top hits across different organisms without being sugars. These likely represent dipeptide fragments or other non-glycan contaminants.

From Formula to Sugar Identity

Chemspace hits are identified by molecular formula only (e.g. C8H15N1O6). To determine the actual sugar identity, cross-reference the formula with sugar databases (CSDB, KEGG, GlyTouCan), check whether the mass matches known sugar derivatives for the organism, and validate with complementary experiments.
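When cross-referencing a formula against external databases, it is often convenient to convert it to a monoisotopic mass first. The sketch below does this for the elemental space used by Chemspace (C, H, O, N, S). It assumes the formula describes a neutral composition, which this page does not state explicitly, and the helper names are hypothetical.

```python
import re

# Monoisotopic masses of the most abundant isotopes, in Da
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}


def parse_formula(formula):
    """Turn 'C8H15NO6' or 'C8H15N1O6' into {'C': 8, 'H': 15, 'N': 1, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in MONOISOTOPIC:
            raise ValueError(f"unsupported element: {element}")
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum of element masses times their counts."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


# Both spellings of the same composition give the same mass
print(round(monoisotopic_mass("C8H15NO6"), 4))
print(round(monoisotopic_mass("C8H15N1O6"), 4))
```

The resulting mass can then be used as a search key in CSDB, KEGG, or GlyTouCan alongside the formula itself.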

The Chemspace search is recommended for:

  • Organisms with uncharacterized or poorly characterized metabolism and glycosylation
  • Discovery experiments looking for unexpected sugar modifications
  • Comparing curated results against the broader chemical space to ensure complete coverage

Database Comparison

| Property      | Curated                          | Chemspace                              |
|---------------|----------------------------------|----------------------------------------|
| Size          | ~35 sugars + ~35 test masses     | ~3,300 sugars + ~3,300 test masses     |
| Names         | Descriptive (Hex, HexNAc, dHex)  | Molecular formulas (C6H12O6, C8H15NO6) |
| Source        | Literature, CSDB, KEGG           | Systematic generation of compositions  |
| Customizable  | Yes (edit Excel file)            | No (hardcoded)                         |
| Test masses   | In Excel file, user-editable     | Hardcoded, fixed across all runs       |
| Always active | Yes                              | Only with CHEMSPACE_SEARCH=True        |
