19/05/2026 (updated/extended version)
SugarBase: mapping glycomolecule precursors in microbes
=======================================================
SugarBase-X is an open-source, Python-based pipeline (available as both a standalone executable and Python code) designed to process LC–MS/MS data (in .mzXML format) for the identification of nucleotide-activated sugars
(NT-sugars) based on diagnostic nucleotide fragments and a predefined chemical composition space. The workflow combines pseudo-MS1 precursor information with MS2 fragment evidence, filters candidate masses using theoretical chemical compositions, and assigns a confidence score to each detected hit.
The pipeline supports both a discovery mode (screening a broad chemical space of possible nucleotide-sugars) and a targeted mode (searching only predefined compositions from an input list). The desktop executable allows the results to be explored interactively.
====
1. System requirements
SugarBase-X was tested on 64-bit versions of Windows 10 Enterprise 22H2 and Windows 11 Enterprise 24H2.
Testing was performed on both desktop and laptop systems. The executable is expected to run on any standard 64-bit Windows installation.
Software dependencies: no additional software dependencies are required for the desktop executable version (SugarBase_Desktop_Executable.zip), which is distributed as a standalone Windows executable.
A Python source-code version (SugarBase_Final_Python.zip) is also available via SourceForge.
Tested hardware configurations: Intel Core i7-1365U CPU with 16 GB RAM; Intel Xeon W-2195 CPU with 64 GB RAM; Intel Core i7-7700K CPU with 32 GB RAM.
Storage: standard desktop/laptop storage is sufficient (approx. 50 GB minimum for storing multiple runs, data conversion and processing); tested on HDD and SSD systems.
No dedicated GPU required.
Non-standard hardware: none required.
====
2. Installation guide
The standalone Windows executable version of SugarBase-X requires no manual installation of additional software dependencies by the user.
The graphical user interface is implemented using the Python Dash framework and runs locally in the user’s default web browser via a local Flask server (http://127.0.0.1:8050/).
To set up SugarBase, download the desktop executable: https://sourceforge.net/projects/sugarbase-x/files/SugarBase_v08032026.zip/download
Unzip and open the desktop executable and test performance on test data: https://sourceforge.net/projects/sugarbase-x/files/Test_data_SugarBase.zip/download
Typical setup time on a "normal" desktop computer is approx. 5-10 minutes, which includes downloading (fast connection), unzipping and opening the SugarBase-X software.
====
3+4. Demo, Instructions for use and run on data
After downloading the SugarBase-X executable and the accompanying test data (see Installation Guide), the software can be launched by opening the executable file, which automatically
starts a local browser-based graphical user interface. The user should first select an input folder containing the .mzXML files to be analysed. A maximum of three .mzXML files can be
processed simultaneously in a single analysis.
The provided demo dataset contains the file JvE_20231220_NT60_UDP-Glc_Part1_PRM01.mzXML, corresponding to a UDP-glucose standard injection,
and the files JvE_20251103_NT64_CJ_11168_Part1_PRM01.mzXML and JvE_20251103_NT64_CJ_11168_Part2_PRM01.mzXML, which represent replicate analyses acquired over different mass ranges for a
metabolite extract of Campylobacter jejuni strain 11168. Allowed nucleotide fragment masses and plausible nucleotide-sugar chemical compositions are hard-coded within the executable.
After selecting the input folder, the user should specify an output folder and enter a sample name. Two analysis modes are available: “Targeted” and “Discovery”. In Targeted mode,
the software requires a Target.xlsx file located in the input folder, containing the chemical formulas of the sugars of interest (e.g. C6H12O6). For the demo dataset, the provided
target list is test_target_monosaccharides.xlsx. Additional target sugars can be included by adding their molecular sum formulas to this file. In Discovery mode, no target list is
required and the software searches for all plausible nucleotide-sugar compositions supported by the internal database and scoring algorithm.
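The sum formulas in the target list map directly onto monoisotopic masses. The following is a minimal sketch of that conversion, not the code used inside SugarBase-X: the function names (parse_formula, monoisotopic_mass) and the small element table are assumptions made for illustration, and only simple formulas without brackets or charges are handled.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Illustrative subset only; extend as needed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}


def parse_formula(formula):
    """Return {element: count} for a simple sum formula such as 'C6H12O6'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"not a simple sum formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a single atom (e.g. the N in C8H15NO6).
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    total = 0.0
    for element, count in parse_formula(formula).items():
        if element not in MONOISOTOPIC:
            raise ValueError(f"element not in table: {element}")
        total += MONOISOTOPIC[element] * count
    return total


if __name__ == "__main__":
    for f in ("C6H12O6", "C8H15NO6"):
        print(f, round(monoisotopic_mass(f), 4))
```

For C6H12O6 this gives a monoisotopic mass of about 180.0634 Da.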
The analysis is initiated by clicking the “Run Analysis” button using the default parameters. Upon completion, an interactive tabular dashboard is displayed, presenting the identified
nucleotide-sugar matches together with their calculated scores. The results table can be exported as an Excel file using the “Export” function, and individual nucleotide-sugar hits can
be explored interactively through the graphical interface (by selecting hits, using the "tick box" in the table). A detailed description of all input settings, parameters, and output
fields is provided below.
Input files:
PGC-MS/MS raw data converted to .mzXML files (e.g. using msconvert, https://proteowizard.sourceforge.io/tools/msconvert.html, with the peakPicking filter set to the vendor-specific algorithm).
Up to three files can be processed in one go. The files should be stored in the 'INPUT' folder.
Input parameters:
General parameters:
Consecutive Scans: default = 3
'Consecutive Scans' represents a filter parameter: hits representing potential nucleotide-sugars are only kept when the identified hit is present in at least X consecutive scans of the MS data.
Min MS1 Intensity: default = 1000
Minimum MS1 intensity threshold for peaks identified in pseudo-MS1 spectra.
Min Retention Time (min): default = 4.5
Lower retention time cutoff (in minutes) for pseudo-MS1 data.
Max Retention Time (min): default = 15.0
Upper retention time cutoff (in minutes) for pseudo-MS1 data.
Max Fragment Error (ppm): default = 10
Mass accuracy (in ppm) for fragment ion searches.
Max Sugar Mass Error (ppm): default = 10
Mass accuracy (in ppm) for MS1 data matching to the chemical sugar space.
DIA Scan Cycle: default = 30
Represents one PRM cycle. A 'DIA Scan Cycle' of 30 means that the same mass is targeted every 30th scan, with consecutive scans at i, i+30, i+2*30, i+3*30, etc.
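How 'Consecutive Scans' and 'DIA Scan Cycle' interact can be illustrated with a short sketch. This is one plausible reading of the filter, not the SugarBase-X implementation: the function names and the example scan numbers are hypothetical, and scans are treated as consecutive when they are spaced exactly one cycle apart.

```python
def longest_cycle_run(scan_numbers, cycle=30):
    """Length of the longest run of scans spaced exactly `cycle` apart."""
    scans = set(scan_numbers)
    best = 0
    for scan in scans:
        if scan - cycle in scans:
            continue  # not the first scan of a run
        length = 1
        while scan + length * cycle in scans:
            length += 1
        best = max(best, length)
    return best


def passes_consecutive_filter(scan_numbers, min_consecutive=3, cycle=30):
    """Keep a hit only if it appears in enough cycle-consecutive scans."""
    return longest_cycle_run(scan_numbers, cycle) >= min_consecutive


if __name__ == "__main__":
    # Hypothetical hit seen in scans 12, 42 and 72 (i, i+30, i+2*30) and 140.
    print(passes_consecutive_filter([12, 42, 72, 140]))  # True
    print(passes_consecutive_filter([12, 42, 140]))      # False
```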
Advanced parameters:
Deisotoping:
Enable deisotoping: default = on
Whether to perform deisotoping of the pseudo-MS1 spectra.
Mass Error (ppm): default = 10
Mass accuracy (in ppm) used for deisotoping.
Charge States: default = 2
Charge state to consider when removing isotopic envelopes (e.g. 2 = remove up to doubly charged envelopes).
Scoring thresholds:
Cons Score Threshold: default = 4
A hit (potential nucleotide-sugar) scores +10 when at least 'Cons Score Threshold' consecutive scans are present. Otherwise, the score is 0.
Pearson High Threshold: default = 0.8
A hit (potential nucleotide-sugar) scores +10 if the Pearson correlation of the precursors and fragments across three consecutive scans > 'Pearson High Threshold'.
Pearson Low Threshold: default = 0.7
A hit (potential nucleotide-sugar) scores -10 if the Pearson correlation of the precursors and fragments across three consecutive scans < 'Pearson Low Threshold'. A score of -50 is assigned when the Pearson correlation <0.50.
MS2 Peak Fraction Score: default = 1
A hit (potential nucleotide-sugar) scores +10 if the intensity ratio of the MS2 fragment compared to the "MS1" precursor > 'MS2 Peak Fraction Score' AND < 100. A score of -50 is assigned when the ratio >110.
MS1 Intensity Score: default = 100000
A hit (potential nucleotide-sugar) scores +10 if the MS1 Intensity of the precursor > 'MS1 Intensity Score'. Otherwise, the score is 0.
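The scoring thresholds above can be summarised in a short sketch using the default values. It is not taken from the SugarBase-X source: the function names are hypothetical, values falling between two thresholds (e.g. a Pearson correlation between 0.7 and 0.8, or a fragment ratio between 100 and 110) are assumed to score 0, the -50 penalties are assumed to take precedence, and the -10 fragment penalty follows the 'Frag Score' column description.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long intensity series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def consecutive_score(n_consecutive, threshold=4):
    return 10 if n_consecutive >= threshold else 0


def pearson_score(r, high=0.8, low=0.7):
    if r < 0.50:
        return -50  # assumed to override the -10 penalty
    if r < low:
        return -10
    return 10 if r > high else 0


def fragment_score(percent, min_fraction=1.0):
    if percent > 110:
        return -50
    if percent < min_fraction:
        return -10
    return 10 if min_fraction < percent < 100 else 0


def parent_score(ms1_intensity, threshold=100_000):
    return 10 if ms1_intensity > threshold else 0


if __name__ == "__main__":
    # Hypothetical precursor and fragment intensities over three scans.
    r = pearson([1.0e5, 2.5e5, 1.8e5], [4.0e4, 9.0e4, 7.0e4])
    partial = (consecutive_score(5) + pearson_score(r)
               + fragment_score(40.0) + parent_score(2.5e5))
    print(round(r, 3), partial)
```

The sum printed at the end covers only the four scores shown here; how SugarBase-X combines all scores into the reported total is not specified in this section.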
Additional scoring thresholds CMP-sugars:
Apply Nitrogen Rule: default = on
Enables filtering of CMP-hit data using the nitrogen rule. According to the nitrogen rule, a valid CMP-NulO hit must have a monosaccharide in which the numbers of N and H atoms are either both even or both odd; all other hits are excluded.
Precursor Ratio Cutoff (%): default = 25
Enables filtering of CMP-hit data based on the intensity ratio of the precursor identified in the MS2 scan compared to the fragment intensity in the MS2 scan.
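The parity check behind the nitrogen rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the SugarBase-X code: the helper names are hypothetical, and the check is applied to the monosaccharide sum formula as described above.

```python
import re


def element_count(formula, element):
    """Count atoms of `element` in a simple sum formula such as 'C11H19NO9'."""
    total = 0
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol == element:
            total += int(number or 1)  # no number means one atom
    return total


def passes_nitrogen_rule(formula):
    """True when the N and H counts are both even or both odd."""
    n_atoms = element_count(formula, "N")
    h_atoms = element_count(formula, "H")
    return n_atoms % 2 == h_atoms % 2


if __name__ == "__main__":
    print(passes_nitrogen_rule("C11H19NO9"))  # Neu5Ac: 1 N, 19 H -> True
    print(passes_nitrogen_rule("C11H20NO9"))  # made-up formula: 1 N, 20 H -> False
```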
====
Expected output
The interactive browser window shows all NT-sugar hits in tabular format, which can be further filtered and visualised. In addition to the interactive dashboard output, the software generates an output folder containing a "Combined_NT_sugar_matches.xlsx" file summarising all nucleotide-sugar matches across
the analysed runs, and separate subfolders for each analysed sample including a "sample_name.xlsx" summary report and an mzxml_data.npz metadata file. Within the Excel report, the worksheet
PARAMETERS lists the analysis settings (if parameter export is enabled), CMP-HITs summarises CMP-sugar hits, HITLIST contains the consolidated nucleotide-sugar hits, and DETAILED reports
all nucleotide-sugar hits detected at the individual scan level prior to hit consolidation.
Example output data for all supported analysis modes are included in the directory Test_data_SugarBase\OUTPUT. The folder Cjejuni_11168_DISCOVERY contains the combined output from the
replicate C. jejuni analyses processed in Discovery mode, whereas Cjejuni_11168_TARGETED contains the corresponding combined output generated in Targeted mode. The folders UDP-Glc_DISCOVERY
and UDP-Glc_TARGETED contain the analyses of the UDP-glucose standard injection processed in Discovery and Targeted modes, respectively.
Expected run time
For the provided demo dataset, the expected runtime on a standard desktop computer is approximately 1–5 minutes for the analysis of two samples.
Dash browser table and Excel summary output column descriptions:
HIT_ID
Description: A unique identifier assigned to each potential nucleotide-sugar hit.
Total score
Description: The overall score assigned to the identified hit.
Scan
Description: The scan number associated with the identified hit.
Precursor m/z
Description: The m/z value of the theoretical nucleotide-sugar, derived from the chemical composition database.
NT-Sugar
Description: The nucleotide combined with the monoisotopic mass of the corresponding monosaccharide.
Nucleotide Fragment
Description: The nucleotide fragment detected for the given hit.
Sugar Mass
Description: The monoisotopic mass of the identified potential monosaccharide.
Sum Formula
Description: The chemical sum formula of the potential monosaccharide.
Mass Error (ppm)
Description: The difference (in parts per million) between the mass of the identified hit and the theoretical mass of the corresponding nucleotide-sugar from the chemical composition database.
Relative abundance
Description: The relative intensity (%) of the hit, normalized to the most intense hit detected within the same sample.
MS1 Intensity
Description: The MS1 signal intensity corresponding to the identified hit.
RT (min)
Description: The chromatographic retention time of the hit, expressed in minutes.
Isolation centre
Description: The m/z center of the isolation window used during acquisition.
Consecutive Score
Description: Assigned +10 if the hit is detected in at least 'Cons Score Threshold' consecutive scans; otherwise, the score is 0.
Pearson Score
Description: Assigned +10 if the Pearson correlation of the precursor and corresponding fragment intensities across three consecutive scans exceeds 'Pearson High Threshold'. A score of -10 is assigned if < 'Pearson Low Threshold' and a score of -50 when <0.50.
Frag Score
Description: Assigned +10 if the intensity ratio of the MS2 fragment relative to the "MS1" precursor is > 'MS2 Peak Fraction Score' and <100. A score of -10 is assigned when the ratio is < 'MS2 Peak Fraction Score' and a score of -50 when it is >110.
Parent Score
Description: Assigned +10 if the MS1 Intensity of the identified hit exceeds 'MS1 Intensity Score'; otherwise, the score is 0.
CO2 Score
Description: Assigned +10 if a CO2 loss peak is present in the corresponding "MS2" spectrum of the identified hit; otherwise, the score is 0.
MS1/MS2 Correlation
Description: The Pearson correlation of the precursors and fragments across three consecutive scans.
% Fragmentation
Description: The MS2 fragment intensity / MS1 precursor intensity (or MS2 precursor intensity for CMP-sugars) * 100%.
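Three of the derived columns ('Mass Error (ppm)', 'Relative abundance' and '% Fragmentation') are simple ratios. The sketch below shows how they could be computed; the function names and the example values are illustrative assumptions, not part of SugarBase-X, and the mass error is assumed to be signed (observed minus theoretical).

```python
def mass_error_ppm(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def relative_abundance(intensities):
    """Intensities as a percentage of the most intense hit in the sample."""
    top = max(intensities)
    return [100.0 * value / top for value in intensities]


def percent_fragmentation(fragment_intensity, precursor_intensity):
    """MS2 fragment intensity as a percentage of the precursor intensity."""
    return 100.0 * fragment_intensity / precursor_intensity


if __name__ == "__main__":
    # Illustrative m/z values and intensities only.
    print(round(mass_error_ppm(565.0500, 565.0477), 2))
    print(relative_abundance([2e5, 5e4, 1e5]))
    print(percent_fragmentation(4e4, 1e5))
```

With the default 'Max Sugar Mass Error (ppm)' of 10, a hit would be kept only when the absolute value of the first quantity does not exceed 10.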
====
Citation:
van Ede, J. M., Sorensen, M. C. H., van Loosdrecht, M., & Pabst, M. (2026). SugarBase: mapping glycomolecule precursors in microbes. bioRxiv, 2026-04.
https://www.biorxiv.org/content/10.64898/2026.04.20.719630v1.abstract
Contact:
Martin Pabst (m.pabst@tudelft.nl)