Recent changes to Interactive Dashboard

Interactive Dashboard modified by glycolab

glycolab — Sat, 30 May 2026 11:12:22 -0000

--- v3
+++ v4
@@ -1,6 +1,5 @@
 # NovoGlyco: Interactive Dashboard Guide
-
-The NovoGlyco dashboard is a web-based interface built with Plotly Dash, accessible at `http://localhost:8050` after running the pipeline. It provides real-time interactive exploration of glycopeptide identifications, glycan compositions, and glycoprotein candidates.
+The NovoGlyco dashboard is a web-based interface built with Plotly Dash and accessible at http://localhost:8050. The standalone executable operates entirely through this interface, whereas in the Python and Docker versions the interface is used for visualization of output graphs and tables. In all cases, the browser interface enables real-time interactive exploration of glycopeptide identifications, glycan compositions, and glycoprotein candidates.

 This page describes each component of the dashboard, from top to bottom.

Interactive Dashboard modified by Dinko Soic

Dinko Soic — Wed, 18 Mar 2026 15:18:02 -0000

--- v2
+++ v3
@@ -8,11 +8,11 @@

 ## Oxonium Ion Co-occurrence Heatmap

-The first element in the dashboard is a heatmap showing how frequently different oxonium ions co-occur in the same MS2 spectra.
+The first element in the dashboard is a clustered heatmap showing how frequently different oxonium ions co-occur in the same MS2 spectra.

 **What it shows:** Each cell represents the fraction of spectra containing one oxonium ion (row) that also contain another oxonium ion (column). The colour scale ranges from 0% (white) to 100% (dark blue).

-**How it's calculated:** For each pair of oxonium ions, the number of spectra containing both is divided by the total number of spectra containing the row oxonium ion. This means the matrix is asymmetric — "HexNAc in Hex" may differ from "Hex in HexNAc" because HexNAc may be present in more spectra than Hex.
+**How it's calculated:** For each pair of oxonium ions, the number of spectra containing both is divided by the total number of spectra containing the row oxonium ion. This means the matrix is asymmetric — "HexNAc in Hex" may differ from "Hex in HexNAc" because HexNAc may be present in more spectra than Hex. Rows and columns are ordered by hierarchical clustering based on Jaccard similarity (|A∩B| / |A∪B|, symmetric) using average linkage. A dendrogram on the left shows the clustering structure, grouping oxonium ions with similar spectral overlap patterns.

 **How to use it:** Look for clusters of high co-occurrence — these indicate monosaccharides that are part of the same glycan structure. For example, if HexNAc and Hex always co-occur at >90%, they likely belong to the same oligosaccharide. Low co-occurrence between two sugars suggests they appear on different glycans or different proteins. Oxonium ions with no detected spectra are marked in grey with "No scans".
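
The calculation described in this revision can be illustrated with a minimal Python sketch. It assumes each oxonium ion is represented by the set of scan numbers in which it was detected; the function name `cooccurrence` and the example data are hypothetical and not taken from the NovoGlyco source. It shows why the displayed fraction is asymmetric while the Jaccard similarity used for ordering is symmetric.

```python
def cooccurrence(scans_by_ion):
    """Return (fraction, jaccard) dictionaries keyed by (row_ion, col_ion).

    fraction[(row, col)] = |row AND col| / |row|   (asymmetric, shown in cells)
    jaccard[(row, col)]  = |row AND col| / |row OR col|  (symmetric, for ordering)
    """
    ions = sorted(scans_by_ion)
    fraction, jaccard = {}, {}
    for row in ions:
        for col in ions:
            a, b = scans_by_ion[row], scans_by_ion[col]
            both = len(a & b)
            # An ion with no spectra has no defined fraction ("No scans").
            fraction[(row, col)] = both / len(a) if a else None
            union = len(a | b)
            jaccard[(row, col)] = both / union if union else 0.0
    return fraction, jaccard


# Hypothetical scan sets: HexNAc appears in more spectra than Hex.
scans = {"HexNAc": {1, 2, 3, 4}, "Hex": {3, 4}, "NeuAc": set()}
frac, jac = cooccurrence(scans)
```

With these example sets, the HexNAc row reports that half of its spectra also contain Hex, whereas the Hex row reports that all of its spectra contain HexNAc. The Jaccard value is the same in both directions. Average-linkage clustering on `1 - jaccard` could then be done with, for example, `scipy.cluster.hierarchy.linkage(..., method="average")`; whether the dashboard uses SciPy for this step is not stated in the page.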

Interactive Dashboard modified by Dinko Soic

Dinko Soic — Fri, 13 Mar 2026 08:58:52 -0000

--- v1
+++ v2
@@ -1,314 +1,149 @@
-# NovoGlyco: Module Descriptions
+# NovoGlyco: Interactive Dashboard Guide

-This document provides detailed descriptions of each module in the NovoGlyco platform, explaining their purpose, key features and technical highlights. These modules work together to form the complete glycopeptide identification and analysis pipeline.
+The NovoGlyco dashboard is a web-based interface built with Plotly Dash, accessible at `http://localhost:8050` after running the pipeline. It provides real-time interactive exploration of glycopeptide identifications, glycan compositions, and glycoprotein candidates.

-## Main Script 
-### File: `main_script.py`
+This page describes each component of the dashboard, from top to bottom.

-### Purpose
-The main orchestration script that coordinates the entire NovoGlyco workflow from input processing to result visualization.
+---

-### Key Features
-- **Pipeline Orchestration**: Sequentially executes all analysis steps 
-- **Input Validation**: Checks for required input files and their formats
-- **Parameter Configuration**: Sets global parameters used across modules
-- **Progress Tracking**: Displays progress information during execution
-- **Resource Management**: Tracks execution time and manages file outputs
-- **Dashboard Initialization**: Launches the interactive visualization interface
+## Oxonium Ion Co-occurrence Heatmap

-### Technical Highlights
-- Implements a 10-step workflow with clear progress reporting
-- Manages separate processing for each defined oxonium ion
-- Supports configuration of multiple analysis parameters
+The first element in the dashboard is a heatmap showing how frequently different oxonium ions co-occur in the same MS2 spectra.

-## SAGE Database Search 
-### File: `pysage_v6.py`
+**What it shows:** Each cell represents the fraction of spectra containing one oxonium ion (row) that also contain another oxonium ion (column). The colour scale ranges from 0% (white) to 100% (dark blue).

-### Purpose
-Performs peptide spectrum matching against a protein database to identify unmodified peptides, establishing a baseline for subsequent glycopeptide analysis.
+**How it's calculated:** For each pair of oxonium ions, the number of spectra containing both is divided by the total number of spectra containing the row oxonium ion. This means the matrix is asymmetric — "HexNAc in Hex" may differ from "Hex in HexNAc" because HexNAc may be present in more spectra than Hex.

-### Key Features
-- **SAGE Database Searching**: Ultrafast peptide-spectrum matching
-- **Statistical Validation**: Target-decoy approach for false discovery rate control
-- **Result Filtering**: Extracts scan numbers and proteins for downstream steps
-- **Format Conversion**: Creates PeptideShaker-compatible output
+**How to use it:** Look for clusters of high co-occurrence — these indicate monosaccharides that are part of the same glycan structure. For example, if HexNAc and Hex always co-occur at >90%, they likely belong to the same oligosaccharide. Low co-occurrence between two sugars suggests they appear on different glycans or different proteins. Oxonium ions with no detected spectra are marked in grey with "No scans".

-### Technical Highlights
-- Configures enzyme digestion with customizable parameters
-- Handles static modifications (Carbamidomethyl C) and variable modifications (Oxidation M)
-- Calculates posterior error probabilities for robust validation
+---

-## mzML Reading and Processing
-### File: `mzmlread_sage_v2.py`
+## Histogram Selector and Oxonium Checkboxes

-### Purpose
-Extracts MS/MS spectra from mzML files and filters out spectra already identified in the database search, focusing subsequent analysis on unidentified spectra.
+Below the heatmap, a dropdown menu selects the primary histogram view:

-### Key Features
-- **Spectrum Extraction**: Parses complex mzML file format
-- **Metadata Processing**: Extracts precursor information and scan parameters
-- **Filtering**: Removes spectra with identified unmodified peptides
-- **Activation Pair Detection**: Identifies complementary HCD/ETD scan pairs
+- **Mass delta** — distribution of glycan masses (precursor mass − peptide mass). Only available for spectra where peptide identification succeeded.
+- **Precursor offsets** — distribution of mass differences between the precursor and high-mass fragment ions. Available for all oxonium-positive spectra, including those without peptide identification.

-### Technical Highlights
-- Leverages pyteomics library for efficient mzML parsing
-- Matches complementary fragmentation methods based on parent ion masses
-- Extracts complete spectral information including m/z arrays and intensities
-- Provides statistical summary of retained vs. filtered spectra
+**Oxonium checkboxes** below the dropdown control which traces are overlaid on the histogram. Each checkbox corresponds to one oxonium ion from your input list, plus the untargeted search results (available for mass delta view only). 

-## FASTA Processing 
-### File: `fastaread_v6.py`
+**Decoy checkboxes** appear when mass delta is selected, allowing you to overlay decoy matches for empirical false positive assessment.

-### Purpose
-Processes protein database to generate peptides with a focus on potential glycosylation sites, creating the reference database for sequence tag matching.
+---

-### Key Features
-- **Focused Database Creation**: Processes only proteins identified in sample
-- **In-silico Digestion**: Implements tryptic digestion with configurable parameters
-- **Glycosite Filtering**: Focuses on peptides with potential glycosylation sites
-- **Mass Calculation**: Computes accurate peptide masses
+## Main Histogram

-### Technical Highlights
-- Standardizes amino acids (L→I substitution) for consistent mass calculations
-- Generates decoy peptides for false discovery assessment
-- Includes glycosylation site pattern matching 
-- Calculates monoisotopic masses with fixed modifications
+### Mass Delta View

-## DirectTag Configuration 
-### File: `configure_directag_v3.py`
+Shows the distribution of glycan masses across all identified glycopeptide spectra.

-### Purpose
-Generates configuration files for the DirectTag de novo sequencing tool, tailoring parameters for glycopeptide analysis.
+**What it shows:** A frequency histogram binned at the configured bin width (default 0.1 Da). Each bar represents the number of PSMs with a mass delta in that range. The x-axis spans 0 to 2000 Da.

-### Key Features
-- **Parameter Customization**: Configures all DirectTag settings
-- **Modification Definition**: Sets up fixed and variable modifications
-- **Sequence Tag Configuration**: Defines tag length and count parameters
+**How to interpret it:** Recurring mass deltas appearing as prominent peaks indicate genuine glycan modifications: the same glycan mass observed on different peptides from the same or different proteins. Isolated single-count bins are more likely noise or artefacts. Comparing the untargeted search trace (all matches) against specific oxonium traces reveals which mass deltas are confirmed by diagnostic sugar fragments and which are not.

-### Technical Highlights
-- Provides sensible defaults for glycopeptide analysis
-- Allows customization of all DirectTag parameters
-- Enables adjustment of tag length for different analysis scenarios
+### Precursor Offsets View

-## DirectTag Execution 
-### File: `directag_v4.py`
+Shows two paired plots: a frequency histogram (left) and a summed intensity bar plot (right).

-### Purpose
-Runs the DirectTag de novo sequencing tool on MS/MS spectra and extracts sequence tags.
+**What it shows:** The frequency histogram counts how many fragment ions produce each offset value. The intensity plot sums the raw signal intensities for each offset bin, highlighting which offsets correspond to abundant fragments versus rare ones.

-### Key Features
-- **External Tool Execution**: Runs the DirectTag binary
-- **Output Processing**: Parses complex DirectTag output format
-- **Result Integration**: Links sequence tags with scan numbers
+**How to interpret it:** Recurring offset values correspond to specific monosaccharide masses: for example, peaks at ~203 Da (HexNAc), ~162 Da (Hex), or ~146 Da (dHex). The intensity plot helps distinguish genuine glycan fragments from low-level background. This analysis is independent of peptide identification, so it captures glycan information even from spectra where sequence tagging failed.

-### Technical Highlights
-- Handles the multi-line format of DirectTag output files
-- Processes tag-scan associations through line number tracking
+---

-## Oxonium Ion Detection 
-### File: `oxoniums_v7.py`
+## Child Plots

-### Purpose
-Identifies potential glycopeptides by screening MS/MS spectra for diagnostic sugar marker ions.
+Clicking any bin in the main histogram generates three rows of child plots showing complementary data for the spectra in that bin.

-### Key Features
-- **Oxonium Ion Screening**: Identifies potential glycopeptide-containing spectra
-- **Charge Deconvolution**: Handles multiply charged fragment ions
-- **Offset Calculation**: Computes mass offsets for glycan analysis
-- **Paired Scan Processing**: Handles complementary fragmentation methods
+### When mass delta is the main view:

-### Technical Highlights
-- Implements advanced charge state deconvolution algorithm
-- Normalizes intensities for comparison across spectra
-- Processes both HCD and ETD fragmentation data
-- Calculates multiple types of mass offsets for comprehensive analysis
+| Row | Left (histogram) | Right (bar plot) |
+|-----|-------------------|-------------------|
+| 1 | Precursor offsets frequency | Precursor offsets intensity |
+| 2 | Peptide offsets frequency | Peptide offsets intensity |
+| 3 | m/z spectrum frequency | m/z spectrum intensity |

-## De Novo Result Processing 
-### File: `pronovo_v4.py`
+- **Precursor offsets** (row 1) reveal which monosaccharide units are being lost from the intact glycopeptide. If you clicked a mass delta bin at 1203 Da, the precursor offsets show the stepwise fragmentation pattern of that specific glycan.
+- **Peptide offsets** (row 2) show the complementary pattern — sequential monosaccharide additions from the peptide backbone (Y0). The first peptide offset peak identifies the linking sugar.
+- **m/z spectrum** (row 3) shows the binned fragment ion distribution, including diagnostic low-mass oxonium ions.

-### Purpose
-Filters de novo sequencing results to focus on glycopeptide-containing spectra, bridging the gap between oxonium ion detection and database matching.
+### When precursor offsets is the main view:

-### Key Features
-- **Result Integration**: Links de novo tags with oxonium-containing spectra
-- **Tag Filtering**: Ensures consistent tag length for reliable matching
-- **Status Tracking**: Flags which glycopeptide spectra yielded usable tags
+| Row | Left (histogram) | Right (bar plot) |
+|-----|-------------------|-------------------|
+| 1 | Mass delta frequency (full width) | — |
+| 2 | Peptide offsets frequency | Peptide offsets intensity |
+| 3 | m/z spectrum frequency | m/z spectrum intensity |

-### Technical Highlights
-- Focuses computational resources on promising glycopeptide candidates
-- Identifies which oxonium-containing spectra successfully yielded sequence tags
-- Prepares data structures for subsequent database matching
+The mass delta row shows which glycan masses are associated with spectra containing the selected precursor offset.

-## Database Matching 
-### File: `directmatch_v11.py`
+### Two-level drill-down

-### Purpose
-Matches sequence tags with database peptides to identify potential glycopeptides.
+Clicking a bin in any child plot further filters the data. For example: click a mass delta bin at 1203 Da → child plots appear → click a precursor offset bin at 203 Da → the glycoprotein candidate table below now shows only spectra with that specific mass delta and that specific precursor offset. Red dashed lines indicate the selected bins in both the main and child plots.

-### Key Features
-- **Sequence Tag Matching**: Maps partial sequences to database peptides
-- **Y0 Ion Validation**: Verifies matches using peptide fragment Y0 ion evidence
-- **Mass Delta Calculation**: Computes glycan mass from precursor-peptide difference
-- **Match Aggregation**: Groups and scores multiple matches per spectrum
+---

-### Technical Highlights
-- Creates efficient substring lookup structure for fast matching
-- Implements multiple validation criteria for confident identification
-- Calculates both mass deltas (glycan masses) and fragment offsets
-- Separates target and decoy results for statistical validation
+## Glycoprotein Candidate Table

-## Unmodified Peptide Finder 
-### File: `unmodified_peptide_finder_v4.py`
+Displayed below the child plots when a bin is selected. This table always shows results from the untargeted search, regardless of which oxonium checkboxes are selected for the histograms above.

-### Purpose
-Checks for unmodified versions of identified glycopeptides in the database search results, providing additional validation of glycosylation.
+### Protein summary

-### Key Features
-- **Cross-Validation**: Confirms glycosylation through unmodified counterparts
-- **Match Tracking**: Adds validation status to glycopeptide reports
+Proteins are ranked by unique peptide count first, then by PSM count. Each row shows:

-### Technical Highlights
-- Creates efficient lookup structure for peptide-protein combinations
-- Provides evidence for variable glycosylation at specific sites
-- Enhances confidence in glycopeptide identifications
+| Column | Description |
+|--------|-------------|
+| **#** | Rank |
+| **Protein** | Accession ID and protein description |
+| **PSMs** | Number of peptide-spectrum matches for this protein in the selected bin |
+| **Peptides** | Number of unique peptide sequences |
+| **Med. tags** | Median tag count — the median number of de novo sequence tags that independently matched the same peptide per spectrum. Higher values indicate more confident identifications. A value of 1 means each peptide was identified by a single tag; a value of 5 means 5 out of up to 10 tags pointed to the same sequence. |
+| **Oxonium evidence** | Coloured dots indicating which diagnostic oxonium ions were detected in this protein's spectra. |

-## Data Preparation for Visualization 
-### File: `prepare_data_for_plotting_v5.py`
+### Peptide detail table

-### Purpose
-Transforms glycopeptide data into a format suitable for interactive visualization.
+Clicking a protein row expands it to reveal all individual PSMs for that protein:

-### Key Features
-- **Data Integration**: Merges glycopeptide identifications with precursor offset data
-- **Classification**: Categorizes spectra into target, decoy, and unmatched groups
-- **Duplicate Tracking**: Flags unique scan numbers for statistical analysis
+| Column | Description |
+|--------|-------------|
+| **Peptide** | Matched peptide sequence (monospace font) |
+| **Scan #** | MS2 scan number |
+| **Δ Mass** | Mass delta — total glycan mass for this PSM |
+| **Prec. mass** | Precursor monoisotopic mass |
+| **Pep. mass** | Peptide monoisotopic mass (including carbamidomethylated cysteines) |
+| **p-value** | DirecTag JointpValue — combined statistical score for the sequence tag. Lower is better. Shown with a colour-coded bar: green (good) → orange → red (poor). |
+| **MzFidelity** | DirecTag MzFidelity — how closely observed fragment masses match expected values. Lower is better. |
+| **Complement** | DirecTag Complement — presence of complementary fragment ion pairs. Higher is better. |
+| **Intensity** | DirecTag Intensity — relative intensity of matched fragments. Higher is better. |
+| **Tags** | Number of de novo tags that matched this peptide for this specific spectrum. |
+| **Unmod.** | Whether SAGE also identified the unmodified version of this peptide elsewhere in the run (✓ or —). Finding the unmodified counterpart provides additional confidence. |
+| **Unmod. scans** | Scan numbers where the unmodified peptide was identified by SAGE. |
+| **Oxonium** | Per-scan oxonium ion presence shown as coloured dots, matching the histogram trace colours. |

-### Technical Highlights
-- Handles scientific notation conversion for consistent display
-- Preserves all precursor offset data even for unmatched spectra
-- Prepares data in the format required by the visualization system

-## Reporting 
-### File: `reporting_v4.py`
+## Excel Export

-### Purpose
-Generates comprehensive reports of glycopeptide identifications in formats suitable for further analysis and sharing.
+The "Export to Excel" button downloads the currently displayed data as an Excel file with two sheets:

-### Key Features
-- **Excel Reporting**: Creates detailed multi-sheet reports
-- **PeptideShaker Format**: Transforms results to PeptideShaker-compatible format
-- **Result Integration**: Compiles results from multiple oxonium ion analyses
-- **Glycopeptide Annotation**: Adds glycan modification mass information to peptides
+- **Protein summary** — one row per protein with accession, description, PSM count, unique peptides, median tag count, and oxonium evidence
+- **Peptide details** — all individual PSMs with scan numbers, masses, scores, tag counts, and oxonium flags

-### Technical Highlights
-- Formats glycopeptide data with  glycan mass notation
-- Enables integration with conventional proteomics workflows
-- Supports detailed exploration through multi-sheet organization
+Sheet names reflect the active oxonium filter: "Protein summary (open search)", "Protein summary (Hex)", "Peptide details (Hex, HexNAc)", etc.

-## Histogram Plotting 
-### File: `histogram_plotter_v7.py`
+---

-### Purpose
-Creates static histogram visualizations of mass offsets for reporting and publication, revealing patterns in glycan composition.
+## Tips for Effective Use

-### Key Features
-- **Distribution Visualization**: Shows frequency of mass differences
-- **Range Filtering**: Applies appropriate filtering to focus on the glycan mass range
-- **Publication-Quality Output**: Generates high-resolution figures
+**Start with mass delta.** The mass delta histogram gives you the broadest overview of glycan modifications in your sample. Look for prominent recurring peaks as these are your candidate glycan masses.

-### Technical Highlights
-- Handles both simple arrays and nested data structures
-- Produces consistently formatted figures with comprehensive labels
-- Supports various data types including mass deltas and offsets
+**Use precursor offsets to reconstruct glycans.** After identifying a mass delta of interest, the child precursor offset plot shows you which monosaccharides make up that glycan. Look for stepwise losses matching known sugar masses.

-## Intensity Plotting 
-### File: `intensity_plotter_v2.py`
+**Use peptide offsets to identify the linking sugar.** The first peak in the peptide offset child plot (smallest offset above Y0) identifies which monosaccharide is directly attached to the peptide backbone.

-### Purpose
-Creates intensity-based visualizations of mass offsets, highlighting the strongest signals in the mass spectrometry data.
+**Compare untargeted and oxonium-filtered views.** Select an oxonium ion checkbox in the histogram to see its trace overlaid on the untargeted results. Mass delta peaks present in the untargeted trace but absent from the oxonium trace may represent modifications other than glycosylation, or glycans with unusual sugars not in your oxonium database.

-### Key Features
-- **Intensity Weighting**: Sums fragment intensities by mass bin
-- **Signal Emphasis**: Highlights strong MS/MS fragments
-- **Complementary Visualization**: Provides intensity perspective alongside histogram counts
+**Use the oxonium filter to increase confidence.** In the glycoprotein candidate table, selecting multiple oxonium ions restricts the results to spectra confirmed by all of the selected sugar fragments. This is useful for separating genuine glycopeptides from background matches.

-### Technical Highlights
-- Sums intensity values within mass bins
-- Filters offsets to focus on the glycan mass range
-- Creates bar plots with comprehensive axis labeling
+**Check the co-occurrence heatmap first.** Before diving into mass deltas, the heatmap tells you which sugars are present and how they relate. High co-occurrence between two sugars means they're likely part of the same glycan — so selecting both in the oxonium filter should retain most genuine hits.

-## Interactive Visualization 
-### File: `interactive_plotter_multiple_intensity_v9.py`
-
-### Purpose
-Creates a comprehensive web-based dashboard for interactive exploration of glycoproteomics data, enabling in-depth analysis of identified glycopeptides.
-
-### Key Features
-- **Multi-Level Visualization**: Provides multi-level data exploration with dynamic filtering
-- **Comparative Analysis**: Visualization across multiple oxonium ions
-- **Co-occurrence Analysis**: Exploration of relationships between sugar oxonium markers
-- **Data Export**: Extraction of selected subsets for reporting
-
-### Technical Highlights
-- Built with Dash/Plotly for responsive interactivity
-- Implements complex callback structure for multi-level filtering
-- Supports side-by-side comparison of target and decoy matches
-- Features adaptive color schemes for multi-oxonium visualization
-
-## Module Dependencies and Interactions
-
-The NovoGlyco modules form a directed workflow with specific dependencies:
-
-1. **Initial Processing**:

-   - Main Script → SAGE Search → mzML Reading → FASTA Processing
-
-2. **De Novo Sequencing**:
-   - DirectTag Configuration → DirectTag Execution → De Novo Result Processing
-
-3. **Glycopeptide Identification**:
-   - Oxonium Detection → Database Matching → Unmodified Peptide Finder
-
-4. **Visualization and Reporting**:
-   - Data Preparation → Reporting & Plotting → Interactive Visualization
-
-This modular design enables each component to focus on its specialized task while contributing to the overall glycopeptide identification workflow. The architecture also allows for future enhancements and extensions of individual modules without disrupting the entire pipeline.   
-
-## Key Data Structures Shared Between Modules
-
-Several important data structures are passed between modules:
-
-1. **MS2 Spectra List**: 
-   - Contains complete spectrum information
-   - Passed from mzML Reading to Oxonium Detection
-
-2. **Oxonium Spectra Summary**:
-   - Contains glycopeptide-containing spectra information
-   - Passed from Oxonium Detection through several modules
-
-3. **De Novo Tags DataFrame**:
-   - Contains sequence tags with scan associations
-   - Passed from DirectTag Execution to Database Matching
-
-4. **Glycopeptide Matches**:
-   - Contains identified glycopeptides with metadata
-   - Passed through validation and reporting modules
-
-5. **Histogram Data Dictionary**:
-   - Contains processed data for visualization
-   - Used by multiple visualization components
-
-## Extending the Module System
-
-The modular design of NovoGlyco allows for several extension points:
-
-1. **Alternative Oxonium Ions**: 
-   - Add new ions to the Excel input file
-   - No code changes required
-
-2. **Different Fragmentation Methods**:
-   - Extend mzML Reading to handle additional activation methods
-   - Update Oxonium Detection for method-specific processing
-
-3. **Custom Validation Rules**:
-   - Modify Database Matching validation criteria
-   - Add new validation modules after identification
+**Look for unmodified peptide evidence.** In the peptide detail table, a ✓ in the Unmod. column means SAGE independently identified the same peptide without glycosylation.
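
The protein summary described in the Glycoprotein Candidate Table section of this revision (ranking by unique peptide count, then PSM count, with a median tag count per protein) can be sketched as follows. This is only an illustration using the standard library; the dictionary keys `protein`, `peptide` and `tags`, the function name `rank_proteins`, and the example PSMs are assumptions and do not reflect the dashboard's actual data model.

```python
from collections import defaultdict
from statistics import median


def rank_proteins(psms):
    """Summarise PSMs per protein and rank as the page describes."""
    grouped = defaultdict(list)
    for psm in psms:
        grouped[psm["protein"]].append(psm)

    rows = []
    for protein, items in grouped.items():
        rows.append({
            "protein": protein,
            "psms": len(items),
            "peptides": len({p["peptide"] for p in items}),
            # Median number of de novo tags per spectrum for this protein.
            "med_tags": median(p["tags"] for p in items),
        })
    # Unique peptides first, then PSM count, both descending.
    rows.sort(key=lambda r: (-r["peptides"], -r["psms"]))
    return rows


# Hypothetical PSMs falling into one selected histogram bin.
example = [
    {"protein": "P1", "peptide": "NGTIR", "tags": 1},
    {"protein": "P1", "peptide": "NGTIR", "tags": 5},
    {"protein": "P1", "peptide": "AANISK", "tags": 3},
    {"protein": "P2", "peptide": "VNATK", "tags": 2},
    {"protein": "P2", "peptide": "VNATK", "tags": 2},
    {"protein": "P2", "peptide": "VNATK", "tags": 4},
    {"protein": "P2", "peptide": "VNATK", "tags": 6},
]
summary = rank_proteins(example)
```

In this example P1 ranks above P2 because it has two unique peptides, even though P2 has more PSMs.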

Module Descriptions modified by Dinko Soic

Dinko Soic — Thu, 29 May 2025 14:01:54 -0000

NovoGlyco: Module Descriptions

This document provides detailed descriptions of each module in the NovoGlyco platform, explaining their purpose, key features and technical highlights. These modules work together to form the complete glycopeptide identification and analysis pipeline.

Main Script

File: `main_script.py`

Purpose

The main orchestration script that coordinates the entire NovoGlyco workflow from input processing to result visualization.

Key Features

Pipeline Orchestration: Sequentially executes all analysis steps
Input Validation: Checks for required input files and their formats
Parameter Configuration: Sets global parameters used across modules
Progress Tracking: Displays progress information during execution
Resource Management: Tracks execution time and manages file outputs
Dashboard Initialization: Launches the interactive visualization interface

Technical Highlights

Implements a 10-step workflow with clear progress reporting
Manages separate processing for each defined oxonium ion
Supports configuration of multiple analysis parameters

SAGE Database Search

File: `pysage_v6.py`

Purpose

Performs peptide spectrum matching against a protein database to identify unmodified peptides, establishing a baseline for subsequent glycopeptide analysis.

Key Features

SAGE Database Searching: Ultrafast peptide-spectrum matching
Statistical Validation: Target-decoy approach for false discovery rate control
Result Filtering: Extracts scan numbers and proteins for downstream steps
Format Conversion: Creates PeptideShaker-compatible output

Technical Highlights

Configures enzyme digestion with customizable parameters
Handles static modifications (Carbamidomethyl C) and variable modifications (Oxidation M)
Calculates posterior error probabilities for robust validation
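
The target-decoy idea mentioned above can be illustrated with a short, generic Python sketch. It is not SAGE's implementation (SAGE performs its own scoring and error estimation); the function name `fdr_at_threshold` and the example scores are hypothetical, and a higher score is assumed to be better.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoys / #targets) among PSMs scoring >= threshold.

    psms: list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scored PSMs: (score, is_decoy)
scored = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
loose = fdr_at_threshold(scored, 6)   # 1 decoy among 4 targets
strict = fdr_at_threshold(scored, 9)  # no decoys above this score
```

Raising the threshold trades sensitivity for a lower estimated false discovery rate.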

mzML Reading and Processing

File: `mzmlread_sage_v2.py`

Purpose

Extracts MS/MS spectra from mzML files and filters out spectra already identified in the database search, focusing subsequent analysis on unidentified spectra.

Key Features

Spectrum Extraction: Parses complex mzML file format
Metadata Processing: Extracts precursor information and scan parameters
Filtering: Removes spectra with identified unmodified peptides
Activation Pair Detection: Identifies complementary HCD/ETD scan pairs

Technical Highlights

Leverages pyteomics library for efficient mzML parsing
Matches complementary fragmentation methods based on parent ion masses
Extracts complete spectral information including m/z arrays and intensities
Provides statistical summary of retained vs. filtered spectra

FASTA Processing

File: `fastaread_v6.py`

Purpose

Processes protein database to generate peptides with a focus on potential glycosylation sites, creating the reference database for sequence tag matching.

Key Features

Focused Database Creation: Processes only proteins identified in sample
In-silico Digestion: Implements tryptic digestion with configurable parameters
Glycosite Filtering: Focuses on peptides with potential glycosylation sites
Mass Calculation: Computes accurate peptide masses

Technical Highlights

Standardizes amino acids (L→I substitution) for consistent mass calculations
Generates decoy peptides for false discovery assessment
Includes glycosylation site pattern matching
Calculates monoisotopic masses with fixed modifications
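
A minimal Python sketch of the steps listed above (L→I standardisation, tryptic digestion, glycosite filtering, and monoisotopic mass calculation with carbamidomethylated cysteine) is given below. Several details are assumptions rather than facts from `fastaread_v6.py`: the digest allows no missed cleavages, the glycosite pattern is taken to be the N-glycosylation sequon N-X-S/T (X not P), and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (L is folded into I before lookup).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
CARBAMIDOMETHYL = 57.021464  # fixed modification on cysteine


def digest(protein):
    """Fully tryptic digest: cleave after K/R unless followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein.replace("L", "I"))
    return [p for p in pieces if p]


def has_sequon(peptide):
    """True if the peptide contains N-X-S/T with X != P (assumed pattern)."""
    return re.search(r"N[^P][ST]", peptide) is not None


def peptide_mass(peptide):
    """Monoisotopic neutral mass including carbamidomethylated cysteines."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return mass + CARBAMIDOMETHYL * peptide.count("C")


peptides = digest("MKNGTLRCPK")            # hypothetical toy sequence
candidates = [p for p in peptides if has_sequon(p)]
```

For the toy sequence, only the peptide containing the N-G-T motif is retained as a glycosite candidate.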

DirecTag Configuration

File: `configure_directag_v3.py`

Purpose

Generates configuration files for the DirecTag de novo sequencing tool, tailoring parameters for glycopeptide analysis.

Key Features

Parameter Customization: Configures all DirecTag settings
Modification Definition: Sets up fixed and variable modifications
Sequence Tag Configuration: Defines tag length and count parameters

Technical Highlights

Provides sensible defaults for glycopeptide analysis
Allows customization of all DirecTag parameters
Enables adjustment of tag length for different analysis scenarios

DirecTag Execution

File: `directag_v4.py`

Purpose

Runs the DirecTag de novo sequencing tool on MS/MS spectra and extracts sequence tags.

Key Features

External Tool Execution: Runs the DirecTag binary
Output Processing: Parses complex DirecTag output format
Result Integration: Links sequence tags with scan numbers

Technical Highlights

Handles the multi-line format of DirecTag output files
Processes tag-scan associations through line number tracking

Oxonium Ion Detection

File: `oxoniums_v7.py`

Purpose

Identifies potential glycopeptides by screening MS/MS spectra for diagnostic sugar marker ions.

Key Features

Oxonium Ion Screening: Identifies potential glycopeptide-containing spectra
Charge Deconvolution: Handles multiply charged fragment ions
Offset Calculation: Computes mass offsets for glycan analysis
Paired Scan Processing: Handles complementary fragmentation methods

Technical Highlights

Implements advanced charge state deconvolution algorithm
Normalizes intensities for comparison across spectra
Processes both HCD and ETD fragmentation data
Calculates multiple types of mass offsets for comprehensive analysis
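
A minimal sketch of oxonium screening, intensity normalization, and charge conversion is shown below. The m/z values are the commonly used singly charged oxonium masses. The tolerance, the intensity threshold, and the function names are assumptions, and the module's actual ion list comes from its Excel input rather than a hard-coded table.

```python
PROTON = 1.007276

# Common singly charged oxonium ions (m/z).
OXONIUM_IONS = {
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def to_singly_charged(mz, charge):
    """Convert an m/z observed at the given charge to its 1+ equivalent."""
    return mz * charge - (charge - 1) * PROTON


def normalize(peaks):
    """Scale intensities so that the base peak equals 1.0."""
    top = max((intensity for _, intensity in peaks), default=0.0)
    if top <= 0:
        return []
    return [(mz, intensity / top) for mz, intensity in peaks]


def find_oxonium(peaks, tol_ppm=20.0, min_rel_intensity=0.05):
    """Return {ion name: relative intensity} for oxonium ions found in a spectrum."""
    hits = {}
    for mz, rel in normalize(peaks):
        for name, ref in OXONIUM_IONS.items():
            within_tol = abs(mz - ref) / ref * 1e6 <= tol_ppm
            if within_tol and rel >= min_rel_intensity:
                hits[name] = max(hits.get(name, 0.0), rel)
    return hits
```

A spectrum with a non-empty result from `find_oxonium` would be treated as a glycopeptide candidate under these assumed thresholds.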

De Novo Result Processing

File: `pronovo_v4.py`

Purpose

Filters de novo sequencing results to focus on glycopeptide-containing spectra, bridging the gap between oxonium ion detection and database matching.

Key Features

Result Integration: Links de novo tags with oxonium-containing spectra
Tag Filtering: Ensures consistent tag length for reliable matching
Status Tracking: Flags which glycopeptide spectra yielded usable tags

Technical Highlights

Focuses computational resources on promising glycopeptide candidates
Identifies which oxonium-containing spectra successfully yielded sequence tags
Prepares data structures for subsequent database matching
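
The linking and status tracking described above might look like the sketch below. Plain dictionaries stand in for whatever table structure `pronovo_v4.py` uses, and the record keys (`scan`, `tag`) and the default tag length are hypothetical.

```python
def link_tags(oxonium_scans, tag_records, tag_length=3):
    """Keep tags of the expected length that belong to oxonium-containing scans.

    Returns (tags_by_scan, status), where status maps every oxonium scan
    to True if it yielded at least one usable tag.
    """
    tags_by_scan = {}
    for rec in tag_records:
        if rec["scan"] in oxonium_scans and len(rec["tag"]) == tag_length:
            tags_by_scan.setdefault(rec["scan"], []).append(rec["tag"])
    status = {scan: scan in tags_by_scan for scan in sorted(oxonium_scans)}
    return tags_by_scan, status
```

Scans flagged `False` in `status` contain oxonium ions but produced no usable tag, so they cannot proceed to tag-based database matching.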

Database Matching

File: `directmatch_v11.py`

Purpose

Matches sequence tags with database peptides to identify potential glycopeptides.

Key Features

Sequence Tag Matching: Maps partial sequences to database peptides
Y0 Ion Validation: Verifies matches using peptide fragment Y0 ion evidence
Mass Delta Calculation: Computes glycan mass from precursor-peptide difference
Match Aggregation: Groups and scores multiple matches per spectrum

Technical Highlights

Creates efficient substring lookup structure for fast matching
Implements multiple validation criteria for confident identification
Calculates both mass deltas (glycan masses) and fragment offsets
Separates target and decoy results for statistical validation
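
One way to realize a substring lookup and the mass delta calculation is sketched here. The k-mer index, the function names, and the rule that only positive deltas are kept are assumptions for illustration; Y0 validation and scoring are omitted.

```python
from collections import defaultdict

PROTON = 1.007276


def build_index(peptides, k=3):
    """Map every length-k substring to the set of peptides containing it."""
    index = defaultdict(set)
    for pep in peptides:
        for i in range(len(pep) - k + 1):
            index[pep[i:i + k]].add(pep)
    return index


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass from observed m/z and charge."""
    return (precursor_mz - PROTON) * charge


def match_tag(tag, index, peptide_masses, precursor_mz, charge):
    """Return (peptide, mass delta) pairs for peptides containing the tag.

    The tag length must equal the k used to build the index. The delta
    (precursor mass minus peptide mass) is the putative glycan mass.
    """
    observed = neutral_mass(precursor_mz, charge)
    results = []
    for pep in sorted(index.get(tag, ())):
        delta = observed - peptide_masses[pep]
        if delta > 0:  # assumption: a glycan can only add mass
            results.append((pep, round(delta, 4)))
    return results
```

Building the index once makes each tag lookup a single dictionary access instead of a scan over all database peptides.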

Unmodified Peptide Finder

File: `unmodified_peptide_finder_v4.py`

Purpose

Checks for unmodified versions of identified glycopeptides in the database search results, providing additional validation of glycosylation.

Key Features

Cross-Validation: Confirms glycosylation through unmodified counterparts
Match Tracking: Adds validation status to glycopeptide reports

Technical Highlights

Creates efficient lookup structure for peptide-protein combinations
Provides evidence for variable glycosylation at specific sites
Enhances confidence in glycopeptide identifications
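
The peptide-protein lookup can be sketched with a set of tuples. The dictionary keys `peptide` and `protein` and the output flag `unmodified_found` are hypothetical names, not the module's actual column names.

```python
def flag_unmodified(glyco_matches, search_hits):
    """Mark each glycopeptide match whose unmodified form was also identified.

    Both inputs are lists of dicts with (assumed) 'peptide' and 'protein' keys.
    """
    seen = {(hit["peptide"], hit["protein"]) for hit in search_hits}
    return [
        {**match, "unmodified_found": (match["peptide"], match["protein"]) in seen}
        for match in glyco_matches
    ]
```

Matching on the peptide and protein together avoids crediting a glycopeptide with an unmodified hit that belongs to a different protein.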

Data Preparation for Visualization

File: `prepare_data_for_plotting_v5.py`

Purpose

Transforms glycopeptide data into a format suitable for interactive visualization.

Key Features

Data Integration: Merges glycopeptide identifications with precursor offset data
Classification: Categorizes spectra into target, decoy, and unmatched groups
Duplicate Tracking: Flags unique scan numbers for statistical analysis

Technical Highlights

Handles scientific notation conversion for consistent display
Preserves all precursor offset data even for unmatched spectra
Prepares data in the format required by the visualization system
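
The classification and duplicate flagging could be sketched as below. The precedence of target over decoy and the field names are assumptions. The input is a list of scan numbers from the precursor offset data, in which a scan may appear more than once.

```python
def classify_scans(offset_scans, target_scans, decoy_scans):
    """Label each offset row as target, decoy, or unmatched.

    Every row is kept, including unmatched ones. 'first_occurrence' is True
    only the first time a scan number is seen, so per-scan statistics can
    ignore repeated rows.
    """
    rows = []
    seen = set()
    for scan in offset_scans:
        if scan in target_scans:  # assumption: target takes precedence
            group = "target"
        elif scan in decoy_scans:
            group = "decoy"
        else:
            group = "unmatched"
        rows.append({"scan": scan, "group": group, "first_occurrence": scan not in seen})
        seen.add(scan)
    return rows
```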

Reporting

File: `reporting_v4.py`

Purpose

Generates comprehensive reports of glycopeptide identifications in formats suitable for further analysis and sharing.

Key Features

Excel Reporting: Creates detailed multi-sheet reports
PeptideShaker Format: Transforms results to PeptideShaker-compatible format
Result Integration: Compiles results from multiple oxonium ion analyses
Glycopeptide Annotation: Adds glycan modification mass information to peptides

Technical Highlights

Formats glycopeptide data with glycan mass notation
Enables integration with conventional proteomics workflows
Supports detailed exploration through multi-sheet organization
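
As an illustration of glycan mass notation, the sketch below inserts the mass delta in square brackets after the modified residue. The bracket style, the four-decimal precision, and the function name are assumptions; the notation written by `reporting_v4.py` may differ.

```python
def annotate(peptide, site_index, glycan_mass):
    """Insert '[+mass]' after the residue at the zero-based site_index."""
    if not 0 <= site_index < len(peptide):
        raise IndexError("site_index is outside the peptide")
    head = peptide[:site_index + 1]
    tail = peptide[site_index + 1:]
    return f"{head}[+{glycan_mass:.4f}]{tail}"
```

With these assumptions, a glycan of 1216.4229 Da on the asparagine of `INITK` would be written as `IN[+1216.4229]ITK`.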

Histogram Plotting

File: `histogram_plotter_v7.py`

Purpose

Creates static histogram visualizations of mass offsets for reporting and publication, revealing patterns in glycan composition.

Key Features

Distribution Visualization: Shows frequency of mass differences
Range Filtering: Applies appropriate filtering to focus on the glycan mass range
Publication-Quality Output: Generates high-resolution figures

Technical Highlights

Handles both simple arrays and nested data structures
Produces consistently formatted figures with comprehensive labels
Supports various data types including mass deltas and offsets
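
The counting step behind such a histogram can be sketched without a plotting library. The range limits and bin width are assumed values, and the recursive flattening is one possible way to accept both flat and nested inputs.

```python
from collections import Counter


def flatten(values):
    """Yield floats from a flat or arbitrarily nested list/tuple."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield float(value)


def histogram(values, low=0.0, high=4000.0, bin_width=1.0):
    """Count values per bin within [low, high); keys are left bin edges."""
    counts = Counter()
    for value in flatten(values):
        if low <= value < high:
            counts[int((value - low) // bin_width)] += 1
    return {low + b * bin_width: n for b, n in sorted(counts.items())}
```

The resulting mapping of bin edge to count could then be passed to any bar-plot routine.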

Intensity Plotting

File: `intensity_plotter_v2.py`

Purpose

Creates intensity-based visualizations of mass offsets, highlighting the strongest signals in the mass spectrometry data.

Key Features

Intensity Weighting: Sums fragment intensities by mass bin
Signal Emphasis: Highlights strong MS/MS fragments
Complementary Visualization: Provides intensity perspective alongside histogram counts

Technical Highlights

Sums intensity values within mass bins
Filters offsets to focus on the glycan mass range
Creates bar plots with comprehensive axis labeling
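
In contrast to counting, intensity weighting adds up the signal in each bin. A minimal sketch, assuming the input is a list of `(mass offset, intensity)` pairs and using illustrative range and bin-width values:

```python
from collections import defaultdict


def intensity_by_bin(offsets, low=0.0, high=4000.0, bin_width=1.0):
    """Sum intensities per mass bin within [low, high); keys are left bin edges."""
    sums = defaultdict(float)
    for mass, intensity in offsets:
        if low <= mass < high:
            edge = low + ((mass - low) // bin_width) * bin_width
            sums[edge] += intensity
    return dict(sorted(sums.items()))
```

A bin with a few intense fragments can therefore outrank a bin with many weak ones, which is the complementary view to the count-based histogram.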

Interactive Visualization

File: `interactive_plotter_multiple_intensity_v9.py`

Purpose

Creates a comprehensive web-based dashboard for interactive exploration of glycoproteomics data, enabling in-depth analysis of identified glycopeptides.

Key Features

Multi-Level Visualization: Supports drill-down data exploration with dynamic filtering
Comparative Analysis: Compares results across multiple oxonium ions
Co-occurrence Analysis: Explores relationships between sugar oxonium markers
Data Export: Extracts selected subsets for reporting

Technical Highlights

Built with Dash/Plotly for responsive interactivity
Implements complex callback structure for multi-level filtering
Supports side-by-side comparison of target and decoy matches
Features adaptive color schemes for multi-oxonium visualization
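
The co-occurrence analysis follows the definition given in the dashboard guide: the number of spectra containing both ions divided by the number containing the row ion, with Jaccard similarity used for clustering. The sketch below computes both quantities from a list of per-spectrum ion sets. The input shape and function names are assumptions, and the Dash layout and callbacks are left out.

```python
def cooccurrence(spectra_ions):
    """Return matrix[row][col] = fraction of spectra with `row` that also have `col`.

    spectra_ions is a list of sets, one set of oxonium ion names per spectrum.
    The matrix is asymmetric because each row is divided by its own total.
    """
    names = sorted(set().union(*spectra_ions)) if spectra_ions else []
    matrix = {}
    for row in names:
        row_total = sum(1 for ions in spectra_ions if row in ions)
        matrix[row] = {
            col: sum(1 for ions in spectra_ions if row in ions and col in ions) / row_total
            for col in names
        }
    return matrix


def jaccard(spectra_ions, a, b):
    """Symmetric similarity |A and B| / |A or B| over spectra containing the ions."""
    both = sum(1 for ions in spectra_ions if a in ions and b in ions)
    either = sum(1 for ions in spectra_ions if a in ions or b in ions)
    return both / either if either else 0.0
```

If HexNAc appears in four spectra and Hex in two of those, the HexNAc row gives 0.5 for Hex while the Hex row gives 1.0 for HexNAc.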

Module Dependencies and Interactions

The NovoGlyco modules form a directed workflow with specific dependencies:

Initial Processing:
Main Script → SAGE Search → mzML Reading → FASTA Processing
De Novo Sequencing:
DirectTag Configuration → DirectTag Execution → De Novo Result Processing
Glycopeptide Identification:
Oxonium Detection → Database Matching → Unmodified Peptide Finder
Visualization and Reporting:
Data Preparation → Reporting & Plotting → Interactive Visualization

This modular design enables each component to focus on its specialized task while contributing to the overall glycopeptide identification workflow. The architecture also allows for future enhancements and extensions of individual modules without disrupting the entire pipeline.
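
The four chains listed above can be encoded as a small dependency graph. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later) and includes only the within-stage ordering stated in the list; links between stages are not modelled, and the node names are simply the labels used on this page.

```python
from graphlib import TopologicalSorter

# Stage chains exactly as listed above; each item depends on the one before it.
STAGES = {
    "Initial Processing": ["Main Script", "SAGE Search", "mzML Reading", "FASTA Processing"],
    "De Novo Sequencing": ["DirectTag Configuration", "DirectTag Execution", "De Novo Result Processing"],
    "Glycopeptide Identification": ["Oxonium Detection", "Database Matching", "Unmodified Peptide Finder"],
    "Visualization and Reporting": ["Data Preparation", "Reporting & Plotting", "Interactive Visualization"],
}


def execution_order(stages):
    """Return one valid run order that respects every within-stage dependency."""
    sorter = TopologicalSorter()
    for chain in stages.values():
        sorter.add(chain[0])
        for upstream, downstream in zip(chain, chain[1:]):
            sorter.add(downstream, upstream)
    return list(sorter.static_order())
```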

Key Data Structures Shared Between Modules

Several important data structures are passed between modules:

MS2 Spectra List:
Contains complete spectrum information
Passed from mzML Reading to Oxonium Detection
Oxonium Spectra Summary:
Contains glycopeptide-containing spectra information
Passed from Oxonium Detection through several modules
De Novo Tags DataFrame:
Contains sequence tags with scan associations
Passed from DirectTag Execution to Database Matching
Glycopeptide Matches:
Contains identified glycopeptides with metadata
Passed through validation and reporting modules
Histogram Data Dictionary:
Contains processed data for visualization
Used by multiple visualization components

Extending the Module System

The modular design of NovoGlyco allows for several extension points:

Alternative Oxonium Ions:
Add new ions to the Excel input file
No code changes required
Different Fragmentation Methods:
Extend mzML Reading to handle additional activation methods
Update Oxonium Detection for method-specific processing
Custom Validation Rules:
Modify Database Matching validation criteria
Add new validation modules after identification


Module Descriptions modified by Dinko Soic
