NovoGlyco Docker
NovoGlyco is a comprehensive glycoproteomics platform for identifying and characterizing prokaryotic protein glycosylation from large-scale shotgun proteomics data. It works in tandem with Oxonium Browser (our tool for discovering diagnostic sugar oxonium ions in MS/MS data) to provide a complete solution for untargeted prokaryotic glycopeptide analysis.
Workflow Overview
NovoGlyco implements a multi-stage analytical pipeline for glycopeptide identification:
┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Input Files  │────▶│ SAGE Database │────▶│MS/MS Spectrum │────▶│  Oxonium Ion  │
│  Processing   │     │    Search     │     │   Filtering   │     │   Detection   │
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
                                                                          │
                                                                          ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  Interactive  │◀────│ Glycopeptide  │◀────│ Sequence Tag  │◀────│ DirectTag De  │
│   Dashboard   │     │  Validation   │     │   Matching    │     │Novo Sequencing│
└───────────────┘     └───────────────┘     └───────────────┘     └───────────────┘
The NovoGlyco workflow includes:
- SAGE protein database search to remove MS2 spectra of unmodified peptides
- Detection of oxonium ions in the remaining spectra from potential glycopeptides
- Generation of de novo sequence tags using DirectTag
- Matching tags to a proteome database, thus identifying potential glycopeptides
- Creation of interactive visualizations and reports
Note: This is a beta version. A graphical user interface (GUI) will be released soon.
Documentation
Detailed documentation is available in the SourceForge Wiki section, including a comprehensive parameter guide, an explanation of the analytical metrics, and a description of the system architecture.
Requirements
- Docker (version 19.03 or higher recommended)
- At least 4GB of available RAM
- At least 10GB of free disk space
- For Astral raw files, increased memory will be required.
Input Files
Place the following files in the `Input` directory:
- Mass Spectrometry Data: `.mzML` file
  - Note: Vendor-specific RAW files must be converted to mzML format before using this tool
- Protein Database: `.fasta` file containing protein sequences
- Sugar Oxonium Ion List: `.xlsx` file containing sugar oxonium ions to be searched for (Output of Oxonium Browser - see example format below)
Quick Start
- Download and extract the project:
```bash
# Download the latest release from SourceForge
# https://sourceforge.net/projects/novoglyco/files/

# Extract the downloaded archive (adjust the file name to match the release you downloaded)
unzip novoglyco-docker-v1.0.0.zip
# or
tar -xzf novoglyco-docker-v1.0.0.tar.gz

# Navigate to the project directory
cd novoglyco-docker
```
- Prepare your input files
  - Place your `.mzML` files (MS/MS data), `.fasta` files (protein database), and `.xlsx` file (oxonium ion definitions) in the `Input` directory
  - The DirectTag executable files are already included in the `directag_windows_64bits` directory for Windows and the `directag_linux_64bit` directory for Linux
- Build the Docker image: `docker build -t novoglyco .`
- Run NovoGlyco using one of the following methods:
First, ensure any previous NovoGlyco container is removed:
docker rm -f novoglyco
### Option 1: Single Command (recommended for first-time runs)
Run both DirectTag and Docker together with a single command:
For Windows (Command Prompt):
start "DirectTag Process" cmd /c run_directag_windows.bat & docker run --name novoglyco -p 8050:8050 -v "%cd%\Input:/app/Input" -v "%cd%\Output:/app/Output" novoglyco
For Windows (PowerShell):
Start-Process -FilePath "cmd" -ArgumentList "/c run_directag_windows.bat"; docker run --name novoglyco -p 8050:8050 -v "${PWD}\Input:/app/Input" -v "${PWD}\Output:/app/Output" novoglyco
For Linux/macOS:
./run_directag_linux.sh & docker run --name novoglyco -p 8050:8050 -v "$(pwd)/Input:/app/Input" -v "$(pwd)/Output:/app/Output" novoglyco
### Option 2: Separate Commands
Run both processes in separate terminal windows:
Terminal 1 - Start DirectTag script:
For Windows:
run_directag_windows.bat
For Linux:
./run_directag_linux.sh
Terminal 2 - Start Docker container:
For Windows Command Prompt:
docker run --name novoglyco -p 8050:8050 -v "%cd%\Input:/app/Input" -v "%cd%\Output:/app/Output" novoglyco
For PowerShell:
docker run --name novoglyco -p 8050:8050 -v "${PWD}\Input:/app/Input" -v "${PWD}\Output:/app/Output" novoglyco
For Linux/macOS:
docker run --name novoglyco -p 8050:8050 -v "$(pwd)/Input:/app/Input" -v "$(pwd)/Output:/app/Output" novoglyco
### Option 3: Docker Only (when tags are already generated)
If you already have DirectTag tags files in your Input directory with the desired tag length, you can skip the DirectTag execution and run only the Docker container:
For Windows Command Prompt:
docker run --name novoglyco -p 8050:8050 -v "%cd%\Input:/app/Input" -v "%cd%\Output:/app/Output" novoglyco
For PowerShell:
docker run --name novoglyco -p 8050:8050 -v "${PWD}\Input:/app/Input" -v "${PWD}\Output:/app/Output" novoglyco
For Linux/macOS:
docker run --name novoglyco -p 8050:8050 -v "$(pwd)/Input:/app/Input" -v "$(pwd)/Output:/app/Output" novoglyco
Note: For the Docker-only approach to work, there must be a tags file in the Input directory named exactly [mzML_filename]_DIRECTAG_top10_tag[Tag_Length].tags. If you've previously generated tags with a different naming convention, make sure to rename them accordingly.
- Access the interactive dashboard
  - Open your browser and navigate to: http://localhost:8050
  - Results will be saved to the `Output` directory
- To analyze the next set of data or use different parameters:
  - Press `Ctrl+C` to stop the Docker container
  - Remove the container: `docker rm -f novoglyco`
  - Run the analysis command again with new input files
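Before choosing the Docker-only option, it can help to confirm that the tags file carries the exact name the note under Option 3 describes. The following is a minimal Python sketch, not part of NovoGlyco itself. It assumes that `[mzML_filename]` means the mzML file name without its extension, which the note does not state explicitly; the helper names and the sample file name are hypothetical.

```python
from pathlib import Path


def expected_tags_name(mzml_name: str, tag_length: int) -> str:
    """Build the tags file name pattern given in the Docker-only note."""
    # Assumption: [mzML_filename] is the file name without the ".mzML" extension.
    stem = Path(mzml_name).stem
    return f"{stem}_DIRECTAG_top10_tag{tag_length}.tags"


def tags_file_present(input_dir: str, mzml_name: str, tag_length: int) -> bool:
    """Return True if the expected tags file already exists in input_dir."""
    return (Path(input_dir) / expected_tags_name(mzml_name, tag_length)).is_file()


# Hypothetical sample name; TAG_LENGTH=5 is the documented default.
print(expected_tags_name("sample01.mzML", 5))
print(tags_file_present("Input", "sample01.mzML", 5))
```

If the check returns `False`, either rename the existing tags file or run one of the options that starts the DirectTag script.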
Running NovoGlyco with Custom Parameters
To use NovoGlyco with custom parameters, add the environment variables to your Docker run command:
Option 1: Single Command with Custom Parameters
For Windows (Command Prompt):
start "DirectTag Process" cmd /c run_directag_windows.bat & docker run --name novoglyco -p 8050:8050 -e TAG_LENGTH=6 -e MIN_OFFSET=800 -e INT=0.2 -v "%cd%\Input:/app/Input" -v "%cd%\Output:/app/Output" novoglyco
For Windows (PowerShell):
Start-Process -FilePath "cmd" -ArgumentList "/c run_directag_windows.bat"; docker run --name novoglyco -p 8050:8050 -e TAG_LENGTH=6 -e MIN_OFFSET=800 -e INT=0.2 -v "${PWD}\Input:/app/Input" -v "${PWD}\Output:/app/Output" novoglyco
For Linux/macOS:
./run_directag_linux.sh & docker run --name novoglyco -p 8050:8050 -e TAG_LENGTH=6 -e MIN_OFFSET=800 -e INT=0.2 -v "$(pwd)/Input:/app/Input" -v "$(pwd)/Output:/app/Output" novoglyco
Option 2: Separate Commands with Custom Parameters
Terminal 1 - Start DirectTag script (same as before)
Terminal 2 - Start Docker container with parameters:
For Windows Command Prompt:
docker run --name novoglyco -p 8050:8050 ^
-e TAG_LENGTH=6 ^
-e MIN_OFFSET=800 ^
-e INT=0.2 ^
-v "%cd%\Input:/app/Input" ^
-v "%cd%\Output:/app/Output" ^
novoglyco
For PowerShell:
docker run --name novoglyco -p 8050:8050 `
-e TAG_LENGTH=6 `
-e MIN_OFFSET=800 `
-e INT=0.2 `
-v "${PWD}\Input:/app/Input" `
-v "${PWD}\Output:/app/Output" `
novoglyco
For Linux/macOS:
docker run --name novoglyco -p 8050:8050 \
-e TAG_LENGTH=6 \
-e MIN_OFFSET=800 \
-e INT=0.2 \
-v "$(pwd)/Input:/app/Input" \
-v "$(pwd)/Output:/app/Output" \
novoglyco
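When the same run is repeated with different parameter sets, the `docker run` command above can also be assembled programmatically. This is a hedged Python sketch rather than an official wrapper: `build_docker_command` is a hypothetical helper, and it only reproduces the flags already shown in this README.

```python
import shlex
from pathlib import Path


def build_docker_command(params, workdir=".", host_port=8050):
    """Assemble the `docker run` argument list shown in this README."""
    base = Path(workdir).resolve()
    cmd = ["docker", "run", "--name", "novoglyco", "-p", f"{host_port}:8050"]
    # One -e KEY=VALUE pair per custom parameter.
    for key, value in params.items():
        cmd += ["-e", f"{key}={value}"]
    # Mount the Input and Output directories of the project folder.
    cmd += [
        "-v", f"{base / 'Input'}:/app/Input",
        "-v", f"{base / 'Output'}:/app/Output",
        "novoglyco",
    ]
    return cmd


cmd = build_docker_command({"TAG_LENGTH": 6, "MIN_OFFSET": 800, "INT": 0.2})
print(shlex.join(cmd))
# To launch it from Python: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with values that contain shell metacharacters, such as `[ST]`.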
Available Parameters
NovoGlyco offers the following configurable parameters:
Glycopeptide Detection Parameters
GLYCOSITES=[STN] # Filter for peptides containing S, T, or N
AMINO_ACID_MARKER=false # Set to true to enable amino acid marker detection
INT=0.1 # Intensity threshold for oxonium ion detection
MASS_ERROR=0.005 # Mass error tolerance (Da)
MIN_OFFSET=750 # Minimum mass offset to consider
VALIDATE_PEPTIDE_MASS=true # Validate peptide mass (Y0 ion presence)
TYPE=ETD # Activation method to exclude: ETD, HCD or NONE
Peptide Database Parameters
MIN_LENGTH=6 # Minimum peptide length
MAX_MISSED_CLEAVAGES=0 # Maximum allowed missed cleavages
SAGE Database Search Parameters
SAGE_MIN_PEPTIDE_LENGTH=6 # Minimum peptide length for SAGE search
SAGE_MISSED_CLEAVAGES=2 # Maximum missed cleavages for SAGE search
SAGE_GENERATE_DECOYS=true # Generate decoy database for FDR estimation
SAGE_PREC_TOL=20 # Precursor mass tolerance in ppm
SAGE_FRAG_TOL=20 # Fragment ion mass tolerance in ppm
SAGE_FDR=1 # FDR threshold as percentage
De Novo Sequencing Parameters
TAG_LENGTH=5 # Length of sequence tags for de novo sequencing
Visualization Parameters
BIN_WIDTH=0.1 # Bin width for histograms
SAVE_PLOTS_LOCALLY=false # Save static plots locally
PORT=8050 # Port for the interactive dashboard
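Each `-e NAME=value` flag becomes an environment variable inside the container, and unset variables fall back to the defaults listed above. The sketch below illustrates that general pattern in Python. It is an assumption about how such settings are typically read, not NovoGlyco's actual code; `read_setting` and `as_bool` are hypothetical names, and only a subset of the parameters is included.

```python
import os

# Defaults copied from the parameter list above (subset).
DEFAULTS = {
    "GLYCOSITES": "[STN]",
    "AMINO_ACID_MARKER": "false",
    "INT": "0.1",
    "MASS_ERROR": "0.005",
    "MIN_OFFSET": "750",
    "VALIDATE_PEPTIDE_MASS": "true",
    "TYPE": "ETD",
    "TAG_LENGTH": "5",
    "SAGE_FDR": "1",
    "PORT": "8050",
}


def read_setting(name, environ=os.environ):
    """Return the value passed with `-e NAME=...`, or the documented default."""
    return environ.get(name, DEFAULTS[name])


def as_bool(text):
    """Interpret 'true'/'false' strings (assumed to be case-insensitive)."""
    return text.strip().lower() == "true"


# Simulates a container started with `-e TAG_LENGTH=6 -e INT=0.2`.
env = {"TAG_LENGTH": "6", "INT": "0.2"}
print(int(read_setting("TAG_LENGTH", env)))                  # overridden
print(float(read_setting("MIN_OFFSET", env)))                # default
print(as_bool(read_setting("VALIDATE_PEPTIDE_MASS", env)))   # default
```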
Common Parameter Combinations
Bacterial O-Glycosylation Analysis
-e TAG_LENGTH=5 \
-e MIN_OFFSET=600 \
-e INT=0.1 \
-e "GLYCOSITES=[ST]" \
-e TYPE=ETD \
-e SAGE_FDR=1
N-Glycosylation Analysis
-e TAG_LENGTH=6 \
-e MIN_OFFSET=900 \
-e INT=0.1 \
-e GLYCOSITES=N \
-e TYPE=ETD \
-e SAGE_FDR=1
Low-Abundance Glycopeptide Discovery
-e TAG_LENGTH=4 \
-e MIN_OFFSET=500 \
-e INT=0.05 \
-e VALIDATE_PEPTIDE_MASS=false \
-e SAGE_FDR=5
Key Analytical Metrics
The platform uses three primary mass metrics for glycopeptide characterization:
- Mass Delta: Difference between precursor mass and peptide mass (total glycan mass)
- Precursor Offsets: Differences between precursor mass and fragment masses (cleaved glycan fragments)
- Peptide Offsets: Differences between fragment masses and peptide mass (attached glycan fragments)
Through the combined analysis of these metrics, NovoGlyco can identify glycopeptides and provide insights into glycan compositions.
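The three metrics are simple differences, which the following Python sketch makes explicit. It assumes all values are neutral (charge-deconvoluted) monoisotopic masses in Da, which the text above does not specify, and the numbers are made up for illustration only. The function names are hypothetical and do not refer to NovoGlyco's internals.

```python
def mass_delta(precursor_mass, peptide_mass):
    """Total glycan mass: precursor mass minus peptide mass."""
    return precursor_mass - peptide_mass


def precursor_offsets(precursor_mass, fragment_masses):
    """Mass lost from the precursor for each fragment (cleaved glycan fragments)."""
    return [precursor_mass - m for m in fragment_masses]


def peptide_offsets(peptide_mass, fragment_masses):
    """Mass still attached to the peptide for each fragment (attached glycan fragments)."""
    return [m - peptide_mass for m in fragment_masses]


# Illustrative values only: a 1000 Da peptide carrying 406 Da of glycan,
# observed as the bare peptide and as the peptide plus 203 Da.
precursor, peptide = 1406.0, 1000.0
fragments = [1000.0, 1203.0]

print(mass_delta(precursor, peptide))            # 406.0
print(precursor_offsets(precursor, fragments))   # [406.0, 203.0]
print(peptide_offsets(peptide, fragments))       # [0.0, 203.0]
```

For each fragment, the precursor offset and the peptide offset sum to the mass delta, which is why the three metrics are analyzed together.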
Technical Overview of DirectTag Integration
DirectTag is an executable that cannot run directly in Docker's Linux environment. Our solution uses a hybrid approach:
- Docker Container Side (Linux):
  - The main pipeline runs in Docker
  - When DirectTag is needed, it creates a signal file in the Input directory
  - It then waits for the tags file to appear before continuing
- Host Side:
  - The DirectTag script runs in parallel and watches for the signal file
  - When the signal appears, it processes the file and generates the required tags
  - The Docker container automatically continues when tags are available
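The signal-file handshake described above can be illustrated with a small, self-contained Python simulation. This is only a sketch of the general pattern: the signal file name, the polling interval, and the placeholder tags content are all hypothetical, and the real host script runs DirectTag where the sketch writes a dummy file. Both sides run as threads in one process here, with a temporary directory standing in for the shared `Input` folder.

```python
import tempfile
import threading
import time
from pathlib import Path


def wait_for_file(path, timeout=5.0, poll=0.05):
    """Poll until `path` exists; return True if it appeared before the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll)
    return path.exists()


def host_watcher(signal_file, tags_file):
    """Host side: wait for the signal, then produce the tags file."""
    if wait_for_file(signal_file):
        # The real script would run DirectTag here; this stand-in writes a placeholder.
        tags_file.write_text("placeholder tags\n")
        signal_file.unlink()


def container_side(signal_file, tags_file):
    """Container side: request tags, then block until they are available."""
    signal_file.touch()
    return wait_for_file(tags_file)


with tempfile.TemporaryDirectory() as shared:
    signal = Path(shared) / "run_directag.signal"            # hypothetical name
    tags = Path(shared) / "sample_DIRECTAG_top10_tag5.tags"
    watcher = threading.Thread(target=host_watcher, args=(signal, tags))
    watcher.start()
    handshake_ok = container_side(signal, tags)
    watcher.join()

print(handshake_ok)
```

Because both sides communicate only through files in a mounted directory, the same pattern works across the container boundary without any network setup.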
Windows-Specific Notes
For Windows users, the DirectTag executable is in the directag_windows_64bits directory and is run through the batch script.
Linux-Specific Notes
For Linux users, the DirectTag executable is in the directag_linux_64bit directory. Ensure the script has executable permissions:
chmod +x run_directag_linux.sh
If the Linux DirectTag executable doesn't have execute permissions, the script will attempt to set them, but you may need to do this manually:
chmod +x directag_linux_64bit/directag
Updates to the Dashboard
After changing configuration parameters and restarting the container, you'll need to manually refresh your browser page to see the updated dashboard with new parameters. This is normal behavior when running web applications in Docker.
Mass-Based Detection Limitations: This approach identifies sugars based on diagnostic oxonium ion masses, but cannot differentiate between isomeric sugars. For example, when a hexose (Hex) is detected, additional biochemical experiments or literature review would be required to determine whether it represents glucose, galactose, mannose, or another hexose isomer. The tool provides evidence of glycosylation and sugar mass, but structural characterization requires complementary techniques.
RAW File Conversion
NovoGlyco requires mzML format input files. Vendor-specific RAW files must be converted before using this tool.
Converting RAW files to mzML Format
Using ProteoWizard MSConvert (Recommended):
1. Download MSConvert from ProteoWizard
2. Open MSConvert GUI
3. Select your RAW file(s)
4. Output format: mzML
5. Recommended settings:
- Check "Peak Picking" (set MS levels: 1-2). For Astral data, you can select MS level 2 only to reduce file size
- Binary encoding precision: 64-bit for maximum precision (default). For Astral data, 32-bit can be used to reduce file size
- zlib compression: checked
- Write index: checked
- TPP compatibility: checked
6. Click "Start" to convert
7. Place the resulting .mzML file in the Input directory
Note on encoding precision: 32-bit encoding has been tested with Astral files. However, please check your available memory before starting runs, as the required memory is approximately equal to the size of the mzML file.
For more detailed conversion instructions, see the MSConvert documentation.
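The memory rule of thumb from the note on encoding precision (required memory roughly equal to the mzML file size) can be turned into a quick pre-flight check. The Python sketch below is an illustration under assumptions: the 25% headroom factor is a hypothetical safety margin that does not come from the NovoGlyco documentation, and the amount of available memory must be supplied by the user.

```python
import os

GIB = 1024 ** 3


def estimated_memory_bytes(mzml_size_bytes, headroom=1.25):
    """Rule of thumb: memory needed is about the mzML size, plus assumed headroom."""
    return int(mzml_size_bytes * headroom)


def fits_in_memory(mzml_path, available_bytes, headroom=1.25):
    """Compare the estimate for an existing mzML file against the available memory."""
    needed = estimated_memory_bytes(os.path.getsize(mzml_path), headroom)
    return needed <= available_bytes


# A 4 GiB mzML file would call for roughly 5 GiB with the assumed headroom.
print(estimated_memory_bytes(4 * GIB) / GIB)
```

With Docker Desktop, the relevant figure is the memory allocated to Docker, not the total memory of the host.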
Oxonium Ion Excel File Format
The Excel file contains the following key columns:
- Oxonium: Name of the sugar monomer (e.g., "HexNAc", "Hex", "Hept")
- ox_mass1: Primary diagnostic mass (intact oxonium ion)
- ox_mass2: Secondary diagnostic mass (either water loss, or carboxylic acid fragment)
Additional columns may be present for reference information, but these three are required for the software to function.
Note: We recommend generating the Oxonium Ion List with our Oxonium Browser (available on SourceForge: https://sourceforge.net/projects/oxoniumbrowserx/files/)
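A hand-edited oxonium ion list can be checked for the three required columns before starting a run. The following Python sketch is not NovoGlyco's own validation; `missing_columns` is a hypothetical helper that works on any list of header names.

```python
REQUIRED_COLUMNS = ("Oxonium", "ox_mass1", "ox_mass2")


def missing_columns(header):
    """Return the required column names that are absent from the header row."""
    present = {str(name).strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Extra reference columns are allowed; only the three required ones are checked.
print(missing_columns(["Oxonium", "ox_mass1", "ox_mass2", "comment"]))  # []
print(missing_columns(["Oxonium", "ox_mass1"]))                         # ['ox_mass2']
```

The header row could be obtained, for example, from `pandas.read_excel(path).columns`, assuming pandas and an Excel reader such as openpyxl are installed.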
Output
After a successful run, the following output files will be generated in the Output/[mzML filename] directory:
- Excel reports with identified glycopeptides
- Glycopeptide and peptide results in PeptideShaker format
- Plots of mass deltas and offsets, if specified
Troubleshooting
Common Issues
- "No raw or mzML file found in the input folder" error
  - Ensure you have placed an `.mzML` file in the `Input` directory
  - RAW files are not supported directly; convert them to mzML format first using MSConvert
  - Check file permissions and names (case sensitivity matters)
- Dashboard not accessible
  - Try accessing via http://localhost:8050 (or the host port you mapped, if you changed it)
  - Check that the port mapping is correct in your docker run command
  - Verify that no other application is using port 8050
- Container name conflict
  - If you see "The container name is already in use", remove the existing container: `docker rm -f novoglyco`
- "No space left on device" error
  - Clean up Docker resources: `docker system prune -a -f`
  - Allocate more disk space to Docker in Docker Desktop settings
- Port Conflict Issues
  - Use a different port in the Docker run command: `... docker run --name novoglyco -p 8051:8050 ...`
  - Then access the dashboard at http://localhost:8051
- DirectTag script not starting properly
  - Make sure the script file is executable (Linux)
  - Try running the DirectTag script manually to check for errors
  - Check that the DirectTag executable exists in the correct directory
- Windows Command Processor Issues
  - If you encounter issues with the `&` operator in the single command approach, try using the separate commands option instead
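For the port-related issues above, a quick way to see whether something is already listening on the dashboard port is to attempt a local connection. This Python sketch uses only the standard library; `port_is_free` is a hypothetical helper, and it checks the loopback interface only.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is in use.
        return probe.connect_ex((host, port)) != 0


print(port_is_free(8050))
```

If the port is taken before the container starts, map a different host port (for example `-p 8051:8050`) as described under Port Conflict Issues.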
Docker Tips
- View running containers: `docker ps`
- View container logs: `docker logs novoglyco`
- Stop container: `docker stop novoglyco`
- Remove container: `docker rm -f novoglyco`
- Remove old containers: `docker container prune`
- Remove old images: `docker image prune`
License
NovoGlyco is released under the Apache License 2.0.
Copyright (c) 2025
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Please note that some of the functions and libraries used by NovoGlyco may not share the same license as NovoGlyco. If you want to use any of these in a different context, ensure that you obtain the appropriate licenses for the dependent libraries and tools. Additional information on dependencies and functions (not complete):
- Python: Python 3.x, open-source, https://www.python.org/
- Dash: MIT License, https://dash.plotly.com/
- Pyteomics: MIT License, https://pyteomics.readthedocs.io/
- pandas: BSD 3-Clause License, https://pandas.pydata.org/
- matplotlib: Matplotlib is licensed under the PSF License, https://matplotlib.org/
- scipy: BSD 3-Clause License, https://scipy.org/
- numpy: BSD 3-Clause License, https://numpy.org/
- Sage: MIT License, https://github.com/lazear/sage (used for advanced spectrum annotation and pre-filtering). Lazear, Michael R. "Sage: an open-source tool for fast proteomics searching and quantification at scale." Journal of Proteome Research 22.11 (2023): 3652-3659.
- Other Python Libraries: Please review the licenses for any other third-party packages used.

Version/History: Version 1.0.0
Future versions will continue to improve functionality and performance, with regular updates to fix bugs and add features.
Data/Privacy: NovoGlyco does not collect, store, or transmit any personal data. It operates entirely on the local machine and does not interact with any external servers or services. All data processing and analysis occur locally, and the application does not send or receive any data over the internet unless specifically configured to do so (e.g., if the user chooses to share files for support purposes).

Data Collection: No data is collected by NovoGlyco itself. However, any files processed using NovoGlyco (such as proteomics data files) are handled on the user's local machine and are not transmitted unless manually shared by the user.

Data Security: NovoGlyco does not store any sensitive data and does not have access to personal or confidential information. All data handling is kept within the scope of the user's local environment, and files are not uploaded or shared without explicit user action.

Privacy Policy: Since NovoGlyco does not engage in data collection or sharing, a privacy policy is not required. However, users should ensure they are aware of the privacy policies of any external tools or libraries that may collect data in their respective functionalities.

User Consent: By using NovoGlyco, you consent to the software operating on your local machine as described above, and you are responsible for managing your own data and files.
No Warranty Disclaimer: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Please ensure you are complying with the terms and conditions of the dependencies and their respective licenses.
Citation
If you use this software in your research, please cite: Soic D and Pabst M. NovoGlyco: mapping protein glycosylation in prokaryotes. bioRxiv. 2025.
Contacts
Dinko Soic (dsoic@pharma.hr) Martin Pabst (m.pabst@tudelft.nl)