| Name | Modified | Size |
|---|---|---|
| README.md | 2025-09-30 | 2.3 kB |
| v1.7.0 - Dynamic 4-Bit Quantization source code.tar.gz | 2025-09-30 | 99.1 kB |
| v1.7.0 - Dynamic 4-Bit Quantization source code.zip | 2025-09-30 | 123.4 kB |
| Totals: 3 Items | | 224.8 kB |
🚀 Dynamic LLM Quantization
- New Feature: On-the-fly 4-bit quantization for non-quantized models
- Selective Quantization: Only quantizes the language model, keeps diffusion at full precision
- Major VRAM Savings: Significantly reduced memory usage with minimal quality impact
- Easy Toggle: Simple dropdown to switch between full precision and 4-bit
✨ What's New
Quantize LLM Parameter
- New parameter: `quantize_llm` in both Single and Multiple Speaker nodes
- Options:
  - `full precision` (default) - Original model quality
  - `4bit` - Dynamic quantization for VRAM savings
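A minimal sketch of how such a dropdown can be wired into a ComfyUI node, assuming the standard ComfyUI convention that a list of strings in `INPUT_TYPES` becomes a dropdown widget. The class name and helper below are illustrative, not the actual node code; only `quantize_llm` and its two option strings come from this release.

```python
# Illustrative sketch only: SpeakerNodeSketch and should_quantize are
# hypothetical names, not part of the VibeVoice-ComfyUI source.
QUANTIZE_LLM_OPTIONS = ["full precision", "4bit"]

class SpeakerNodeSketch:
    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI convention: a list of strings renders as a dropdown;
        # "full precision" is declared as the default choice.
        return {
            "required": {
                "quantize_llm": (QUANTIZE_LLM_OPTIONS,
                                 {"default": "full precision"}),
            }
        }

def should_quantize(quantize_llm: str) -> bool:
    """Map the dropdown value to an on/off decision."""
    return quantize_llm == "4bit"
```

Keeping the decision in one small helper makes the toggle trivial to test and keeps the node's generation code free of string comparisons.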
How It Works
- Quantizes only the language model component (the largest part)
- Keeps diffusion head at full precision for quality
- Uses NF4 (4-bit NormalFloat) optimized for neural networks
- Applied dynamically - no need to download separate quantized models
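The selective step above can be sketched as a filter over module names: only submodules of the language model are converted (in the real node, `torch.nn.Linear` layers swapped for bitsandbytes NF4 layers), and everything else, the diffusion head in particular, is left at full precision. The prefix names below are illustrative assumptions, not the project's actual module paths.

```python
# Hypothetical module-name prefixes for illustration; the real model's
# submodule paths may differ.
def select_modules_to_quantize(module_names, llm_prefix="language_model"):
    """Return only the language-model submodules for NF4 conversion.

    Anything outside the prefix (e.g. a "diffusion_head.*" module)
    stays at full precision to preserve audio quality.
    """
    return [name for name in module_names if name.startswith(llm_prefix)]

names = [
    "language_model.layers.0.mlp",
    "diffusion_head.proj",
    "language_model.embed_tokens",
]
print(select_modules_to_quantize(names))
```

Because the conversion happens at load time in memory, no separate pre-quantized checkpoint has to be downloaded or stored.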
⚡ Performance Benefits
- ✅ Faster Generation: Less memory traffic means faster processing
- ✅ Lower VRAM: Run larger models on smaller GPUs
- ✅ Batch Processing: Fit more in memory for parallel generation
- ✅ Quality Preserved: Minimal degradation vs full precision
🎯 Use Cases
Perfect for:
- Running VibeVoice-Large on 16GB GPUs
- Batch processing multiple generations
- Faster iteration during development
- Production environments with limited VRAM
⚙️ Requirements
- GPU: NVIDIA CUDA-capable GPU
- Library: bitsandbytes (auto-installed if missing)
- Note: Falls back to full precision on CPU/MPS
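The CPU/MPS fallback can be reduced to one guard, since bitsandbytes' 4-bit kernels require CUDA. This helper is a sketch under that assumption (the function name is hypothetical); it takes the device type as a string so the policy is testable without a GPU.

```python
def resolve_quantization(requested: str, device_type: str) -> str:
    """Fall back to full precision when NF4 is unsupported.

    bitsandbytes 4-bit kernels need CUDA, so on "cpu" or "mps"
    a "4bit" request silently degrades to "full precision".
    """
    if requested == "4bit" and device_type != "cuda":
        return "full precision"
    return requested
```

In the node this would be applied right after device selection, so every later code path sees only the effective mode.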
💡 Smart Detection
- Automatically disabled for pre-quantized models
- Only applies to standard full-precision models
- Clear logging shows when quantization is active
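The detection logic above can be sketched as a single gate with logging. The check used here, looking for a `quantization_config` entry, follows the common Hugging Face convention for pre-quantized checkpoints; it is an assumption about how detection works, and the function name is illustrative.

```python
import logging

def maybe_enable_quantization(model_config: dict, requested: str) -> bool:
    """Decide whether dynamic NF4 quantization should run.

    Pre-quantized checkpoints (detected here via the Hugging Face
    "quantization_config" key, an assumed convention) are left as-is.
    """
    if "quantization_config" in model_config:
        logging.info("Model is pre-quantized; dynamic 4-bit disabled")
        return False
    enabled = (requested == "4bit")
    if enabled:
        logging.info("Dynamic NF4 quantization active for the language model")
    return enabled
```

Logging both branches gives the "clear logging" behavior: the console always states whether quantization is active and why.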
💾 Installation
Install via ComfyUI Manager or manually:
:::bash
git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
📋 Usage Tips
- Start with full precision to establish quality baseline
- Switch to 4-bit for production or when VRAM is limited
- Ideal for VibeVoice-Large on consumer GPUs
- No effect on already quantized models (they stay as-is)
🔧 Technical Details
- Quantization Type: NF4 (4-bit NormalFloat)
- Applied per-session (not saved to disk)
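For reference, NF4 with per-session (load-time) application is what the standard Hugging Face `BitsAndBytesConfig` requests; this config fragment shows those settings, though the node itself may apply bitsandbytes directly rather than through transformers.

```python
# Config fragment, shown for reference: the standard way to request
# load-time NF4 quantization via transformers + bitsandbytes.
# VibeVoice-ComfyUI's internal mechanism may differ.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,              # quantize at load time, nothing saved to disk
    bnb_4bit_quant_type="nf4",      # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
```

Because the weights are quantized in memory at load time, the on-disk checkpoint is untouched and the next session starts from full precision again.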