v1.7.0 - Dynamic 4-Bit Quantization

🚀 Dynamic LLM Quantization

  • New Feature: On-the-fly 4-bit quantization for non-quantized models
  • Selective Quantization: Only quantizes the language model, keeps diffusion at full precision
  • Major VRAM Savings: Significantly reduced memory usage with minimal quality impact
  • Easy Toggle: Simple dropdown to switch between full precision and 4-bit

✨ What's New

Quantize LLM Parameter

  • New parameter: quantize_llm in both Single and Multiple Speaker nodes
  • Options:
      • full precision (default) - Original model quality
      • 4bit - Dynamic quantization for VRAM savings
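
As a rough illustration of how such a dropdown is usually declared in a ComfyUI node, the sketch below follows the standard `INPUT_TYPES` convention, where a list of strings becomes a dropdown widget. The class name `VibeVoiceNodeSketch` and the helper `wants_4bit` are hypothetical; only the parameter name `quantize_llm` and its two options come from these notes.

```python
class VibeVoiceNodeSketch:
    """Hypothetical node; only the quantize_llm input mirrors the release notes."""

    QUANTIZE_OPTIONS = ["full precision", "4bit"]

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI renders a list of strings as a dropdown widget.
        return {
            "required": {
                "quantize_llm": (cls.QUANTIZE_OPTIONS, {"default": "full precision"}),
            }
        }


def wants_4bit(quantize_llm: str) -> bool:
    """Translate the dropdown value into a boolean flag (hypothetical helper)."""
    if quantize_llm not in VibeVoiceNodeSketch.QUANTIZE_OPTIONS:
        raise ValueError(f"unknown quantize_llm option: {quantize_llm!r}")
    return quantize_llm == "4bit"
```

Rejecting unknown strings early keeps a typo in a saved workflow from silently selecting full precision.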

How It Works

  • Quantizes only the language model component (the largest part)
  • Keeps diffusion head at full precision for quality
  • Uses NF4 (4-bit NormalFloat), a data type designed for normally distributed neural network weights
  • Applied dynamically - no need to download separate quantized models
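
The selective behaviour described above can be pictured as a partition of module names: anything under the language model is quantized, everything else stays at full precision. The sketch below is only an illustration of that idea. The prefix `language_model.` and the example module names are assumptions, not names taken from the VibeVoice code.

```python
def split_by_component(module_names, llm_prefix="language_model."):
    """Partition module names into (quantize, keep_full_precision)."""
    quantize, keep = [], []
    for name in module_names:
        # Only modules under the (assumed) LLM prefix are selected for 4-bit.
        (quantize if name.startswith(llm_prefix) else keep).append(name)
    return quantize, keep


# Hypothetical module names, for illustration only.
names = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.up_proj",
    "diffusion_head.layers.0.linear",
]
to_quantize, to_keep = split_by_component(names)
```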

⚡ Performance Benefits

  • Faster Generation: Less memory traffic can speed up processing, depending on GPU and model
  • Lower VRAM: Run larger models on smaller GPUs
  • Batch Processing: Fit more in memory for parallel generation
  • Quality Preserved: Minimal degradation vs full precision
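
To get a feel for the size of the VRAM saving, a back-of-the-envelope estimate of weight storage may help. The parameter count below is a hypothetical round number, not a figure from these notes. The layout (one 32-bit scale per block of 64 weights) is an assumption about a typical blockwise 4-bit scheme. Activations and the full-precision diffusion head are ignored.

```python
import math


def weight_bytes(n_params: int, bits: int, block_size: int = 64) -> int:
    """Approximate storage for n_params weights at the given bit width."""
    payload = n_params * bits // 8
    if bits >= 16:
        return payload  # no per-block scales needed
    # Assumed layout: one 32-bit absmax scale per block of quantized weights.
    scales = math.ceil(n_params / block_size) * 4
    return payload + scales


N_PARAMS = 7_000_000_000  # hypothetical LLM size, not taken from the release notes

full_gb = weight_bytes(N_PARAMS, 16) / 1e9  # 16-bit weights
nf4_gb = weight_bytes(N_PARAMS, 4) / 1e9    # 4-bit weights plus block scales
print(f"16-bit: {full_gb:.2f} GB, 4-bit: {nf4_gb:.2f} GB")
```

Under these assumptions the language model weights shrink to a little over a quarter of their 16-bit size.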

🎯 Use Cases

Perfect for:

  • Running VibeVoice-Large on 16GB GPUs
  • Batch processing multiple generations
  • Faster iteration during development
  • Production environments with limited VRAM

⚙️ Requirements

  • GPU: NVIDIA CUDA-capable GPU
  • Library: bitsandbytes (auto-installed if missing)
  • Note: Falls back to full precision on CPU/MPS
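
A minimal sketch of the kind of pre-flight check these requirements imply, using only the standard library. It only detects whether bitsandbytes is importable and does not attempt the auto-install mentioned above. The function names and the `device_type` string argument are assumptions made for illustration.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def can_quantize(device_type: str) -> bool:
    """4-bit needs a CUDA device and bitsandbytes; anything else falls back."""
    return device_type == "cuda" and has_module("bitsandbytes")
```

For example, `can_quantize("mps")` is always `False`, which corresponds to the full-precision fallback on CPU/MPS.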

💡 Smart Detection

  • Automatically disabled for pre-quantized models
  • Only applies to standard full-precision models
  • Clear logging shows when quantization is active
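
The detection rules above amount to a small decision function. The following is a sketch under assumptions: the function name, the logger name, the `is_prequantized` flag, and the log messages are all hypothetical and do not reproduce the node's actual logging.

```python
import logging

logger = logging.getLogger("vibevoice.sketch")  # hypothetical logger name


def resolve_quantization(requested: str, is_prequantized: bool) -> str:
    """Decide what actually happens for a given quantize_llm setting."""
    if is_prequantized:
        # Pre-quantized checkpoints are left untouched regardless of the toggle.
        logger.info("Model is already quantized; quantize_llm is ignored")
        return "as-is"
    if requested == "4bit":
        logger.info("Dynamic 4-bit quantization active for the language model")
        return "4bit"
    return "full precision"
```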

💾 Installation

Install via ComfyUI Manager, or manually by cloning the repository into ComfyUI's custom_nodes directory:

    :::bash
    git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI

📋 Usage Tips

  • Start with full precision to establish a quality baseline
  • Switch to 4-bit for production or when VRAM is limited
  • Ideal for VibeVoice-Large on consumer GPUs
  • No effect on already quantized models (they stay as-is)

🔧 Technical Details

  • Quantization Type: NF4 (4-bit NormalFloat)
  • Applied per-session (not saved to disk)
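
For readers unfamiliar with NF4, the pure-Python sketch below shows the basic idea of blockwise quantization: each block is scaled by its absolute maximum, and every weight is mapped to the nearest of 16 fixed levels. It is a teaching sketch, not the bitsandbytes implementation, which runs on the GPU and packs two 4-bit indices per byte. The level values are reproduced from the published NF4 table and should be verified against the library before being relied on.

```python
# 16 NF4 levels (assumed to match the published NF4 table; verify before reuse).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(weights):
    """Return (absmax, indices) for one block of float weights."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    indices = [
        # Pick the level closest to the weight after scaling into [-1, 1].
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return absmax, indices


def dequantize_block(absmax, indices):
    """Map 4-bit indices back to approximate float weights."""
    return [NF4_LEVELS[i] * absmax for i in indices]


block = [0.5, -2.0, 0.0, 1.0]
absmax, idx = quantize_block(block)
restored = dequantize_block(absmax, idx)
```

The largest-magnitude weight in a block and exact zeros survive the round trip unchanged, while other values pick up a small rounding error.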
Source: README.md, updated 2025-09-30