Demucs (Deep Extractor for Music Sources) is a deep-learning framework for music source separation: extracting individual instrument or vocal tracks from a mixed audio file. The system is built on a U-Net-like convolutional architecture combined with recurrent and transformer elements that capture both short-term and long-term temporal structure. It processes raw waveforms directly rather than spectrograms, which allows higher-quality reconstruction and fewer artifacts in the separated tracks. The repository includes pretrained models for common tasks such as isolating vocals, drums, bass, and accompaniment from stereo music, achieving state-of-the-art results on benchmarks such as MUSDB18. Demucs supports GPU-accelerated inference and can process multi-channel audio with chunked streaming for real-time or batch operation. It also provides training scripts and utilities for fine-tuning on custom datasets, along with remixing and enhancement tools.
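As a quick illustration, the sketch below loads a pretrained model and separates one track through the Python API. It assumes a recent Demucs release where `demucs.pretrained.get_model` and `demucs.apply.apply_model` are available and where "htdemucs" is a valid model name; check your installed version if the names differ.

```python
# Minimal sketch: separate one song into stems with a pretrained model.
# Assumes a recent Demucs release (demucs.pretrained / demucs.apply) and
# the "htdemucs" model name; both are assumptions about your install.
import torch
import torchaudio
from demucs.pretrained import get_model
from demucs.apply import apply_model

model = get_model("htdemucs")
model.eval()

wav, sr = torchaudio.load("song.wav")  # (channels, time); stereo input expected
wav = torchaudio.functional.resample(wav, sr, model.samplerate)

# apply_model expects (batch, channels, time). split=True processes the
# track in overlapping chunks so long files fit in GPU memory.
with torch.no_grad():
    stems = apply_model(
        model, wav.unsqueeze(0), split=True,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )[0]  # (sources, channels, time)

for name, stem in zip(model.sources, stems):
    torchaudio.save(f"{name}.wav", stem.cpu(), model.samplerate)
```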
Features
- Neural music source separation from raw waveforms (vocals, drums, bass, and other accompaniment)
- U-Net-like architecture with recurrent and transformer layers for temporal modeling
- Pretrained models achieving top scores on MUSDB18 and similar benchmarks
- GPU-accelerated inference and streaming-based processing for real-time use (see the CLI sketch after this list)
- Training scripts and utilities for fine-tuning on custom datasets
- Audio remixing, enhancement, and restoration tools integrated with model outputs
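For everyday use, the command-line interface covers most of these features without writing any code. The snippet below drives it from Python via `demucs.separate.main`, which recent releases expose as the CLI entry point; the flags shown (`-n`, `--two-stems`, `-d`) exist in current versions, but `demucs --help` on your install is authoritative.

```python
# Sketch of the CLI entry point invoked from Python; equivalent to running
# `demucs` in a shell. Model name and flags are assumptions about a recent
# release; consult `demucs --help` for your installed version.
import demucs.separate

demucs.separate.main([
    "-n", "htdemucs",         # which pretrained model to load
    "--two-stems", "vocals",  # karaoke-style split: vocals vs. everything else
    "-d", "cuda",             # run on GPU; use "-d", "cpu" if no CUDA device
    "song.mp3",               # input mixture; stems land in ./separated/
])
```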