Music Source Separation is an open-source, PyTorch-based implementation of the task of separating a music (or other audio) recording into its constituent sources, such as vocals, instruments, bass, accompaniment, or background. It lets you decompose any existing song into separate stems (vocals, accompaniment, etc.) or train custom separation models on your own datasets (e.g., for speech enhancement, instrument isolation, or other audio-separation tasks). The repository provides training scripts (e.g., on MUSDB18), preprocessing steps (audio-to-HDF5 packing and indexing), evaluation pipelines, and inference scripts for separating arbitrary audio files. The project is useful both for researchers in music information retrieval and audio machine learning, and for hobbyists and practitioners who want to experiment with remixing, karaoke, or audio editing.
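
For example, once a trained checkpoint is in hand, separation takes only a few lines of PyTorch. The sketch below is illustrative rather than the repository's actual inference script: `model` stands for any trained separation network and is assumed to map a `(batch, channels, samples)` mixture tensor to a dict of stem name to waveform tensor; the sample rate should match whatever the checkpoint was trained on.

```python
import librosa
import soundfile as sf
import torch

SAMPLE_RATE = 44100  # assumed; use the sample rate the checkpoint was trained with

def separate_file(model: torch.nn.Module, in_path: str, out_prefix: str) -> None:
    """Separate one audio file into stems and write each stem as a WAV file."""
    mixture, _ = librosa.load(in_path, sr=SAMPLE_RATE, mono=False)
    if mixture.ndim == 1:                 # librosa returns (samples,) for mono input
        mixture = mixture[None, :]        # unify shape to (channels, samples)
    x = torch.from_numpy(mixture)[None]   # add batch dim: (1, channels, samples)

    model.eval()
    with torch.no_grad():
        stems = model(x)                  # assumed to return {"vocals": tensor, ...}

    for name, wav in stems.items():
        # soundfile expects (frames, channels), hence the transpose
        sf.write(f"{out_prefix}_{name}.wav", wav[0].T.numpy(), SAMPLE_RATE)
```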
Features
- Ability to separate mixed audio into individual stems (vocals, instruments, accompaniment, bass, etc.)
- Support for training from scratch on public datasets (e.g., MUSDB18) or custom datasets (see the training-step sketch after this list)
- Preprocessing pipeline (audio-to-HDF5 packing, indexing, dataset preparation) to streamline training workflows (see the packing sketch after this list)
- Inference scripts for easy source separation of arbitrary audio files (e.g., .mp3) without deep ML expertise (see the inference sketch above)
- Flexibility for tasks beyond music, such as speech enhancement, instrument isolation, and general audio separation
- PyTorch-based architecture enabling custom modifications, fine-tuning, or integration into larger audio-processing pipelines
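
As a concrete picture of the preprocessing step, the sketch below packs a dataset into a single HDF5 file. It is a simplified stand-in for the repository's actual packing script and assumes a MUSDB18-style layout in which each track folder has already been decoded to aligned per-stem audio files (mixture.wav, vocals.wav, and so on); storing the decoded waveforms once lets training slice segments without re-decoding audio on every epoch.

```python
import os

import h5py
import librosa
import numpy as np

SAMPLE_RATE = 44100  # assumed target rate; MUSDB18 ships at 44.1 kHz

def pack_to_hdf5(dataset_dir: str, hdf5_path: str) -> None:
    """Decode every stem of every track once and store it in one HDF5 file."""
    with h5py.File(hdf5_path, "w") as hf:
        for track in sorted(os.listdir(dataset_dir)):
            track_dir = os.path.join(dataset_dir, track)
            if not os.path.isdir(track_dir):
                continue
            grp = hf.create_group(track)
            for fname in sorted(os.listdir(track_dir)):
                stem, ext = os.path.splitext(fname)
                if ext.lower() not in (".wav", ".mp3", ".flac"):
                    continue
                waveform, _ = librosa.load(os.path.join(track_dir, fname),
                                           sr=SAMPLE_RATE, mono=False)
                if waveform.ndim == 1:
                    waveform = waveform[None, :]   # (channels, samples)
                # float32 keeps files compact while preserving audio precision
                grp.create_dataset(stem, data=waveform.astype(np.float32))
```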
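
Building on that packed layout, one training step might look like the following. This is a minimal sketch, assuming a waveform-in, waveform-out model trained to predict the vocal stem with an L1 loss; the repository's actual models, losses, batching, and augmentation may differ.

```python
import h5py
import numpy as np
import torch
import torch.nn.functional as F

SAMPLE_RATE = 44100
SEGMENT = 3 * SAMPLE_RATE  # train on 3-second excerpts, a common choice

def train_step(model, optimizer, hf: h5py.File, track: str) -> float:
    """One optimization step on a random excerpt of one packed track."""
    grp = hf[track]
    start = np.random.randint(0, grp["mixture"].shape[-1] - SEGMENT)
    window = np.s_[..., start:start + SEGMENT]  # same window for input and target
    mixture = torch.from_numpy(grp["mixture"][window])[None]  # (1, channels, samples)
    target = torch.from_numpy(grp["vocals"][window])[None]

    pred = model(mixture)           # assumed waveform-to-waveform separator
    loss = F.l1_loss(pred, target)  # regress the predicted stem onto the ground truth

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```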