The voice-activity-detection model by pyannote is a neural pipeline for detecting when speech occurs in audio recordings. Built on pyannote.audio 2.1, it identifies segments of active speech within an audio file, which makes it a useful preprocessing step for transcription, speaker diarization, and voice-controlled systems. The pipeline was trained on the AMI, DIHARD, and VoxConverse datasets.

Access is gated: before using the model, you must accept its usage conditions on Hugging Face and authenticate with an access token. Once initialized, the pipeline returns time-stamped intervals of detected speech, suitable for both academic research and production systems that need accurate voice detection. The model is released under the MIT license and supports applications in speech recognition, speaker segmentation, and conversational AI.
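The following is a minimal sketch of that workflow, assuming pyannote.audio 2.1 is installed; the token placeholder and the file name `audio.wav` are illustrative.

```python
from pyannote.audio import Pipeline

# Loading the gated model requires prior acceptance of the usage
# conditions on Hugging Face and an access token (placeholder below).
pipeline = Pipeline.from_pretrained(
    "pyannote/voice-activity-detection",
    use_auth_token="YOUR_HF_TOKEN",
)

# Run voice activity detection on a local file (illustrative path).
output = pipeline("audio.wav")

# Iterate over the time-stamped speech intervals.
for speech in output.get_timeline().support():
    print(f"speech from {speech.start:.1f}s to {speech.end:.1f}s")
```

Each segment carries `start` and `end` times in seconds, so the output plugs directly into downstream transcription or diarization steps.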
Features
- Detects precise speech activity segments in audio
- Built with pyannote.audio 2.1 framework
- Trained on the AMI, DIHARD, and VoxConverse datasets
- Requires Hugging Face access token for model use
- Easy integration with PyTorch and Python pipelines
- Ideal for speaker diarization, ASR, and voice-based systems
- Supports timeline-based voice activity outputs (see the sketch after this list)
- Open-source under the MIT license
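As a sketch of working with the timeline output, the snippet below collects merged speech intervals and saves the result as RTTM; the token placeholder, `audio.wav`, and the output path `audio.rttm` are illustrative.

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/voice-activity-detection",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder token
)
output = pipeline("audio.wav")  # illustrative input file

# support() merges contiguous or overlapping regions into maximal
# speech intervals on the timeline.
segments = [(seg.start, seg.end) for seg in output.get_timeline().support()]
total_speech = sum(end - start for start, end in segments)
print(f"{len(segments)} speech segments, {total_speech:.1f}s of speech in total")

# Persist the annotation in RTTM format for downstream diarization/ASR tools.
with open("audio.rttm", "w") as rttm:
    output.write_rttm(rttm)
```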