The voice-activity-detection model by pyannote is a neural pipeline for detecting when speech occurs in audio recordings. Built on pyannote.audio 2.1, it identifies the segments of active speech within an audio file, which makes it useful as a preprocessing step for transcription, speaker diarization, and voice-controlled systems. The model was trained on datasets including AMI, DIHARD, and VoxConverse. Access is gated: users must accept the usage conditions on Hugging Face and supply a personal access token when loading the pipeline. Once instantiated, the pipeline returns time-stamped intervals of detected speech. Released under the MIT license, it suits both academic research and production environments that need accurate voice detection, with applications in speech recognition, speaker segmentation, and conversational AI.
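
A rough sketch of how such a gated pyannote.audio 2.x pipeline is typically loaded and applied; the access token value and audio file name below are placeholders, not part of this listing:

    from pyannote.audio import Pipeline

    # Load the gated pipeline from the Hugging Face Hub.
    # "HF_TOKEN" is a placeholder for a personal access token and only
    # works after the model's usage conditions have been accepted.
    pipeline = Pipeline.from_pretrained(
        "pyannote/voice-activity-detection",
        use_auth_token="HF_TOKEN",
    )

    # Apply the pipeline to a local audio file (placeholder path).
    vad = pipeline("audio.wav")

    # The result is an annotation whose timeline holds the detected
    # speech regions as (start, end) segments, in seconds.
    for segment in vad.get_timeline().support():
        print(f"speech from {segment.start:.2f}s to {segment.end:.2f}s")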

Features

  • Detects precise speech activity segments in audio
  • Built with pyannote.audio 2.1 framework
  • Trained on robust datasets including AMI and VoxConverse
  • Requires Hugging Face access token for model use
  • Easy integration with PyTorch and Python pipelines
  • Ideal for speaker diarization, ASR, and voice-based systems
  • Supports timeline-based voice activity outputs (see the sketch after this list)
  • Open-source under the MIT license
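
Building on the sketch above, the timeline-based output can be reduced to plain (start, end) pairs for a downstream ASR or diarization stage; the vad variable and names here are illustrative assumptions:

    # Reduce the VAD annotation to plain (start, end) tuples that a
    # downstream ASR or diarization stage can consume.
    speech_regions = [
        (segment.start, segment.end)
        for segment in vad.get_timeline().support()
    ]

    # Total amount of detected speech, in seconds.
    total_speech = sum(end - start for start, end in speech_regions)
    print(f"{len(speech_regions)} speech regions, {total_speech:.1f}s of speech")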

Categories

AI Models

Additional Project Details

Registered: 2025-07-01