Showing 13 open source projects for "vad"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Bailing

    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a separately replaceable component, which allows flexibility in picking your preferred models depending on resources or languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    WhisperJAV

    WhisperJAV

    Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD

    WhisperJAV is an open-source speech transcription pipeline designed specifically for generating subtitles for Japanese adult video content. The project addresses challenges that standard speech recognition models face when transcribing this type of audio, which often includes low signal-to-noise ratios and large numbers of non-verbal vocalizations. Traditional automatic speech recognition systems can misinterpret these sounds as words, leading to inaccurate transcripts. WhisperJAV introduces...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 4
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    ...Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 17 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 5

    WhisperJAV

    A subtitle generator for Japanese Adult Videos.

    A subtitle generator for Japanese Adult Videos. Transformer-based ASR architectures like Whisper suffer significant performance degradation when applied to the spontaneous and noisy domain of JAV. This degradation is driven by specific acoustic and temporal characteristics that defy the statistical distributions of standard training data.
    Downloads: 105 This Week
    Last Update:
    See Project
  • 6
    Amica

    Amica

    Amica is an open source interface for interactive communication

    ...Under the hood, Amica leverages modern web and desktop technologies: three.js and three-vrm for 3D rendering, Transformers.js for running models in the browser, Whisper and Silero VAD for speech recognition and voice-activity detection, and a variety of LLM backends such as llama.cpp servers, ChatGPT-compatible APIs, Ollama, KoboldCpp, and others. It also integrates multiple text-to-speech providers, including ElevenLabs, OpenAI, Coqui, RVC, and AllTalkTTS.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 7

    OpenAppTarifas

    Cálculo de tarifas típicas del sector de distribución eléctrica

    OpenAppTarifas es una aplicación que permite calcular, en esta primer versión beta, 2 (dos) tarifas típicas del sector de distribución de energía eléctrica: una T1G con el VAD (Valor Agregado de Distribución) energizado y calculando los correspondientes segmentos de linealización, obteniendo un cargo fijo y un cargo variable; y la otra T2MD con el VAD asignado a un cargo fijo (comercial) y un cargo por potencia, dejando solo en un cargo variable la compra (pass-through) de la energía del Mercado Eléctrico Mayorista (MEM). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    VAD

    VAD

    Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM

    This repository is a voice activity detection (VAD) toolkit that implements multiple models (DNN, bDNN, LSTM, ACAM) for detecting speech versus non-speech in audio. It also provides a recorded dataset in varied real-world settings (e.g. bus stop, construction site, park, room) with ground truth labeling. Acoustic feature extraction (multi-resolution cochleagram, MRCG). Post-processing modules (e.g. smoothing, thresholds).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Live Transcribe Speech Engine

    Live Transcribe Speech Engine

    Live Transcribe is an Android application

    Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models....
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 10
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MCU Media Server

    MCU Media Server

    SIP Video Multiconference Media Server with WebRTC support.

    REPOSITORY MOVED TO GITHUB!! https://github.com/medooze/media-server Video Multiconference Media Server with WebRTC support. Provide Multiconference and video broadcasting services to any SIP service. Supports VP8, H264, MP4V-ES, H263 and H263P, continuous presence, RTMP flash broadcasting, adhoc conferences, load balancing and administrative WEB interface. JSR309 driver implementation under development. .
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    The VAD tools are a set of scripts for working with Virtual Address Descriptor structures in dumps of Windows physical memory to provide detailed information about a process's memory allocations to a forensic investigator.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    A program for displaying the more esoteric NEXRAD products on a Macintosh running OS X 10.4 (Tiger) or later. TVS, VAD Winds, and Hail probability are some of the products this program will display. Others will be available in later releases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo