Showing 204 open source projects for "voice"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 41 This Week
    Last Update:
    See Project
  • 2
    Rasa

    Rasa

    Open source machine learning framework to automate text conversations

    Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual assistants on Facebook Messenger, Slack, Google Hangouts, Webex Teams, Microsoft Bot Framework, Rocket.Chat, Mattermost, Telegram, and Twilio or on your own custom conversational channels. Rasa helps you build contextual assistants capable of having layered conversations with lots of back-and-forths.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    VibeVoice ComfyUI

    VibeVoice ComfyUI

    ComfyUI integration for Microsoft's VibeVoice text-to-speech model

    ...The project also introduces first-class LoRA support, making it possible to fine-tune and load custom LoRA adapters that modify voice identity or style while keeping the base VibeVoice model intact.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 4
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    ...This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 6
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    ...The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Ultravox

    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8
    Sopro TTS

    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    Sopro TTS is an open-source text-to-speech (TTS) project that implements a lightweight model capable of producing speech from text with zero-shot voice cloning, meaning it can mimic a speaker’s voice from only a few seconds of reference audio. Built with a 169 million-parameter architecture that uses dilated convolutions and cross-attention layers instead of large Transformer stacks, it achieves relatively fast real-time performance even on CPUs (about a 0.25 real-time factor measured on an M3 base). ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    ...A Docker image is provided for one-command deployment, and environment variables can be used to configure default voice, language, response format, authentication, and logging options.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    Spark TTS

    Spark TTS

    Spark-TTS Inference Code

    ...It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster. The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. Spark-TTS allows users to control speech characteristics like gender, pitch, and speaking rate to customize synthesized output and support virtual speaker creation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    GLM-TTS

    GLM-TTS

    Controllable & emotion-expressive zero-shot TTS

    GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. The system introduces a multi-reward reinforcement learning framework that jointly optimizes for voice similarity, emotional expressiveness, pronunciation, and intelligibility, yielding output that can rival commercial options in naturalness and expressiveness. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    ...It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    Fun Audio Chat

    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Luna AI

    Luna AI

    Virtual AI anchor that combines state-of-the-art technology

    ...The project supports multiple rendering backends for the avatar, such as Live2D, Unreal Engine (UE), and “xuniren,” and can output to streaming platforms like Bilibili, Douyin, Kuaishou, WeChat Channels, Pinduoduo, Douyu, YouTube, Twitch, and TikTok. For voice, it integrates with numerous TTS engines (Edge-TTS, VITS-Fast, ElevenLabs, VALL-E-X, OpenVoice, GPT-SoVITS, Azure TTS, fish-speech, ChatTTS, CosyVoice, F5-TTS, MultiTTS, MeloTTS, and others), and can optionally pass the output through voice conversion systems like so-vits-svc or DDSP-SVC to change timbre.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 15
    BasedHardware

    BasedHardware

    Open source AI wearable platform for recording and summarizing speech

    Omi is an open source AI wearable platform designed to capture spoken conversations and convert them into useful digital information such as transcripts, summaries, and action items. It combines hardware, firmware, mobile applications, and backend services to create a complete ecosystem for voice-driven interaction. Users can connect the wearable device to a mobile phone and automatically record and transcribe meetings, conversations, and voice memos. Omi includes firmware for wearable hardware, a Flutter-based mobile companion application, backend services built with Python and FastAPI, and various SDKs for developers. These components work together to process audio, perform speech recognition, and integrate AI features such as summaries and automated actions. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 16
    MCP Server Home Assistant

    MCP Server Home Assistant

    A Model Context Protocol Server for Home Assistant

    The Home Assistant MCP Server is an MCP server that integrates with Home Assistant, enabling AI assistants to interact with smart home devices and systems. It exposes Home Assistant voice intents through the Model Context Protocol for enhanced home control. ​
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    FFsubsync

    FFsubsync

    Automagically synchronize subtitles with video

    ...In this case, you can use the correctly synchronized srt file directly as a reference for synchronization, instead of using the video as the reference. ffsubsync uses the file extension to decide whether to perform voice activity detection on the audio or to directly extract speech from an srt file. ffsubsync usually finishes in 20 to 30 seconds, depending on the length of the video.
    Downloads: 51 This Week
    Last Update:
    See Project
  • 18
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 19
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ...The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    AgentScope

    AgentScope

    Build and run agents you can see, understand and trust

    ...It provides essential abstractions that evolve with advancing LLM capabilities, emphasizing reasoning, tool use, and flexible orchestration rather than rigid prompt constraints. With built-in support for ReAct agents, memory, planning, human-in-the-loop control, and real-time voice interaction, developers can create powerful agents in minutes. AgentScope integrates seamlessly with tools, long-term memory systems, MCP, A2A (Agent-to-Agent) protocols, and observability frameworks. It also supports reinforcement learning workflows for tuning agents and improving performance across complex tasks. Deployable locally, serverless in the cloud, or on Kubernetes with OpenTelemetry support, AgentScope is built for both experimentation and production environments.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 21
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    VCClient is a real-time voice conversion system that uses machine learning models to transform a speaker’s voice into another voice with minimal latency. It is designed for live applications such as streaming, gaming, and virtual communication, where immediate feedback is essential. The system supports multiple voice conversion models, including RVC and other neural network-based approaches, allowing users to switch between different voices or customize their output. ...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 22
    SafeClaw

    SafeClaw

    Chat with it via text and voice

    ...It emphasizes privacy and predictability by using traditional programming, rule-based intent parsing, and established machine learning tools rather than large language models, meaning there are no per-token API costs and deterministic behavior. The assistant offers features such as voice control using fully local speech-to-text (Whisper) and text-to-speech (Piper) capabilities, news aggregation with extractive summarization, and smart home or Bluetooth device control. SafeClaw supports multiple channels, including CLI and Telegram, and avoids prompt injection risk because it doesn’t rely on LLMs for core operations.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    ...The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. Through its architecture, Step-Audio supports multilingual interaction, dialects, emotional tones (joy, sadness, etc.), and even more creative speech styles (like rap or singing), while allowing dynamic control over speech characteristics. It also provides a “generative data engine,” which can produce synthetic speech data (cloning voices, varying style) to support TTS training.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    WhisperX

    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for long recordings, where traditional approaches may suffer from drift, repetition, or misalignment. whisperx also supports speaker diarization, allowing identification of different speakers within a conversation. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 25
    OpenaiBot

    OpenaiBot

    Refractoring ChatBot+LLM, Gpt-3.5-turbo, ChatGPT Bot/Voice Assistant

    If you don't have the instant messaging platform you need or you want to develop a new application, you are welcome to contribute to this repository. You can develop a new Controller by using Event.py. Compatibility with multiple LLMs and integration with GPT and third-party systems is handled by our llm-kira project on GitHub. It can accurately limit billing, with limits and ID binding. Supports asynchronous operations and can handle multiple requests simultaneously. Allows for private and...
    Downloads: 4 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB