Showing 1302 open source projects for "speech"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 2
    Speech Note

    Speech Note

    Speech Note Linux app. Note taking, reading and translating

    Speech Note is a Linux desktop and Sailfish OS application for taking, reading, and translating notes with integrated offline speech technology. It combines speech-to-text, text-to-speech, and machine translation in a single interface, allowing users to dictate notes, listen back to them, and translate them without ever sending data to the cloud.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 3
    Hugging Face - Speech To Speech

    Hugging Face - Speech To Speech

    Open speech-to-speech models and pipelines by Hugging Face toolkit AI

    This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Vosk Speech Recognition Toolkit

    Vosk Speech Recognition Toolkit

    Offline speech recognition API for Android, iOS, Raspberry Pi

    Speech recognition bindings are implemented for various programming languages like Python, Java, Node.JS, C#, C++, Rust, Go and others. Vosk supplies speech recognition for chatbots, smart home appliances, and virtual assistants. It can also create subtitles for movies, and transcription for lectures and interviews. Vosk scales from small devices like Raspberry Pi or Android smartphones to big clusters.
    Downloads: 83 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    Speech-AI-Forge

    Speech-AI-Forge

    Speech-AI-Forge is a project developed around TTS generation model

    Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text and LLM-based control flows, behind a consistent interface.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    sherpa-onnx

    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 392 This Week
    Last Update:
    See Project
  • 7
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...
    Downloads: 62 This Week
    Last Update:
    See Project
  • 8
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details).
    Downloads: 22 This Week
    Last Update:
    See Project
  • 9
    Handy STT

    Handy STT

    A free, open source, and extensible speech-to-text application

    ...Its backend leverages OpenAI’s Whisper models for GPU-accelerated speech recognition and Parakeet V3 for efficient CPU-only transcription with automatic language detection. To further refine accuracy and responsiveness, Handy integrates Silero’s Voice Activity Detection (VAD) for silence filtering, ensuring only speech segments are processed.
    Downloads: 89 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 11
    OpenAI.fm

    OpenAI.fm

    Code for openai.fm, a demo for the OpenAI Speech API

    OpenAI.fm is an official interactive demo application built to showcase the OpenAI Speech API and its advanced text-to-speech capabilities, providing developers and creators with a hands-on web interface to convert text into high-quality, customizable audio using state-of-the-art TTS models. Developed using Next.js and the OpenAI Speech API, this demo illustrates how the latest neural voice models can produce natural, expressive speech with adjustable styles and voices, highlighting features like emotional range, tone, and real-time playback. ...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 12
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 13
    RHVoice

    RHVoice

    Free open source speech synthesizer for Russian and other languages

    RHVoice is a free and open-source multilingual speech synthesizer. Its developers hope to give more visually impaired people the ability to use a good free synthesis voice reading in their native language with their screen reader. We are especially interested in supporting those languages for which there are currently no good voices that could be used with a screen reader. The creator of RHVoice, Olga Yakovleva, is blind herself.
    Downloads: 39 This Week
    Last Update:
    See Project
  • 14
    SenseVoice

    SenseVoice

    Multilingual speech recognition and audio understanding model

    SenseVoice is a speech foundation model designed to perform multiple voice understanding tasks from audio input. It provides capabilities such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection within a single system. SenseVoice is trained on more than 400,000 hours of speech data and supports over 50 languages for multilingual recognition tasks.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 15
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 17
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity.
    Downloads: 33 This Week
    Last Update:
    See Project
  • 19
    Moonshine Voice

    Moonshine Voice

    Fast and accurate automatic speech recognition (ASR) for edge devices

    moonshine is an open-source automatic speech recognition toolkit optimized for fast and accurate transcription on edge devices and local environments. The project is designed to enable real-time voice applications such as live transcription, voice commands, and embedded speech interfaces without requiring heavy cloud infrastructure. Its architecture emphasizes low latency and flexible input handling, allowing audio streams of varying durations rather than relying on fixed processing windows. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 20
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 21
    Polyglot

    Polyglot

    Cross-platform AI language practice app

    Polyglot is a cross platform AI language practice application that runs as a desktop app and also offers a web version. It is built around conversational large language models and Azure based text to speech services, turning them into an interactive environment for speaking practice in multiple languages. Users can define custom AI personas, choose languages, and configure their own OpenAI and Azure keys so they retain control over which backends they use. The app supports speech recognition with quick keyboard shortcuts, allowing learners to hold down a key to speak and release it to submit for recognition and response. ...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 22
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. ...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 23
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    StreamSpeech

    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility.
    Downloads: 26 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB