Showing 207 open source projects for "artificial intelligence"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Stop vibe-debugging. Icon
    Stop vibe-debugging.

    Plug Claude into your app's actual errors.

    AppSignal's MCP server hands Claude, Cursor, or Zed your real errors, traces, and the deploy that shipped them. AI writes the fix; you review the diff.
    Free 30 days.
  • 1
    kokoro-onnx

    kokoro-onnx

    TTS with kokoro and onnx runtime

    kokoro-onnx is a text-to-speech toolkit that wraps the Kokoro neural TTS model in an easy-to-use ONNX Runtime interface, so you can generate speech from Python with minimal setup. It focuses on running efficiently on commodity hardware, including macOS with Apple Silicon, while still delivering near real-time performance for many use cases. The project ships prebuilt model files and a simple example script, so you can go from installation to producing an audio.wav file in just a few steps....
    Downloads: 455 This Week
    Last Update:
    See Project
  • 2
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 3
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Kitten TTS

    Kitten TTS

    State-of-the-art TTS model under 25MB

    KittenTTS is an open-source, ultra-lightweight, and high-quality text-to-speech model featuring just 15 million parameters and a binary size under 25 MB. It is designed for real-time CPU-based deployment across diverse platforms. Ultra-lightweight, model size less than 25MB. CPU-optimized, runs without GPU on any device. High-quality voices, several premium voice options available. Fast inference, optimized for real-time speech synthesis.
    Downloads: 13 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Qwen3-TTS

    Qwen3-TTS

    Qwen3-TTS is an open-source series of TTS models

    Qwen3-TTS is an open-source text-to-speech (TTS) project built around the Qwen3 large language model family, focused on generating high-quality, natural-sounding speech from plain text input. It provides researchers and developers with tools to transform text into expressive, intelligible audio, supporting multiple languages and voice characteristics tuned for clarity and fluidity. The project includes pre-trained models and inference scripts that let users synthesize speech locally or...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 6
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 7
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    The OmniVoice project is a cutting-edge multilingual text-to-speech system designed to generate high-quality speech across more than 600 languages. Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition,...
    Downloads: 44 This Week
    Last Update:
    See Project
  • 8
    OpenVoice

    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 9
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    VoxCPM2 is an advanced open-source text-to-speech system that redefines speech synthesis by eliminating traditional tokenization and instead generating continuous speech representations through a diffusion-based autoregressive architecture. Built on top of the MiniCPM model family, it enables highly natural, expressive, and context-aware speech generation that adapts tone, emotion, and pacing directly from input text. The system is trained on massive multilingual datasets, enabling support...
    Downloads: 36 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 10
    DramaBox

    DramaBox

    super expressive prompting model based on ltx2.3

    DramaBox is an expressive text-to-speech and voice cloning project from Resemble AI built on top of the LTX-2.3 audio branch. It generates speech from prompts that control not only the spoken text, but also speaker identity, emotion, delivery style, laughs, sighs, pauses, and transitions. Users can optionally provide a voice reference of around 10 seconds or more to clone the target timbre while still guiding performance through scene-style prompting. The project includes a warm inference...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 42 This Week
    Last Update:
    See Project
  • 12
    Pocket TTS

    Pocket TTS

    A TTS that fits in your CPU (and pocket)

    Pocket TTS is a lightweight text-to-speech project designed to run efficiently on CPUs, targeting developers who want local speech generation without depending on GPUs or hosted web APIs. It is built to feel practical in everyday applications, where installation and usage should be as simple as adding a dependency and calling a function. The project focuses on keeping the runtime footprint manageable while still producing natural-sounding speech, which makes it attractive for offline tools,...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 13
    Miso TTS

    Miso TTS

    Miso TTS is an 8 billion, highly emotive text-to-speech model

    Miso TTS is an advanced 8-billion-parameter text-to-speech model developed by Miso Labs for generating highly expressive and natural-sounding conversational speech. Built on an RVQ Transformer architecture inspired by Sesame CSM, it combines a powerful Llama-based backbone with an autoregressive audio decoder to produce high-quality audio from text. The model supports both standard speech synthesis and voice-conditioned generation using optional audio prompts for voice cloning. Miso TTS...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 14
    Applio

    Applio

    A simple, high-quality voice conversion tool focused on ease of use

    Applio is a high-quality voice conversion toolkit designed to make modern RVC/VITS-based voice cloning accessible to non-experts. It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning. The project is structured to be flexible through...
    Downloads: 93 This Week
    Last Update:
    See Project
  • 15
    Voicebox

    Voicebox

    The open-source voice synthesis studio powered by Qwen3-TTS

    Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the...
    Downloads: 90 This Week
    Last Update:
    See Project
  • 16
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong...
    Downloads: 37 This Week
    Last Update:
    See Project
  • 17
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 18
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 19
    ebook2audiobook

    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of...
    Downloads: 41 This Week
    Last Update:
    See Project
  • 20
    CosyVoice

    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    gTTS

    gTTS

    Python library and CLI tool to interface with Google Translate

    gTTS (Google Text-to-Speech) is a Python library and command-line tool that wraps the speech functionality of Google Translate. It lets you send text to the Google Translate TTS endpoint and receive spoken audio back as MP3 data, either written to a file, a file-like object, or standard output. The library is designed to handle long texts, using a speech-specific sentence tokenizer that keeps intonation and punctuation natural while splitting requests into acceptable chunks. It supports...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 22
    Readest

    Readest

    Readest is a modern, feature-rich ebook reader

    Readest is a project meant to facilitate reading, studying, or consuming content by integrating reading tools with AI-powered assistance. Although the repository is not as widely documented or popular as some, the idea is that Readest supports features to help with reading comprehension — likely combining OCR / text retrieval, translation, note-taking, or summarization for reading materials (eBooks, articles, PDFs). The goal appears to be to let users feed in arbitrary reading material and...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 23
    FireRedTTS-2

    FireRedTTS-2

    Long-form streaming TTS system for multi-speaker dialogue generation

    FireRedTTS2 is a next-generation open-source text-to-speech (TTS) system focused on long-form, streaming speech synthesis for multi-speaker dialogue, delivering stable natural speech with context-aware prosody and reliable speaker transitions that support real-time and conversational applications. It features a specialized streaming speech tokenizer and a dual-transformer architecture that enables low latency and high-quality synthesis, making it suitable for interactive systems like...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance. The project provides a straightforward CLI...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 25
    TTS Voice Wizard

    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods. You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText or VRChats...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Auth0 Logo