Showing 67 open source projects for "dvd-audio"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 1
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    ...The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. It includes examples such as audiobook generation to demonstrate long-form synthesis and joined audio segments. On top of that, MLX-Audio offers a modern web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    ...The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 3
    OpenVoice

    OpenVoice

    Instant voice cloning by MIT and MyShell. Audio foundation model

    OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak naturally in others. ...
    Downloads: 82 This Week
    Last Update:
    See Project
  • 4
    Miso TTS

    Miso TTS

    Miso TTS is an 8 billion, highly emotive text-to-speech model

    Miso TTS is an advanced 8-billion-parameter text-to-speech model developed by Miso Labs for generating highly expressive and natural-sounding conversational speech. Built on an RVQ Transformer architecture inspired by Sesame CSM, it combines a powerful Llama-based backbone with an autoregressive audio decoder to produce high-quality audio from text. The model supports both standard speech synthesis and voice-conditioned generation using optional audio prompts for voice cloning. Miso TTS generates Mimi audio codes and can leverage conversation history to create more contextually aware and realistic dialogue. Designed for local deployment, it offers watermarking by default to help promote responsible use of generated audio. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 5
    ebook2audiobook

    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of underlying TTS backends (XTTSv2, Bark, VITS, Fairseq, Tacotron2, YourTTS and more), which gives flexibility depending on hardware availability, voice preference, and language. ...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 6
    WavTokenizer

    WavTokenizer

    SOTA discrete acoustic codec models with 40/75 tokens per second

    WavTokenizer is a state-of-the-art discrete acoustic codec designed specifically for audio language modeling, capable of compressing 24 kHz audio into just 40 or 75 tokens per second while preserving high perceptual quality. It is built to represent speech, music, and general audio with extremely low bitrate, making it ideal as a front-end for large audio language models like GPT-4o and similar architectures. The model uses a single-quantizer design together with temporal compression to achieve extreme compression without sacrificing reconstruction fidelity. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 8
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 41 This Week
    Last Update:
    See Project
  • 9
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    ...The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong intelligibility and alignment between text and audio. Fish Speech emphasizes expressive and controllable voices: it supports a long list of emotion tags, tone markers, and special audio effect markers that can be embedded in the text to drive prosody and vocal style, from basic emotions to nuanced states like sarcastic, conciliative, or hysterical. The system is multilingual and cross-lingual, handling multiple languages in a single input without explicit phoneme markup, and is trained on large-scale datasets.
    Downloads: 28 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 10
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    ...In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. By automating text-to-speech for arbitrary documents, abogen reduces the friction of producing audiobooks and could be integrated into larger workflows (e.g., batch converting a library of texts).
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to equivalent Edge voices. Because it relies on Edge’s TTS, the audio generation itself is free, and the project essentially acts as a smart proxy that handles formatting and streaming. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    MARS5

    MARS5

    MARS5 speech model (TTS) from CAMB.AI

    ...The model is built to handle prosodically challenging content such as sports commentary, anime dialogue, and other high-energy or highly varied speech patterns with realistic rhythm and intonation. To control speaker identity, MARS5 uses a short reference audio clip, typically between 2 and 12 seconds, from which it learns the voice characteristics. It supports two main inference modes: shallow clone, which is faster and only needs the reference audio, and deep clone, which additionally uses the transcript of the reference audio to increase similarity and naturalness at the cost of more computation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained on under 100 hours of audio, and supports multiple languages, including English (US/UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. ...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 14
    ElevenLabs Python

    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    ...The SDK is designed for quick setup: after installing the package and setting an API key, you can generate speech in multiple languages and play or process the resulting audio bytes. It includes helper utilities (like play and stream) so you can either play audio locally or integrate it into your own playback or networking pipeline.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 15
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    ...Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. The system also includes advanced features like non-verbal expression tags and pronunciation overrides, enabling expressive and precise output. With support for both API-based and command-line usage, it is designed for research, production, and experimentation alike.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 16
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    ...The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 18
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 19
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    ...It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken segment and synthesizes audio via neural TTS services, producing one audio clip per subtitle entry. The tool then time-stretches or compresses each TTS clip to match the original speech duration exactly, which preserves lip-sync and rhythm as closely as possible without manual editing. Finally, it combines all the clips into a single dubbed audio track that can be muxed with the original video, along with new translated subtitle files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    EPUB to Audiobook Converter

    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 21
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 22
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ...It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. It has first-class support for long-form audio generation, making it suitable for audiobooks, podcasts, or long narration tasks. The project also implements multi-speaker or role-based reading, letting users assign different voices to different characters in a script and even use a large language model to generate that script in one step.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Dia

    Dia

    A TTS model capable of generating ultra-realistic dialogue

    Dia is a neural text-to-speech model designed specifically for generating ultra-realistic dialogue in a single pass. Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the throat, and similar sounds, which are crucial for making synthetic conversations feel human. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Orpheus TTS

    Orpheus TTS

    Towards Human-Sounding Speech

    ...Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    WhisperSpeech

    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS: Whisper is used to produce semantic tokens, EnCodec compresses the waveform into acoustic tokens, and Vocos reconstructs high-fidelity audio from those tokens. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Auth0 Logo