Showing 230 open source projects for "dvd-audio"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    Text Generation Web UI

    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    ...Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS). Very efficient text streaming. Parameter presets, 8-bit mode. Layers splitting across GPU(s), CPU, and disk. CPU mode, FlexGen, DeepSpeed ZeRO-3, API with streaming and without streaming. LLaMA model, including 4-bit GPTQ. RWKV model, LoRA (loading and training), Softprompts, and extensions.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 2
    Open Notebook

    Open Notebook

    An Open Source implementation of Notebook LM with more flexibility

    ...The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization. Open Notebook enables users to organize and analyze multi-modal content such as PDFs, videos, audio files, web pages, and Office documents. It combines full-text and vector search with context-aware AI chat to deliver insights grounded in your own research materials. With advanced features like multi-speaker podcast generation, customizable content transformations, and a comprehensive REST API, Open Notebook provides a powerful and extensible research environment.
    Downloads: 32 This Week
    Last Update:
    See Project
  • 3
    JoyAI-Echo

    JoyAI-Echo

    Pushing the Frontier of Long Audio-Visual Generation

    ...JoyAI-Echo focuses on text-to-video and multi-shot long-video generation, while image-to-video support is not part of the current release scope. It is most useful for research and experimental video workflows that need synchronized audio, coherent characters, and editable story-level generation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained on under 100 hours of audio, and supports multiple languages, including English (US/UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. ...
    Downloads: 25 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    MOSS-TTS Family

    MOSS-TTS Family

    MOSS‑TTS Family open‑source speech and sound generation model

    ...The broader family also includes dialogue generation, prompt-based voice creation, streaming voice-agent support, and a unified audio tokenizer. It is especially useful for developers building dubbing, podcasts, audiobooks, voice assistants, character voices, and creative audio tools.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    ElevenLabs Python

    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    ...The SDK is designed for quick setup: after installing the package and setting an API key, you can generate speech in multiple languages and play or process the resulting audio bytes. It includes helper utilities (like play and stream) so you can either play audio locally or integrate it into your own playback or networking pipeline.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Generative AI

    Generative AI

    Sample code and notebooks for Generative AI on Google Cloud

    Generative AI is a comprehensive collection of code samples, notebooks, and demo applications designed to help developers build generative-AI workflows on the Vertex AI platform. It spans multiple modalities—text, image, audio, search (RAG/grounding) and more—showing how to integrate foundation models like the Gemini family into cloud projects. The README emphasises getting started with prompts, datasets, environments and sample apps, making it ideal for both experimentation and production-ready usage. The repository architecture is organised into folders like gemini/, search/, vision/, audio/, and rag-grounding/, which helps developers locate use cases by modality. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    video-use

    video-use

    Edit videos with Claude Code

    ...Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 9
    Unsloth Studio

    Unsloth Studio

    Unified web UI for training and running open models locally

    Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement learning workflows, making it suitable for both experimentation and production use. ...
    Downloads: 18 This Week
    Last Update:
    See Project
  • Your monitoring isn't a stack. It's a pile. Fix that. Icon
    Your monitoring isn't a stack. It's a pile. Fix that.

    Errors, performance, logs, uptime. One install, one invoice, one UI.

    Replace Datadog, New Relic, and Sentry without adding three more dashboards.
    Free 30 days.
  • 10
    AudioLM - Pytorch

    AudioLM - Pytorch

    Implementation of AudioLM audio generation model in Pytorch

    Implementation of AudioLM, a Language Modeling Approach to Audio Generation out of Google Research, in Pytorch It also extends the work for conditioning with classifier free guidance with T5. This allows for one to do text-to-audio or TTS, not offered in the paper. Yes, this means VALL-E can be trained from this repository. It is essentially the same. This repository now also contains a MIT licensed version of SoundStream.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    OmniVoice

    OmniVoice

    High-Quality Voice Cloning TTS for 600+ Languages

    ...Built on a diffusion language model-style architecture, it combines scalability with strong performance, enabling both natural-sounding voice synthesis and efficient inference speeds. One of its most notable capabilities is zero-shot voice cloning, allowing users to replicate a speaker’s voice using only a short reference audio clip. In addition, it supports voice design through configurable attributes such as gender, accent, pitch, and speaking style, giving users fine-grained control over generated speech. The system also includes advanced features like non-verbal expression tags and pronunciation overrides, enabling expressive and precise output. With support for both API-based and command-line usage, it is designed for research, production, and experimentation alike.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 12
    notebooklm-py

    notebooklm-py

    Unofficial Python API and agentic skill for Google NotebookLM

    ...The project covers notebook management, source ingestion, conversational querying, research workflows, and sharing controls, while also enabling the generation of a wide range of study and media artifacts. These outputs include audio overviews, videos, slide decks, infographics, quizzes, flashcards, reports, data tables, and mind maps, with configurable formats and export options.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 13
    OuteTTS

    OuteTTS

    Interface for OuteTTS models

    ...It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Real-Time Voice Cloning

    Real-Time Voice Cloning

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that captures voice characteristics; this embedding is then used by a Tacotron-style synthesizer to generate spectrograms from text, which a WaveRNN-based vocoder finally turns into audio. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    HunyuanCustom

    HunyuanCustom

    Multimodal-Driven Architecture for Customized Video Generation

    HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection. Text-image fusion module based on LLaVA for improved multimodal understanding. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Whisper-WebUI

    Whisper-WebUI

    A Web UI for easy subtitle using whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription speed and reducing memory usage compared to standard models. It supports multiple input sources including local files, YouTube content, and microphone input, making it versatile for different workflows. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 17
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    ...The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 18
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Configurable label formats let you customize the visual interface to meet your specific labeling needs. Support for multiple data types including images, audio, text, HTML, time-series, and video.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 19
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. ...
    Downloads: 17 This Week
    Last Update:
    See Project
  • 20
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    stt

    stt

    Voice Recognition to Text Tool

    ...The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    ...It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken segment and synthesizes audio via neural TTS services, producing one audio clip per subtitle entry. The tool then time-stretches or compresses each TTS clip to match the original speech duration exactly, which preserves lip-sync and rhythm as closely as possible without manual editing. Finally, it combines all the clips into a single dubbed audio track that can be muxed with the original video, along with new translated subtitle files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    EPUB to Audiobook Converter

    EPUB to Audiobook Converter

    EPUB to audiobook converter, optimized for Audiobookshelf

    EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible endpoint, allowing users to choose between cloud and self-hosted voices. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 24
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    ...Trained on a large 1.8-million-hour bilingual corpus, VoxCPM can infer appropriate speaking style from context, dynamically adjusting intonation, rhythm, and emotional tone. It supports zero-shot voice cloning from a short reference audio clip, capturing timbre, accent, and pacing to closely mimic a target speaker without per-speaker fine-tuning.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 25
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. ...
    Downloads: 1 This Week
    Last Update:
    See Project
Auth0 Logo