Python inference and LoRA trainer package for the LTX-2 audio–video model
Multi-user UI tool for managing and running Stable Diffusion workflows
Framework for building real-time voice and multimodal AI agents
Code and models for the ICML 2024 paper NExT-GPT
Spring AI Alibaba examples for building and testing AI apps
Industrial-level controllable zero-shot text-to-speech system
Voice recognition tool that converts speech to text
The official Python library for the OpenAI API
A sound cloning tool with a web interface that uses your voice
ImageBind: One Embedding Space to Bind Them All
GenAI Processors is a lightweight Python library
A2M is a desktop app that converts audio to MIDI in one click
Open-source codebase for Scale Agentex
An opinionated CLI to transcribe audio files with Whisper on-device
Controllable & emotion-expressive zero-shot TTS
The official Python SDK for the ElevenLabs API
Reusable workflow library for Django
The official Python library for the Groq API
Video editing with Python
A high-quality, fast TTS voice cloning model
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Converts text to speech in real time
Qwen3-TTS is an open-source series of TTS models
Translates video from one language to another and embeds dubbing
StreamSpeech is a seamless model for offline speech recognition