Industrial-level controllable zero-shot text-to-speech system
Data infrastructure providing an approach to multimodal AI workloads
Towards Human-Sounding Speech
A sound cloning tool with a web interface, using your voice
Spring AI Alibaba examples for building and testing AI apps
A Systematic Framework for Interactive World Modeling
The official Python library for the OpenAI API
A youtube-dl fork with additional features and fixes
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Build Vision Agents quickly with any model or video provider
Open Source Speech Language Model
Qwen3-ASR is an open-source series of ASR models
Python library and CLI tool to interface with Google Translate
The official Python library for the Groq API
Translate a video from one language to another and embed dubbing
pyglet is a cross-platform windowing and multimedia library for Python
Code and models for the ICML 2024 paper NExT-GPT
Build AI-powered semantic search applications
Qwen3-TTS is an open-source series of TTS models
State-of-the-art diffusion models for image and audio generation
Python inference and LoRA trainer package for the LTX-2 audio–video
A lightweight text-to-speech model with zero-shot voice cloning
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Video editing with Python
AI-powered tool for generating, optimizing, and translating subtitles