Easy-to-use Speech Toolkit including Self-Supervised Learning models
SoTA open-source TTS
Generate high-definition story short videos with one click using AI
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A Web UI for easy subtitle generation using the Whisper model
Chat & pretrained large audio language model proposed by Alibaba Cloud
World's first open-source, agentic video production system
SOTA Open Source TTS
Aider is AI pair programming in your terminal
Multi-modal large language model designed for audio understanding
Powerful Android AI agent with tools, automation, and Linux shell
A nearly-live implementation of OpenAI's Whisper
Voice Recognition to Text Tool
AI framework for automated short video creation and editing tools
A Python library for audio
Open source AI model for generating full songs from lyrics prompts
Fully Local Manus AI. No APIs, No $200 monthly bills
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Open source personal AI Assistant for Linux, Windows and Mac
A natural language interface for computers
Qwen3-ASR is an open-source series of ASR models
Multilingual speech recognition and audio understanding model
Converts text to speech in realtime
Large language model for livestream sales anchors
Extract audio and video content and organize it into a Markdown note