Give Claude the ability to watch and understand videos
Edit videos with Claude Code
The Python library for real-time communication
The Triton Inference Server provides an optimized cloud
Capable of understanding text, audio, vision, video
An HTML5 video player with a parser that saves traffic
A React-based starter app for using the Live API over WebSockets
Document Image Parsing via Heterogeneous Anchor Prompting
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Make videos programmatically with React
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Open source text-to-speech tool, supports extra-long text
Taming Stable Diffusion for Lip Sync
Video translation and dubbing tool powered by LLMs
Cross-platform, customizable ML solutions
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
AI-powered MCP server for desktop file and terminal automation
Free, high-quality text-to-speech API endpoint to replace OpenAI
WhatsApp MCP server enabling AI access to chats and messaging
A lightweight text-to-speech model with zero-shot voice cloning
Large Multimodal Models for Video Understanding and Editing
Build Vision Agents quickly with any model or video provider
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Access to Anthropic's safety-first language model APIs
Official MiniMax Model Context Protocol (MCP) server