Hub of ready-to-use datasets for ML models
WhatsApp MCP server enabling AI access to chats and messaging
Open-source abilities for OpenHome agents
Generate high-definition story short videos with one click using AI
Private chat with local GPT with documents, images, video, etc.
Controllable and fast Text-to-Speech for over 7000 languages
Framework for building real-time multimodal voice AI agent apps
GLM-4-Voice | End-to-End Chinese-English Conversational Model
VideoRAG: Chat with Your Videos
Context data platform for building observable, self-learning AI agents
Foundational model for human-like, expressive TTS
Official MiniMax Model Context Protocol (MCP) server
Software that uses AI to perform real-time voice conversion
Build multimodal AI applications with cloud-native stack
Marrying Grounding DINO with Segment Anything & Stable Diffusion
Improve human sleep through scientifically
Meta-database application for the audio and TV broadcasts of CC2.TV
An AI for Music Generation
TorchMultimodal is a PyTorch library
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Data Lake for Deep Learning. Build, manage, and query datasets
Build cross-modal and multimodal applications on the cloud
Toolkit for audio, music, and speech generation
Towards Human-Level Text-to-Speech through Style Diffusion
EasyABC is an open-source ABC editor