Image polygonal annotation with Python
Data Infrastructure providing an approach to multimodal AI workloads
StreamSpeech is a seamless model for offline speech recognition
Private chat with local GPT with documents, images, video, etc.
The AI toolkit for the AI developer
Weaving the Digital Agent Galaxy
AnyTool: Universal Tool-Use Layer for AI Agents
Build multimodal language agents for fast prototype and production
A fast TTS architecture with conditional flow matching
Agent S: an open agentic framework that uses computers like a human
WhatsApp MCP server enabling AI access to chats and messaging
Meta-database application for the audio and TV broadcasts of CC2.TV
Graphical User Interface Face Anonymization Tool
A graphical manager for ollama that can manage your LLMs
Real-time behaviour synthesis with MuJoCo, using Predictive Control
An extremely simple tool for separating vocals and background music
Unlimited, private and free Speech-To-Text program
Official Code for DragGAN (SIGGRAPH 2023)
A web UI for various audio-related neural networks
Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.
Txt-2-Mp3 6.3 Mark 2 [Improved.Simplified.Alternative]
Img2Txt - Extract Text From Images using AI
Real-time music generation using Stable Diffusion techniques
A version of the AI art creation software based on Disco Diffusion
Clone a voice in 5 seconds to generate arbitrary speech in real-time