Multimodal Diffusion with Representation Alignment
Self-hosted AI audio transcription
Interface for OuteTTS models
Offline text-to-speech synthesis for Python
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
The Python library for real-time communication
Python inference and LoRA trainer package for the LTX-2 audio–video
Controllable & emotion-expressive zero-shot TTS
Swift community-driven package for the OpenAI public API
Make videos programmatically with React
Manage Claude Code in style
Industrial-level controllable zero-shot text-to-speech system
Marrying Grounding DINO with Segment Anything & Stable Diffusion
WhatsApp tool for chatbots with advanced features
The official Python Library for the Groq API
Cross-platform, customizable ML solutions
(Golang) Go bindings for Discord
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
An HTML5 video player with a parser that saves traffic
Towards Human-Level Text-to-Speech through Style Diffusion
Generate high-definition short story videos with one click using AI
Open Source AI Dictation App
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
Hub of ready-to-use datasets for ML models
Transform your voice in real time with the Voxal voice changer