A high-quality rapid TTS voice cloning model
StreamSpeech is a seamless model for offline speech recognition
Multi-lingual large voice generation model, providing inference
Private AI platform for agents, enterprise search and RAG pipelines
Official PyTorch Implementation
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
The official Python Library for the Groq API
Build Vision Agents quickly with any model or video provider
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
The data structure for multimodal data
Build AI-powered semantic search applications
Open Vision Agents by Stream. Build voice and vision agents quickly
Minimal scripts to run the emulator in a container for various systems
Get your documents ready for gen AI
Large Multimodal Models for Video Understanding and Editing
Automated YouTube Shorts pipeline
Framework for building real-time multimodal voice AI agent apps
A Telegram RSS bot that cares about your reading experience
Pythonic bindings for FFmpeg's libraries
Open-source abilities for OpenHome agents
Large language model (LLM) for a live-stream sales anchor
Synchronized Translation for Videos
Controllable and fast Text-to-Speech for over 7000 languages
GLM-4-Voice | End-to-End Chinese-English Conversational Model