A sound cloning tool with a web interface, using your voice
The official Python SDK for the ElevenLabs API
Stable Diffusion web UI
Faster Whisper transcription with CTranslate2
SoTA open-source TTS
Management of Yandex Station and other smart home devices
A lightweight text-to-speech model with zero-shot voice cloning
Official Python inference and LoRA trainer package
Capable of understanding text, audio, vision, video
Synchronized Translation for Videos
Open-source machine learning framework to automate text conversations
An open-source implementation of Notebook LM with more flexibility
Deep Research framework, combining language models with tools
Toolkit for conversational AI
Chat with it via text and voice
A Unified Framework for Text-to-3D and Image-to-3D Generation
Persian NLP Toolkit
Open-source multi-speaker long-form text-to-speech model
Collection of Gemma 3 variants that are trained for performance
TextWorld is a sandbox learning environment for the training
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
A web UI for easy subtitle generation using the Whisper model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Unified web UI for training and running open models locally
Towards Real-World Vision-Language Understanding