The behavior guidance framework for customer-facing LLM agents
A community-supported supercharged version of paperless
A 0.1B Omni model trained from scratch
Enhances Tesseract OCR output using LLMs (local or API)
CLIP: predict the most relevant text snippet given an image
A high-quality rapid TTS voice cloning model
Qwen-Image is a powerful image generation foundation model
The official Python SDK for the ElevenLabs API
OCR model for complex documents with layout-aware structured outputs
Persian NLP Toolkit
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Collection of Gemma 3 variants that are trained for performance
A nearly-live implementation of OpenAI's Whisper
A lightweight text-to-speech model with zero-shot voice cloning
Generate audiobooks from e-books
Unified web UI for training and running open models locally
Implementation of Phenaki Video, which uses MaskGIT
State-of-the-art TTS model under 25MB
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Implementation of Imagen, Google's Text-to-Image Neural Network
Python binding to the Apache Tika™ REST services
A voice cloning tool with a web interface that uses your own voice
Easily compute CLIP embeddings and build a CLIP retrieval system
Generate blog articles from video or audio