Speech recognition module for Python
Implementation of Imagen, Google's Text-to-Image Neural Network
Python binding to the Apache Tika™ REST services
A community-supported supercharged version of paperless
The behavior guidance framework for customer-facing LLM agents
Implementation of Phenaki Video, which uses MaskGIT
An open-source text-to-speech system built by inverting Whisper
Generate blog articles from video or audio
A text-to-speech, speech-to-text and speech-to-speech library
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Qwen-Image is a powerful image generation foundation model
Persian NLP Toolkit
An open-source toolkit for monitoring Large Language Models (LLMs)
A TTS that fits in your CPU (and pocket)
OCR model for complex documents with layout-aware structured outputs
Foundation model for image generation
Chat with it via text and voice
Toolkit for conversational AI
A nearly-live implementation of OpenAI's Whisper
Qwen2.5-VL is a multimodal large language model series
Easily compute CLIP embeddings and build a CLIP retrieval system
Collection of Gemma 3 variants that are trained for performance
Synchronized Translation for Videos
Python library and CLI tool to interface with Google Translate
Easy-to-use and powerful NLP library with an awesome model zoo