OCR software, free and offline
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
SOTA Open Source TTS
Robust Speech Recognition via Large-Scale Weak Supervision
A Family of Open Sourced Music Foundation Models
Contexts Optical Compression
Implementation of Imagen, Google's Text-to-Image Neural Network
Official inference repo for FLUX.2 models
A Powerful Native Multimodal Model for Image Generation
A generative speech model for daily dialogue
Text and image to video generation: CogVideoX and CogVideo
Label Studio is a multi-type data labeling and annotation tool
Qwen3-TTS is an open-source series of TTS models
A fast TTS architecture with conditional flow matching
MTEB: Massive Text Embedding Benchmark
Tokenizer-Free TTS for Multilingual Speech Generation
Audiocraft is a library for audio processing and generation
A robust, efficient, low-latency speech-to-text library
Ready-to-use OCR with 80+ supported languages
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Official inference repo for FLUX.1 models
EPUB to audiobook converter, optimized for Audiobookshelf
Offline inference engine for art, real-time voice conversations
Converts text to speech in real time
Offline Text To Speech synthesis for Python