Qwen3-TTS is an open-source series of TTS models
Generate audiobooks from EPUBs, PDFs and text with captions
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
OCR software, free and offline
Automatic Speech Recognition with Word-level Timestamps
A simple native web interface that uses ChatTTS to synthesize text
SOTA Open Source TTS
Robust Speech Recognition via Large-Scale Weak Supervision
High-Quality Voice Cloning TTS for 600+ Languages
Tokenizer-Free TTS for Multilingual Speech Generation
Official inference repo for FLUX.1 models
Offline Text To Speech synthesis for Python
A Family of Open Sourced Music Foundation Models
A robust, efficient, low-latency speech-to-text library
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Library for OCR-related tasks powered by Deep Learning
Official inference repo for FLUX.2 models
A generative speech model for daily dialogue
Open source annotation tool for machine learning practitioners
Text and image to video generation: CogVideoX and CogVideo
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Converts text to speech in real time
Open source no-code system for text annotation and building of text
A simple, high-quality voice conversion tool focused on ease of use