Document (PDF, Word, PPTX, ...) extraction and parsing API
Hypernetworks that adapt LLMs for specific benchmark tasks
Practical productivity tools for Claude Code and Codex-CLI
Text-to-video and image-to-video generation: CogVideoX and CogVideo
Qwen3-TTS is an open-source series of TTS models
Awesome multilingual OCR toolkits based on PaddlePaddle
Generate audiobooks from EPUBs, PDFs and text with captions
A TTS that fits in your CPU (and pocket)
A robust, efficient, low-latency speech-to-text library
Chat with it via text and voice
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Framework for building real-time voice and multimodal AI agents
A Web UI for easy subtitle generation using the Whisper model
Reading book source
A fast TTS architecture with conditional flow matching
Audiocraft is a library for audio processing and generation
Offline text-to-speech synthesis for Python
Deep Research framework, combining language models with tools
Converts text to speech in realtime
Enhances Tesseract OCR output using LLMs (local or API)
Official inference repo for FLUX.1 models
Stable Diffusion WebUI optimized for AMD GPUs with editing tools
Using AI models to automatically provide commentary and edit videos
Offline inference engine for art and real-time voice conversations
Persian NLP Toolkit