Control Any Computer Using LLMs
Document (PDF, Word, PPTX, ...) extraction and parsing API
High-performance inference server and API layer for text embedding models
Focus on prompting and generating
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
TTS with Kokoro and ONNX Runtime
Stable Diffusion web UI
OCR software, free and offline
GUI for a Vocal Remover that uses Deep Neural Networks
Python binding to the Apache Tika™ REST services
A simple native web interface that uses ChatTTS to synthesize text into speech
A voice cloning tool with a web interface that uses your own voice
A simple, high-quality voice conversion tool focused on ease of use
Generate audiobooks from e-books
A text-to-speech, speech-to-text and speech-to-speech library
Interface for OuteTTS models
Python library and CLI tool to interface with Google Translate
Google Gen AI Python SDK provides an interface for developers
Label Studio is a multi-type data labeling and annotation tool
Awesome multilingual OCR toolkits based on PaddlePaddle
Generate audiobooks from e-books, voice cloning & 1107+ languages
Stable Diffusion WebUI optimized for AMD GPUs with editing tools
The most powerful and modular diffusion model GUI, API, and backend
Self-host the powerful Chatterbox TTS model
EPUB to audiobook converter, optimized for Audiobookshelf