Use Microsoft Edge's online text-to-speech service from Python
Python library and CLI tool to interface with Google Translate
SOTA Open Source TTS
Official MiniMax Model Context Protocol (MCP) server
Library for OCR-related tasks powered by Deep Learning
A simple native web interface that uses ChatTTS to synthesize text
Underthesea - Vietnamese NLP Toolkit
MTEB: Massive Text Embedding Benchmark
Generating Immersive, Explorable, and Interactive 3D Worlds
Qwen-Image is a powerful image generation foundation model
A high-quality rapid TTS voice cloning model
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Ready-to-use OCR with 80+ supported languages
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Generate audiobooks from e-books, voice cloning & 1107+ languages
Python binding to the Apache Tika™ REST services
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Towards Human-Level Text-to-Speech through Style Diffusion
Audiocraft is a library for audio processing and generation
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
Persian NLP Toolkit
A robust, efficient, low-latency speech-to-text library
Reading book source
Industrial-level controllable zero-shot text-to-speech system
Open source no-code system for text annotation and building of text