Open source no-code system for text annotation and building of text
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Framework for building realtime multimodal voice AI agent apps
Converts text to speech in realtime
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Video-based AI memory library. Store millions of text chunks in MP4
Generate audiobooks from e-books, voice cloning & 1107+ languages
EPUB to audiobook converter, optimized for Audiobookshelf
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Python library and CLI tool to interface with Google Translate
Speech-AI-Forge is a project developed around TTS generation models
Industrial-level controllable zero-shot text-to-speech system
Open source healthcare AI
AI-powered tool for generating, optimizing, and translating subtitles
Free, high-quality text-to-speech API endpoint to replace OpenAI
A 0.1B Omni model trained from scratch
Faster Whisper transcription with CTranslate2
Use Microsoft Edge's online text-to-speech service from Python
State-of-the-art (SoTA) text-to-video pre-trained model
An open-source toolkit for monitoring Large Language Models (LLMs)
Qwen3-omni is a natively end-to-end, omni-modal LLM
Reading book source
A text-to-speech, speech-to-text and speech-to-speech library
Foundation model for image generation
The behavior guidance framework for customer-facing LLM agents