Voice Recognition to Text Tool
SOTA Open Source TTS
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Speech recognition module for Python
Wan2.2: Open and Advanced Large-Scale Video Generative Model
MTEB: Massive Text Embedding Benchmark
Stanford NLP Python library for many human languages
Generate audiobooks from e-books, voice cloning & 1107+ languages
Speech-AI-Forge is a project developed around TTS generation models
tiktoken is a fast BPE tokeniser for use with OpenAI's models
The behavior guidance framework for customer-facing LLM agents
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Tokenizer-Free TTS for Multilingual Speech Generation
Text and image to video generation: CogVideoX and CogVideo
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Implementation of Imagen, Google's Text-to-Image Neural Network
A simple, high-quality voice conversion tool focused on ease of use
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Easy-to-use and powerful NLP library with an awesome model zoo
CLIP: predict the most relevant text snippet given an image
A simple native web interface that uses ChatTTS to synthesize speech from text
Label Studio is a multi-type data labeling and annotation tool
A nearly-live implementation of OpenAI's Whisper
Framework for building realtime multimodal voice AI agent apps
A high-quality rapid TTS voice cloning model