SOTA Open Source TTS
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
A text-to-speech, speech-to-text and speech-to-speech library
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Text and image to video generation: CogVideoX and CogVideo
Converts text to speech in real time
Persian NLP Toolkit
Library for OCR-related tasks powered by Deep Learning
Generate audiobooks from e-books, voice cloning & 1107+ languages
A simple, high-quality voice conversion tool focused on ease of use
An open source implementation of CLIP
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
The behavior guidance framework for customer-facing LLM agents
A Unified Framework for Text-to-3D and Image-to-3D Generation
MTEB: Massive Text Embedding Benchmark
Lightning-fast, on-device TTS, running natively via ONNX
Handwritten Text Recognition (HTR) system implemented with TensorFlow
A TTS that fits in your CPU (and pocket)
Open source no-code system for text annotation and building of text
A Powerful Native Multimodal Model for Image Generation
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Agent harness to make your slop code well-engineered and beautiful