Text and image to video generation: CogVideoX and CogVideo
Converts text to speech in realtime
Persian NLP Toolkit
Generate audiobooks from e-books, voice cloning & 1107+ languages
A simple, high-quality voice conversion tool focused on ease of use
An open source implementation of CLIP
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
A Unified Framework for Text-to-3D and Image-to-3D Generation
The behavior guidance framework for customer-facing LLM agents
MTEB: Massive Text Embedding Benchmark
Lightning-fast, on-device TTS, running natively via ONNX
Handwritten Text Recognition (HTR) system implemented with TensorFlow
A TTS that fits in your CPU (and pocket)
Open source no-code system for text annotation and building of text
A Powerful Native Multimodal Model for Image Generation
Agent harness to make your slop code well-engineered and beautiful
A Family of Open Sourced Music Foundation Models
The Python library for real-time communication
Multimodal-Driven Architecture for Customized Video Generation
Framework for building realtime multimodal voice AI agent apps
Underthesea - Vietnamese NLP Toolkit
Voice Recognition to Text Tool
An open-source toolkit for monitoring Large Language Models (LLMs)
Generate audiobooks from e-books
Deep Research framework, combining language models with tools