A Family of Open Sourced Music Foundation Models
The python library for real-time communication
Multimodal-Driven Architecture for Customized Video Generation
Framework for building realtime multimodal voice AI agents apps
Faster Whisper transcription with CTranslate2
Underthesea - Vietnamese NLP Toolkit
Voice Recognition to Text Tool
An open-source toolkit for monitoring Language Learning Models (LLMs)
Generate audiobooks from e-books
CLIP, Predict the most relevant text snippet given an image
Deep Research framework, combining language models with tools
The most accurate natural language detection library for Python
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Create videos with Stable Diffusion
Chat with it via text and voice
Label Studio is a multi-type data labeling and annotation tool
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Framework for building real-time voice and multimodal AI agents
Official Python inference and LoRA trainer package
Reading book source
Industrial-level controllable zero-shot text-to-speech system
Implementation of Phenaki Video, which uses Mask GIT
ComfyUI wrapper nodes for HunyuanVideo
AI-powered tool for generating, optimizing, and translating subtitles
Unifying 3D Mesh Generation with Language Models