A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Audio Language Models are Few-Shot Learners
Reading book source
Han Language Processing
One-click deployment (including offline integration package)
Open-source Video Translation Skill
Google's NotebookLM, but local
A Web UI for easy subtitle generation using the Whisper model
Open source AI VTuber platform with voice chat and Live2D avatars
Conversational voice AI agents
Scalable generative AI framework built for researchers and developers
Multi-modal large language model designed for audio understanding
Chat with it via text and voice
Framework for building AI-powered interactive digital humans and agents
Flowly is 100x faster than OpenClaw
Large language model for live-stream sales hosts
Powerful Android AI agent with tools, automation, and Linux shell
Easy-to-use Speech Toolkit including Self-Supervised Learning models
AI generative media user experience highlighting use of APIs
Generate blog articles from video or audio
Build Vision Agents quickly with any model or video provider
A Conversational Speech Generation Model
Official Python inference and LoRA trainer package
Pre-trained Deep Learning models and demos
A very simple framework for state-of-the-art NLP