A suite of advanced multi-modal LLMs
TextWorld is a sandbox learning environment for the training
Open Source Speech Language Model
Integrating large models into scenario-based applications
SQL-Driven RAG Engine
AI-assisted storyboard and video generation tool
Framework for building real-time voice and multimodal AI agents
Knowledge Graph Generation from Any Text
The Python library for real-time communication
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Open-source multi-speaker long-form text-to-speech model
Towards Human-Sounding Speech
Automated translation solution for visual novels
Web-based tool that converts GitHub repository contents
Semantic search and document parsing tools for the command line
Build Vision Agents quickly with any model or video provider
Visual Causal Flow
Improve your resumes with Resume Matcher
Fast multimodal LLM for real-time voice interaction and AI apps
Diffusion Transformer with Fine-Grained Chinese Understanding
Large language model and vision-language model based on linear attention
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
AI tool that turns Hacker News posts into daily podcast updates
AI tool for automatic batch short video creation and editing
Running large language models on a single GPU