Interface for OuteTTS models
Open source NLP guide with models, methods, and real use cases
Visual Causal Flow
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Bidirectional token-classification model for identifiable info
Large-language-model & vision-language-model based on Linear Attention
Quick illustration of how one can easily read books together with LLMs
Fast multimodal LLM for real-time voice interaction and AI apps
Autoregressive Model Beats Diffusion
General-purpose image editing model that delivers high-fidelity
Diffusion Transformer with Fine-Grained Chinese Understanding
Build Vision Agents quickly with any model or video provider
MARS5 speech model (TTS) from CAMB.AI
A Web UI for easy subtitling using the Whisper model
Context-aware desktop AI assistant that understands screen content
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototyping and production
Generate Any 3D Scene in Seconds
Multi-lingual large voice generation model, providing inference
Sample code and notebooks for Generative AI on Google Cloud
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Implementation of the AudioLM audio generation model in PyTorch
Framework for building, orchestrating, and deploying AI agents
A single Gradio + React WebUI with extensions for ACE-Step
Framework for building AI-powered interactive digital humans and agents