Robust Speech Recognition Across Languages and Dialects
Easy Docker setup for Stable Diffusion with user-friendly UI
Open-source large language model family from Tencent Hunyuan
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Sharp Monocular Metric Depth in Less Than a Second
Renderer for the harmony response format to be used with gpt-oss
A Powerful Native Multimodal Model for Image Generation
A SOTA open-source image editing model
Repo of Qwen2-Audio chat & pretrained large audio language model
Repo for SeedVR2 & SeedVR
Programmatic access to the AlphaGenome model
MOSS-TTS Family: open-source speech and sound generation models
26M function-calling model that runs on incredibly small devices
Qwen3-ASR is an open-source series of ASR models
Fast-stable-diffusion + DreamBooth
Collection of Gemma 3 variants that are trained for performance
Tool for exploring and debugging transformer model behaviors
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment
Bidirectional token-classification model for identifiable info
Project Lyra: Open Generative 3D World Models
Accurate × Fast × Comprehensive
HY-Motion model for 3D character animation generation
4M: Massively Multimodal Masked Modeling
GLM-4-Voice | End-to-End Chinese-English Conversational Model