Advanced AI model for language and coding
Multimodal Diffusion with Representation Alignment
Fast Stable Diffusion on CPU and AI PC
Native and Compact Structured Latents for 3D Generation
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Code for Mesh R-CNN, ICCV 2019
Foundation Models for Time Series
Official codebase for LeWorldModel: Stable End-to-End Joint-Embedding
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A Production-ready Reinforcement Learning AI Agent Library
Multi-modal large language model designed for audio understanding
State-of-the-art text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
MiniMax-M2, a model built for Max coding & agentic workflows
Open-source industrial-grade ASR models
Implementation of "MobileCLIP" CVPR 2024
Pokee Deep Research Model Open Source Repo
Capable of understanding text, audio, vision, and video
Extension index for stable-diffusion-webui
Ultra-Efficient LLMs on End Devices
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Runtime extension of Proximus enabling deployment on AMD Ryzen™ AI