Advanced language and coding AI model
Multimodal Diffusion with Representation Alignment
Fast stable diffusion on CPU and AI PC
Native and Compact Structured Latents for 3D Generation
A trainable PyTorch reproduction of AlphaFold 3
Code for Mesh R-CNN, ICCV 2019
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Foundation Models for Time Series
Official codebase for LeWorldModel: Stable End-to-End Joint-Embedding
Qwen3-omni is a natively end-to-end, omni-modal LLM
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
A Production-ready Reinforcement Learning AI Agent Library
Multi-modal large language model designed for audio understanding
State-of-the-art (SoTA) text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
Audio foundation model excelling in audio understanding
Open-source industrial-grade ASR models
Implementation of "MobileCLIP", CVPR 2024
Pokee Deep Research Model Open Source Repo
Capable of understanding text, audio, vision, video
Ultra-Efficient LLMs on End Device
AI Suite for upscaling, interpolating & restoring images/videos
StudioOllamaUI is a local, portable interface for Ollama
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation