Qwen-Image is a powerful image generation foundation model
General-purpose image editing model that delivers high-fidelity
OCR expert VLM powered by Hunyuan's native multimodal architecture
Controllable & emotion-expressive zero-shot TTS
Pokee Deep Research Model Open Source Repo
Official implementation of Watermark Anything with Localized Messages
The official repo of Qwen chat & pretrained large language model
Robust Speech Recognition Across Languages, Dialects
Video Object and Interaction Deletion
A state-of-the-art open visual language model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
PyTorch code and models for the DINOv2 self-supervised learning
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
GPT-4V-level open-source multi-modal model based on Llama3-8B
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
RGBD video generation model conditioned on camera input
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
Fast and universal 3D reconstruction model for versatile tasks
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Global weather forecasting model using graph neural networks and JAX
Tooling for the Common Objects In 3D dataset
ChatGPT interface with better UI
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning