Qwen-Image is a powerful image generation foundation model
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding
A 0.1B Omni model trained from scratch
Ling is a MoE LLM provided and open-sourced by InclusionAI
General-purpose image editing model that delivers high-fidelity
Towards self-verifiable mathematical reasoning
New family of code large language models (LLMs)
Robust Speech Recognition Across Languages, Dialects
A Powerful Native Multimodal Model for Image Generation
26M function-calling model that runs on incredibly small devices
Python SDK for Claude Agent
CLIP: predict the most relevant text snippet given an image
Inference script for Oasis 500M
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Renderer for the harmony response format to be used with gpt-oss
CodeGeeX2: A More Powerful Multilingual Code Generation Model
A Unified Framework for Text-to-3D and Image-to-3D Generation
The Clay Foundation Model - An open source AI model and interface
Open Source Speech Language Model
Open-source industrial-grade ASR models
Claude Code mirror, a one-stop open-source relay service
A SOTA open-source image editing model
OCR expert VLM powered by Hunyuan's native multimodal architecture
GPT-4V-level open-source multimodal model based on Llama3-8B