Open source demo platform where you can easily showcase your AI models
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
A Pioneering Open-Source Alternative to GPT-4o
A state-of-the-art open visual language model
Skywork-R1V is an advanced multimodal AI model series
Autoregressive Model Beats Diffusion
StarVector is a foundation model for SVG generation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Visual intelligence for your home.
Driving with Graph Visual Question Answering
LISA: Reasoning Segmentation via Large Language Model
Refer and Ground Anything Anywhere at Any Granularity
Weaving the Digital Agent Galaxy
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Open-source evaluation toolkit for large multi-modality models (LMMs)
Extension of Google Research’s PaperBanana
From Paper to Presentation in One Click
Chinese and English multimodal conversational language model
Gracefully face hCaptcha challenges with multimodal LLMs
Phi-3.5 for Mac: Locally-run Vision and Language Models
Gemma open-weight LLM library, from Google DeepMind
A frontier, first-principles handbook
Large language model and vision-language model based on linear attention
Unifying 3D Mesh Generation with Language Models