InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Open-source demo platform where you can easily showcase your AI models
Visual intelligence for your home.
Skywork-R1V is an advanced multimodal AI model series
A state-of-the-art open visual language model
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
Refer and Ground Anything Anywhere at Any Granularity
Weaving the Digital Agent Galaxy
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Gemma open-weight LLM library, from Google DeepMind
Open-source evaluation toolkit of large multi-modality models (LMMs)
A Pioneering Open-Source Alternative to GPT-4o
Phi-3.5 for Mac: Locally-run Vision and Language Models
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
A frontier, first-principles handbook
Chinese and English multimodal conversational language model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
GitLab automatic code review tool based on large models
From Paper to Presentation in One Click
Unifying 3D Mesh Generation with Language Models
Gracefully face hCaptcha challenges with multimodal LLMs