InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Open-source demo platform where you can easily showcase your AI models
Skywork-R1V is an advanced multimodal AI model series
Visual intelligence for your home
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
Refer and Ground Anything Anywhere at Any Granularity
Open-source evaluation toolkit of large multi-modality models (LMMs)
A Pioneering Open-Source Alternative to GPT-4o
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
A frontier, first-principles handbook
Phi-3.5 for Mac: Locally-run Vision and Language Models
GitLab automatic code review tool based on large models
Flock is a workflow-based low-code platform for building chatbots
From Paper to Presentation in One Click
Unifying 3D Mesh Generation with Language Models
Gracefully face hCaptcha challenges with multimodal LLMs
Large-language-model & vision-language-model based on Linear Attention
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Open-source tool to visualise your RAG