Phi-3.5 for Mac: Locally-run Vision and Language Models
Visual Instruction Tuning: Large Language-and-Vision Assistant
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Chat & pretrained large vision-language model
Large language model & vision-language model based on linear attention
A state-of-the-art open visual language model
Qwen2.5-VL is a multimodal large language model series
Chinese and English multimodal conversational language model
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Refer and Ground Anything Anywhere at Any Granularity
Capable of understanding text, audio, vision, and video
Database system for building simpler and faster AI-powered applications
An open-source framework for training large multimodal models
Code for "Chameleon: Plug-and-Play Compositional Reasoning"