Qwen-Image is a powerful image generation foundation model
Autoregressive Model Beats Diffusion
All-in-one WebUI for AI generative image and video creation
GPT-4V-level open-source multi-modal model based on Llama3-8B
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Chinese and English multimodal conversational language model
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
A Pioneering Open-Source Alternative to GPT-4o
Gracefully face hCaptcha challenges with multimodal LLMs
Tensor search for humans
Capable of understanding text, audio, vision, and video
Phi-3.5 for Mac: Locally-run Vision and Language Models
Chat & pretrained large vision language model
A state-of-the-art open visual language model
The Multi-Agent Framework
Qwen3-Omni is a natively end-to-end, omni-modal LLM
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Open source libraries and APIs to build custom preprocessing pipelines
Multilingual sentence & image embeddings with BERT
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Data Lake for Deep Learning. Build, manage, and query datasets
LISA: Reasoning Segmentation via Large Language Model
Gemma open-weight LLM library, from Google DeepMind
Open-source evaluation toolkit of large multi-modality models (LMMs)
Skywork-R1V is an advanced multimodal AI model series