Qwen-Image is a powerful image generation foundation model
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Chat & pretrained large vision language model
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
GPT-4V-level open-source multimodal model based on Llama3-8B
Chinese and English multimodal conversational language model
Flutter-based cross-platform app integrating major AI models
Tensor search for humans
Capable of understanding text, audio, vision, and video
Phi-3.5 for Mac: Locally-run Vision and Language Models
A state-of-the-art open visual language model
Open source libraries and APIs to build custom preprocessing pipelines
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Multilingual sentence & image embeddings with BERT (see the usage sketch after this list)
Refer and Ground Anything Anywhere at Any Granularity
An unofficial Python package that returns responses from Google Bard
The Multi-Agent Framework
Gemma open-weight LLM library, from Google DeepMind
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Data Lake for Deep Learning. Build, manage, and query datasets
Large language model & vision-language model based on Linear Attention
An open-source framework for training large multimodal models
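The multilingual sentence & image embeddings entry above lends itself to a short usage sketch. The following is a minimal, illustrative example assuming the sentence-transformers package; "clip-ViT-B-32" is one publicly available checkpoint, and the image file name is hypothetical.

```python
# Minimal sketch: embed a sentence and an image into a shared vector space
# with sentence-transformers (pip install sentence-transformers pillow).
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# "clip-ViT-B-32" is a publicly available CLIP checkpoint; a companion
# text-only model, "clip-ViT-B-32-multilingual-v1", maps text in multiple
# languages into the same embedding space.
model = SentenceTransformer("clip-ViT-B-32")

# The same model encodes both text and images.
text_emb = model.encode(["Two dogs playing in the snow"])
img_emb = model.encode(Image.open("dogs_in_snow.jpg"))  # hypothetical file

# Cosine similarity scores cross-modal relevance (higher = closer match).
print(util.cos_sim(text_emb, img_emb))
```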