Qwen3-Omni is a natively end-to-end, omni-modal LLM
We write your reusable computer vision tools
A Pioneering Open-Source Alternative to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Harmonized and Coherent Human Image Animation
21 Lessons, Get Started Building with Generative AI
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Convert AI papers to GUI
The data structure for multimodal data
Code for running inference and finetuning with the SAM 3 model
Effortless data labeling with AI support from Segment Anything
Segmentation models with pretrained backbones. PyTorch
A lightweight vision library for performing large-scale object detection
Document Image Parsing via Heterogeneous Anchor Prompting
Advancing Open-source World Models
Multimodal embedding and reranking models built on Qwen3-VL
A background-removal tool powered by InSPyReNet
Easily pair images with audio file counterparts in bulk
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Spring AI Alibaba examples for building and testing AI apps
Project Lyra: Open Generative 3D World Models
An extensive node suite that enables ComfyUI to process 3D inputs
PyTorch code and models for V-JEPA self-supervised learning from video