Convert (animated) stickers to/from WhatsApp, Telegram, Signal
Generating Immersive, Explorable, and Interactive 3D Worlds
Sharp Monocular Metric Depth in Less Than a Second
Dealing with all unstructured data, such as reverse image search
Official Repo For Sa2VA: Marrying SAM2 with LLaVA
Qwen3-omni is a natively end-to-end, omni-modal LLM
A powerful, free and open-source tool for TextureAtlases/Spritesheets
A Pioneering Open-Source Alternative to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Harmonized and Coherent Human Image Animation
Advancing Open-source World Models
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
21 Lessons, Get Started Building with Generative AI
The data structure for multimodal data
Effortless data labeling with AI support from Segment Anything
Segmentation models with pretrained backbones. PyTorch
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Code for running inference and finetuning with SAM 3 model
A lightweight vision library for performing large object detection
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A background removal tool powered by InSPyReNet
Spring AI Alibaba examples for building and testing AI apps
Windrecorder is a memory search app that records everything
Multimodal embedding and reranking models built on Qwen3-VL
Easily pair images with audio file counterparts in bulk