Refer and Ground Anything Anywhere at Any Granularity
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Qwen-Image is a powerful image generation foundation model
Qwen2.5-VL is the multimodal large language model series
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Concatenate a directory full of files into a single prompt
Local-first semantic code search engine
Tool-integrated Reasoning LLM Agents
Guiding Instruction-based Image Editing via Multimodal Large Language
Code for "Chameleon: Plug-and-Play Compositional Reasoning