Visual intelligence for your home.
Benchmark LLMs by fighting in Street Fighter 3
Generate short videos with one click using an AI LLM
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
VideoRAG: Chat with Your Videos
Generate blog articles from video or audio
All-in-one WebUI for AI generative image and video creation
Moonshot's most powerful AI model
The media player for language learning, with dual subtitles
GPT4V-level open-source multi-modal model based on Llama3-8B
Search all of YouTube from the command line
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, vision, video
From nobody to large language model (LLM) hero
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Code and models for the ICML 2024 paper NExT-GPT
Lightweight Python library for adding real-time multi-object tracking
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Workflow and speech recognition app
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototyping and production
Qwen2.5-VL is the multimodal large language model series
Secure open source cloud runtime for AI apps & AI agents
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Official repo for Sa2VA: Marrying SAM2 with LLaVA