Visual intelligence for your home.
Benchmark LLMs by making them fight in Street Fighter 3
Generate short videos with one click using an AI LLM
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
VideoRAG: Chat with Your Videos
Generate blog articles from video or audio
All-in-one WebUI for AI generative image and video creation
GPT-4V-level open-source multimodal model based on Llama3-8B
The media player for language learning, with dual subtitles
Moonshot's most powerful AI model
Search all of YouTube from the command line
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, images, and video
From nobody to big model (LLM) hero
Code and models for the ICML 2024 paper NExT-GPT
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Lightweight Python library for adding real-time multi-object tracking
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Workflow and speech recognition app
Data infrastructure for multimodal AI workloads
Build multimodal language agents for fast prototype and production
Secure open-source cloud runtime for AI apps and AI agents
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Qwen2.5-VL, the multimodal large language model series by Alibaba Cloud
Deep learning API and server in C++14 with support for Caffe, PyTorch