Visual intelligence for your home.
Benchmark LLMs by fighting in Street Fighter 3
Generate short videos with one click using an LLM
VideoRAG: Chat with Your Videos
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Generate blog articles from video or audio
All-in-one WebUI for AI generative image and video creation
Search all of YouTube from the command line
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, vision, video
GPT4V-level open-source multi-modal model based on Llama3-8B
From zero to large language model (LLM) hero
Code and models for ICML 2024 paper, NExT-GPT
Lightweight Python library for adding real-time multi-object tracking
Qwen2.5-VL is the multimodal large language model series
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Data Infrastructure providing an approach to multimodal AI workloads
Build multimodal language agents for fast prototype and production
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Official repo for Sa2VA: Marrying SAM2 with LLaVA
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
The Triton Inference Server provides an optimized cloud
The Cradle framework is a first attempt at General Computer Control
A Pioneering Open-Source Alternative to GPT-4o
Data Lake for Deep Learning. Build, manage, and query datasets