AI tool that converts GitHub repositories into interactive diagrams
Qwen3-Omni is a natively end-to-end, omni-modal LLM
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Code for running inference and fine-tuning with the SAM 3 model
Visual intelligence for your home.
Multimodal embedding and reranking models built on Qwen3-VL
Search all of YouTube from the command line
ChatGPT interface with better UI
Qwen2.5-VL is the multimodal large language model series
Models for object and human mesh reconstruction
An extensive node suite that enables ComfyUI to process 3D inputs
SDK for building interactive UI components over MCP for AI tools
Open-source MCP server that gives your coding agent
Tiny vision language model
Official implementation of DreamCraft3D
Easy-to-use Python library for creating 2D arcade games
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
NVR with real-time local object detection for IP cameras
Automatically translates the text of a video based on a subtitle file
A general fine-tuning kit geared toward image/video/audio diffusion
A Python library for audio
AI-based photo editing website for changing image backgrounds
Data Lake for Deep Learning. Build, manage, and query datasets
Code and models for the ICML 2024 paper NExT-GPT
The Triton Inference Server provides an optimized cloud