A GUI Agent app based on UI-TARS to control your computer using AI
Official Python inference and LoRA trainer package
The visual feedback tool for agents
Go: efficient multilingual NLP and text segmentation
Agent S: an open agentic framework that uses computers like a human
Refer and Ground Anything Anywhere at Any Granularity
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Vision AI browser agent for automation, testing, and extraction
A Model Context Protocol server that provides network asset info
AICI: Prompts as (Wasm) Programs
Official inference repo for FLUX.2 models
Self-hosted AI audio transcription
A lightweight and powerful meta-prompting, context engineering
Canvas-based WYSIWYG rich text editor with advanced layout tools
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
A Powerful Native Multimodal Model for Image Generation
Aider is AI pair programming in your terminal
ContextGem: Effortless LLM extraction from documents
Koishi plugin for NovelAI image generation with advanced controls
Automate browser-based workflows with LLMs and Computer Vision
Browser action engine for AI agents. 10× faster, resilient by design
Motion-controllable Video Generation via Latent Trajectory Guidance
A tool to snap pixels to a perfect grid
Pushing the Limits of Mathematical Reasoning in Open Language Models
Qwen2.5-VL is the multimodal large language model series