Official Python inference and LoRA trainer package
Agent S: an open agentic framework that uses computers like a human
Refer and Ground Anything Anywhere at Any Granularity
Wan2.2: Open and Advanced Large-Scale Video Generative Model
A Model Context Protocol server that provides network asset info
ContextGem: Effortless LLM extraction from documents
Automate browser-based workflows with LLMs and Computer Vision
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Official inference repo for FLUX.2 models
A Powerful Native Multimodal Model for Image Generation
Aider is AI pair programming in your terminal
Qwen-Image is a powerful image generation foundation model
Motion-controllable Video Generation via Latent Trajectory Guidance
AI-powered document analysis and tagging for Paperless-ngx
Qwen2.5-VL is the multimodal large language model series
Marrying Grounding DINO with Segment Anything & Stable Diffusion
An open-source Python library for automated feature engineering
DeepMind model for tracking arbitrary points across videos & robotics
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Local-first semantic code search engine
Real-time Claude Code usage monitor with predictions and warnings
Agent toolkit providing semantic retrieval and editing capabilities
Pushing the Limits of Mathematical Reasoning in Open Language Models
Controllable & emotion-expressive zero-shot TTS
An open-source end-to-end VLM-based GUI Agent