A high-throughput and memory-efficient inference and serving engine
A lightweight vLLM implementation built from scratch
System Level Intelligent Router for Mixture-of-Models at Cloud
Personal AI, On Personal Devices
Visual Causal Flow
Private Open AI on Kubernetes
Moonshot's most powerful AI model
A unified library of SOTA model optimization techniques
NVIDIA plugin for secure installation of OpenClaw
Towards Human-Sounding Speech
Run a full local LLM stack with one command using Docker
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Accelerate local LLM inference and finetuning
Interface for OuteTTS models
From Vibe Coding to Agentic Engineering
The free, Open Source alternative to OpenAI, Claude and others
Open source AI IDE and Cursor alternative
Qwen3 is the large language model series developed by the Qwen team
Advanced language and coding AI model
Accurate × Fast × Comprehensive
Full-stack Open-source Self-Evolving General AI Agent
Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs
MiniMax M2.1, a SOTA model for real-world dev & agents
Qwen2.5-VL is the multimodal large language model series
Performance-optimized AI inference on your GPUs