A high-throughput and memory-efficient inference and serving engine
A lightweight vLLM implementation built from scratch
Visual Causal Flow
Personal AI, On Personal Devices
A unified library of SOTA model optimization techniques
Towards Human-Sounding Speech
Run a full local LLM stack with one command using Docker
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Accelerate local LLM inference and finetuning
Interface for OuteTTS models
Qwen3 is the large language model series developed by Qwen team
Advanced language and coding AI model
Open-source large language model family from Tencent Hunyuan
Multilingual Document Layout Parsing in a Single Vision-Language Model
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO across 600+ LLMs
Accurate × Fast × Comprehensive
Renderer for the harmony response format to be used with gpt-oss
A course on LLM inference serving on Apple Silicon
Qwen2.5-VL is the multimodal large language model series
High-performance Inference and Deployment Toolkit for LLMs and VLMs
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
Performance-optimized AI inference on your GPUs
FAIR Sequence Modeling Toolkit 2
Agent framework and applications built upon Qwen>=3.0
Qwen3-omni is a natively end-to-end, omni-modal LLM