Real-time NVIDIA GPU dashboard
High-speed Large Language Model Serving for Local Deployment
157 models, 30 providers, one command to find what runs on your hardware
How to optimize some algorithms in CUDA
LightLLM is a Python-based LLM (Large Language Model) inference
ChatGLM2-6B: An Open Bilingual Chat LLM
A high-performance ML model serving framework that offers dynamic batching
Mooncake is the serving platform for Kimi
Calculate token/s & GPU memory requirement for any LLM