Real-time NVIDIA GPU dashboard
157 models, 30 providers, one command to find what runs on your hardware
High-speed Large Language Model Serving for Local Deployment
How to optimize some algorithms in CUDA
LightLLM is a Python-based LLM (Large Language Model) inference
A high-performance ML model serving framework that offers dynamic batching
Mooncake is the serving platform for Kimi
ChatGLM2-6B: An Open Bilingual Chat LLM
Calculate tokens/s and GPU memory requirements for any LLM
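
The kind of estimate named in the last entry can be approximated with back-of-the-envelope arithmetic. The sketch below is not taken from any of the listed projects; the function names and the example model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128) are hypothetical. It assumes weights and KV cache dominate memory, and that decoding is memory-bandwidth bound, so it gives only a rough upper bound on tokens/s.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the model weights alone (2 bytes per parameter for FP16/BF16)."""
    return num_params * bytes_per_param / GIB


def estimate_kv_cache_gib(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: float = 2.0,
) -> float:
    """KV cache size: one key and one value vector per layer, head, and token."""
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bytes_per_value / GIB


def estimate_decode_tokens_per_s(
    num_params: float, bytes_per_param: float, mem_bandwidth_gb_s: float
) -> float:
    """Rough upper bound: each decoded token reads every weight once from GPU memory."""
    bytes_per_token = num_params * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical 7B-parameter model in FP16 on a GPU with 1000 GB/s of bandwidth.
    weights = estimate_weight_memory_gib(7e9, 2.0)
    kv = estimate_kv_cache_gib(32, 32, 128, seq_len=4096)
    speed = estimate_decode_tokens_per_s(7e9, 2.0, 1000.0)
    print(f"weights: {weights:.2f} GiB, KV cache: {kv:.2f} GiB, <= {speed:.1f} tokens/s")
```

Real measurements will differ: activations, framework overhead, quantization, and batching all shift both numbers, so treat the result as a starting point for sizing rather than a prediction.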