High-speed Large Language Model Serving for Local Deployment
LLM inference in C/C++
A high-performance ML model serving framework that offers dynamic batching
Run AI models locally on your machine with Node.js bindings for llama
Real-time NVIDIA GPU dashboard
Low-latency REST API for serving text embeddings
Chinese Llama-3 LLMs developed from Meta Llama 3