API-for-Open-LLM is a lightweight API server designed for deploying and serving open large language models (LLMs), offering a simple way to integrate LLMs into applications.
Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.