A scalable inference server for models optimized with OpenVINO
Operating LLMs in production
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Easiest and laziest way to build multi-agent LLM applications
Large Language Model Text Generation Inference
An MLOps framework to package, deploy, monitor, and manage models
Standardized Serverless ML Inference Platform on Kubernetes
Library for serving Transformers models on Amazon SageMaker
OpenAI-style API for open large language models
Visual Instruction Tuning: Large Language-and-Vision Assistant
LLM Chatbot Assistant for Openfire server
Serve machine learning models within a Docker container
Toolkit for allowing inference and serving with MXNet in SageMaker
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code