Get inference running on Kubernetes: LLMs, embeddings, speech-to-text. KubeAI serves an OpenAI-compatible HTTP API. Admins configure ML models using the Model Kubernetes Custom Resource. KubeAI can be thought of as a Model Operator (see Operator Pattern) that manages vLLM and Ollama servers.
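As an illustration, a Model custom resource might look like the sketch below. The field names, resource profile, and model URL are assumptions for illustration only; consult the KubeAI CRD reference for the authoritative schema:

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct   # illustrative name
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct  # assumed source URL
  engine: VLLM                  # KubeAI manages the vLLM server
  resourceProfile: nvidia-gpu-l4:1
  minReplicas: 0                # scale from zero when idle
  maxReplicas: 3                # autoscale up under load
```

Applying a manifest like this tells the operator which server (vLLM or Ollama) to run and how to scale it.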
Features
- Drop-in replacement for OpenAI with API compatibility
- Serve top OSS models (LLMs, Whisper, etc.)
- Multi-platform: CPU-only and GPU, with TPU support coming soon
- Scale from zero, autoscale based on load
- Zero dependencies (does not depend on Istio, Knative, etc.)
- Chat UI included (OpenWebUI)
- Operates OSS model servers (vLLM, Ollama, FasterWhisper, Infinity)
- Stream/batch inference via messaging integrations (Kafka, PubSub, etc.)
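Because the API is OpenAI-compatible, any OpenAI client works by pointing its base URL at the KubeAI service. The sketch below builds a standard chat-completions request using only the standard library; the in-cluster URL `http://kubeai/openai/v1` and the model name are assumptions, not values documented on this page:

```python
import json
import urllib.request

# Assumed in-cluster endpoint; substitute your KubeAI Service address.
KUBEAI_BASE_URL = "http://kubeai/openai/v1"

def chat_completion_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the KubeAI API."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{KUBEAI_BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_completion_request("llama-3.1-8b-instruct", "Hello!")
# urllib.request.urlopen(req) would send it from inside the cluster.
```

The same drop-in compatibility means tools such as the official `openai` Python package can be reused unchanged by setting their `base_url` to the KubeAI service.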
License
Apache License 2.0