A high-performance ML model serving framework that offers dynamic batching
A set of Docker images for training and serving models in TensorFlow
Simplifies the local serving of AI models from any source
Multilingual Automatic Speech Recognition with word-level timestamps
Standardized Serverless ML Inference Platform on Kubernetes
Sparsity-aware deep learning inference runtime for CPUs
AIMET is a library that provides advanced quantization and compression
A unified framework for scalable computing
Low-latency REST API for serving text embeddings
OpenAI-style API for open large language models
Tensor search for humans
Open platform for training, serving, and evaluating language models
High-quality, fast, modular reference implementation of SSD in PyTorch
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere
Toolkit for inference and serving with MXNet in SageMaker
CPU/GPU inference server for Hugging Face transformer models