A set of Docker images for training and serving models in TensorFlow
A high-performance ML model-serving framework that offers dynamic batching
Sparsity-aware deep learning inference runtime for CPUs
Simplifies the local serving of AI models from any source
OpenAI-style API for open large language models
Library for OCR-related tasks powered by Deep Learning
Tensor search for humans
Multilingual Automatic Speech Recognition with word-level timestamps
Low-latency REST API for serving text embeddings
Standardized Serverless ML Inference Platform on Kubernetes
Lightweight Python library for adding real-time multi-object tracking to any detector
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
The Triton Inference Server provides an optimized cloud and edge inferencing solution
A unified framework for scalable computing
Open platform for training, serving, and evaluating language models
Uplift modeling and causal inference with machine learning algorithms
High quality, fast, modular reference implementation of SSD in PyTorch
A computer vision framework to create and deploy apps in minutes
Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere
Lightweight anchor-free object detection model
Sequence-to-sequence framework, focused on Neural Machine Translation
OpenMMLab Video Perception Toolbox
Toolkit for allowing inference and serving with MXNet in SageMaker
CPU/GPU inference server for Hugging Face transformer models