C++ library for high-performance inference on NVIDIA GPUs
A high-performance inference system for large language models
A high-throughput and memory-efficient inference and serving engine
Port of OpenAI's Whisper model in C/C++
High-performance neural network inference framework for mobile
20+ high-performance LLMs with recipes to pretrain and finetune at scale
A scalable inference server for models optimized with OpenVINO
ONNX Runtime: cross-platform, high-performance ML inferencing
Large Language Model Text Generation Inference
Build Production-ready Agentic Workflows with Natural Language
Connect home devices into a powerful cluster to accelerate LLM
On-device AI across mobile, embedded and edge for PyTorch
Run serverless GPU workloads with fast cold starts on bare-metal
Bolt is a deep learning library with high performance
OpenMLDB is an open-source machine learning database
A high-performance ML model serving framework that offers dynamic batching
FlashInfer: Kernel Library for LLM Serving
Standardized Serverless ML Inference Platform on Kubernetes
Deep learning optimization library: makes distributed training easy
Create HTML profiling reports from pandas DataFrame objects
Serving system for machine learning models
MII makes low-latency and high-throughput inference possible
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
An Open-Source Programming Framework for Agentic AI
Sparsity-aware deep learning inference runtime for CPUs