A high-throughput and memory-efficient inference and serving engine
Port of OpenAI's Whisper model in C/C++
High-performance neural network inference framework for mobile
20+ high-performance LLMs with recipes to pretrain, finetune at scale
Connect home devices into a powerful cluster to accelerate LLM
ONNX Runtime: cross-platform, high performance ML inferencing
Large Language Model Text Generation Inference
Build Production-ready Agentic Workflow with Natural Language
On-device AI across mobile, embedded and edge for PyTorch
Run serverless GPU workloads with fast cold starts on bare-metal
Bolt is a deep learning library with high performance
OpenMLDB is an open-source machine learning database
A high-performance ML model serving framework offering dynamic batching
Standardized Serverless ML Inference Platform on Kubernetes
Deep learning optimization library: makes distributed training easy
Create HTML profiling reports from pandas DataFrame objects
Serving system for machine learning models
MII makes low-latency and high-throughput inference possible
An Open-Source Programming Framework for Agentic AI
Sparsity-aware deep learning inference runtime for CPUs
Lightweight, standalone C++ inference engine for Google's Gemma models
Efficient few-shot learning with Sentence Transformers
PArallel Distributed Deep LEarning: Machine Learning Framework
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Low-latency REST API for serving text-embeddings