A library for accelerating Transformer models on NVIDIA GPUs
OpenMLDB is an open-source machine learning database
A toolkit to optimize Keras & TensorFlow ML models for deployment
An innovative library for efficient LLM inference
LLM.swift is a simple and readable library for running large language models locally on Apple platforms
Optimizing inference proxy for LLMs
AICI: Prompts as (Wasm) Programs
Neural Network Compression Framework for enhanced OpenVINO inference
Fast inference engine for Transformer models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
GPU environment management and cluster orchestration
Lightweight Python library for adding real-time multi-object tracking to any detector
Unofficial Go bindings for the Hugging Face Inference API
A unified framework for scalable computing
Build production-ready agentic workflows with natural language
Phi-3.5 for Mac: Locally-run Vision and Language Models
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
MII makes low-latency and high-throughput inference possible
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Powering Amazon's custom machine learning chips
Bolt is a high-performance deep learning library
A library to communicate with ChatGPT, Claude, Copilot, Gemini
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference