A high-throughput and memory-efficient inference and serving engine
The official Python client for the Hugging Face Hub
A set of Docker images for training and serving models in TensorFlow
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
Single-cell analysis in Python
Optimizing inference proxy for LLMs
State-of-the-art diffusion models for image and audio generation
Easiest and laziest way to build multi-agent LLM applications
Pure C++ implementation of several models for real-time chatting
Connect home devices into a powerful cluster to accelerate LLM
An RWKV management and startup tool, fully automated, only 8 MB
Open-Source AI Camera. Empower any camera/CCTV
Standardized Serverless ML Inference Platform on Kubernetes
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
A GPU-accelerated library containing highly optimized building blocks
Replace OpenAI GPT with another LLM in your app
A unified framework for scalable computing
Powering Amazon custom machine learning chips
On-device AI across mobile, embedded and edge for PyTorch
Phi-3.5 for Mac: Locally-run Vision and Language Models
PArallel Distributed Deep LEarning: Machine Learning Framework
Run local LLMs like llama, deepseek, kokoro etc. inside your browser
Create HTML profiling reports from pandas DataFrame objects
MII makes low-latency and high-throughput inference possible