The official Python client for the Hugging Face Hub
A set of Docker images for training and serving models in TensorFlow
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
Open-Source AI Camera. Empower any camera/CCTV
Single-cell analysis in Python
State-of-the-art diffusion models for image and audio generation
Optimizing inference proxy for LLMs
GPU environment management and cluster orchestration
Standardized Serverless ML Inference Platform on Kubernetes
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
A unified framework for scalable computing
Pure C++ implementation of several models for real-time chatting
On-device AI across mobile, embedded and edge for PyTorch
Phi-3.5 for Mac: Locally-run Vision and Language Models
Operating LLMs in production
A GPU-accelerated library containing highly optimized building blocks
Replace OpenAI GPT with another LLM in your app
Create HTML profiling reports from pandas DataFrame objects
MII makes low-latency and high-throughput inference possible
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Connect home devices into a powerful cluster to accelerate LLM inference
Bayesian inference with probabilistic programming
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
AI interface for tinkerers (Ollama, Haystack RAG, Python)
A high-performance inference system for large language models