Run local LLMs on any device; open-source
A high-throughput and memory-efficient inference and serving engine
A library for accelerating Transformer models on NVIDIA GPUs
The official Python client for the Hugging Face Hub
GPU environment management and cluster orchestration
Deep learning optimization library: makes distributed training easy
State-of-the-art diffusion models for image and audio generation
Training and deploying machine learning models on Amazon SageMaker
Standardized Serverless ML Inference Platform on Kubernetes
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Bring the notion of Model-as-a-Service to life
Neural Network Compression Framework for enhanced OpenVINO inference
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
A set of Docker images for training and serving models in TensorFlow
Phi-3.5 for Mac: Locally-run Vision and Language Models
Everything you need to build state-of-the-art foundation models
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Single-cell analysis in Python
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs