Run local LLMs on any device. Open source.
A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch after this list)
A library for accelerating Transformer models on NVIDIA GPUs
Deep learning optimization library that makes distributed training and inference easy, efficient, and effective (sketch after this list)
GPU environment management and cluster orchestration
The official Python client for the Hugging Face Hub (download sketch after this list)
Everything you need to build state-of-the-art foundation models
State-of-the-art diffusion models for image and audio generation (text-to-image sketch after this list)
Training and deploying machine learning models on Amazon SageMaker (training-job sketch after this list)
Standardized Serverless ML Inference Platform on Kubernetes
Replace OpenAI GPT with another LLM in your app
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
FlashInfer: Kernel Library for LLM Serving
Neural Network Compression Framework for enhanced OpenVINO inference
Operating LLMs in production
Multilingual Automatic Speech Recognition with word-level timestamps
Uncover insights, surface problems, monitor, and fine-tune your LLM
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
A set of Docker images for training and serving models in TensorFlow
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow (regression sketch after this list)
Single-cell analysis in Python (workflow sketch after this list)
Optimizing inference proxy for LLMs
OpenAI-style API for open large language models (client sketch after this list)
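
To make a few of these entries concrete, the sketches below show minimal usage of the better-documented APIs; model names, endpoints, and resource identifiers are illustrative placeholders, not recommendations. First, a minimal offline-generation sketch with vLLM's Python API, assuming any Hugging Face-compatible checkpoint (the model and sampling values are illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # illustrative checkpoint
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)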
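
A sketch of wrapping a model with DeepSpeed; the stand-in module and config values are illustrative, and real runs are usually launched with the deepspeed CLI so the distributed environment is set up:

    import torch
    import deepspeed

    net = torch.nn.Linear(10, 2)  # stand-in model for illustration
    ds_config = {
        "train_batch_size": 8,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
        "zero_optimization": {"stage": 1},
    }
    engine, optimizer, _, _ = deepspeed.initialize(
        model=net, model_parameters=net.parameters(), config=ds_config
    )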
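
A one-call sketch with huggingface_hub, fetching a single file from a public repo into the local cache:

    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id="gpt2", filename="config.json")
    print(path)  # local path of the cached file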
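
A text-to-image sketch with Diffusers; the checkpoint id and prompt are illustrative, and a CUDA GPU is assumed:

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a watercolor fox").images[0]
    image.save("fox.png")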
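
A training-job sketch with the SageMaker Python SDK; the role ARN, S3 URI, instance type, and framework versions are placeholders:

    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",  # your training script
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        instance_count=1,
        instance_type="ml.g5.xlarge",
        framework_version="2.1",
        py_version="py310",
    )
    estimator.fit({"training": "s3://my-bucket/train-data"})  # placeholder URI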
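
A GP-regression sketch with GPflow on toy 1-D data:

    import numpy as np
    import gpflow

    X = np.random.rand(20, 1)
    Y = np.sin(6 * X) + 0.1 * np.random.randn(20, 1)
    model = gpflow.models.GPR(
        data=(X, Y), kernel=gpflow.kernels.SquaredExponential()
    )
    gpflow.optimizers.Scipy().minimize(
        model.training_loss, model.trainable_variables
    )
    mean, var = model.predict_f(np.array([[0.5]]))  # posterior at a test point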
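
A minimal Scanpy pass over a small bundled dataset (the clustering step assumes the optional leidenalg dependency is installed):

    import scanpy as sc

    adata = sc.datasets.pbmc68k_reduced()  # small bundled dataset with PCA
    sc.pp.neighbors(adata)                 # kNN graph on the existing PCA
    sc.tl.leiden(adata)                    # graph-based clustering
    sc.tl.umap(adata)                      # 2-D embedding for plotting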
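
Servers like the last entry expose OpenAI-compatible endpoints, so the official openai client can simply be pointed at them; the base_url, api_key, and model name below are placeholders for whatever the local server registers:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="my-local-model",  # placeholder model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)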