Run local LLMs on any device; open source
A high-throughput and memory-efficient inference and serving engine
A library for accelerating Transformer models on NVIDIA GPUs
Everything you need to build state-of-the-art foundation models
GPU environment management and cluster orchestration
Training and deploying machine learning models on Amazon SageMaker
The official Python client for the Hugging Face Hub
Deep learning optimization library that makes distributed training easy
State-of-the-art diffusion models for image and audio generation
Neural Network Compression Framework for enhanced OpenVINO inference
Standardized Serverless ML Inference Platform on Kubernetes
20+ high-performance LLMs with recipes to pretrain and finetune at scale
A Pythonic framework to simplify AI service building
Replace OpenAI GPT with another LLM in your app
Operating LLMs in production
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Multilingual Automatic Speech Recognition with word-level timestamps
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Single-cell analysis in Python
Phi-3.5 for Mac: Locally-run Vision and Language Models
Optimizing inference proxy for LLMs
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference