A high-throughput and memory-efficient inference and serving engine
Everything you need to build state-of-the-art foundation models
Ready-to-use OCR with 80+ supported languages
Official inference library for Mistral models
Training and deploying machine learning models on Amazon SageMaker
The official Python client for the Hugging Face Hub
Library for OCR-related tasks powered by deep learning
Multi-LoRA inference server that scales to thousands of fine-tuned LLMs
FlashInfer: a kernel library for LLM serving
Efficient few-shot learning with Sentence Transformers
A library for accelerating Transformer models on NVIDIA GPUs
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Standardized Serverless ML Inference Platform on Kubernetes
Neural Network Compression Framework for enhanced OpenVINO inference
Large Language Model Text Generation Inference
Trainable models and neural network optimization tools
Bring the notion of Model-as-a-Service to life
Library for serving Transformers models on Amazon SageMaker
State-of-the-art Parameter-Efficient Fine-Tuning
A set of Docker images for training and serving models in TensorFlow
Integrate, train and manage any AI model or API with your database
Unified Model Serving Framework
Powering Amazon's custom machine learning chips
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow