Ready-to-use OCR with 80+ supported languages
Uncover insights, surface problems, monitor, and fine-tune your LLM
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Bring the notion of Model-as-a-Service to life
A high-performance ML model serving framework that offers dynamic batching
Unified Model Serving Framework
Neural Network Compression Framework for enhanced OpenVINO inference
Everything you need to build state-of-the-art foundation models
Run Local LLMs on Any Device. Open-source and available for commercial use
Trainable models and NN optimization tools
A toolkit to optimize Keras & TensorFlow ML models for deployment
The Triton Inference Server provides an optimized cloud and edge inferencing solution
PyTorch extensions for fast R&D prototyping and Kaggle farming
Library for serving Transformers models on Amazon SageMaker
Libraries for applying sparsification recipes to neural networks
Official inference library for Mistral models
Simplifies the local serving of AI models from any source
Probabilistic reasoning and statistical analysis in TensorFlow
Standardized Serverless ML Inference Platform on Kubernetes
State-of-the-art Parameter-Efficient Fine-Tuning
An MLOps framework to package, deploy, monitor, and manage models
A unified framework for scalable computing
PyTorch library of curated Transformer models and their components
Deep learning optimization library that makes distributed training easy
The official Python client for the Hugging Face Hub