A toolkit to optimize Keras & TensorFlow ML models for deployment
Ready-to-use OCR with 80+ supported languages
AIMET is a library that provides advanced quantization and compression
Bring the notion of Model-as-a-Service to life
Uncover insights, surface problems, monitor, and fine-tune your LLM
A high-performance ML model serving framework that offers dynamic batching
Everything you need to build state-of-the-art foundation models
Unified Model Serving Framework
Trainable models and NN optimization tools
Neural Network Compression Framework for enhanced OpenVINO
Simplifies the local serving of AI models from any source
The Triton Inference Server provides an optimized cloud
Library for serving Transformers models on Amazon SageMaker
Official inference library for Mistral models
Run Local LLMs on Any Device. Open-source
A unified framework for scalable computing
An MLOps framework to package, deploy, monitor and manage models
PyTorch extensions for fast R&D prototyping and Kaggle farming
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Libraries for applying sparsification recipes to neural networks
Standardized Serverless ML Inference Platform on Kubernetes
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Probabilistic reasoning and statistical analysis in TensorFlow
State-of-the-art Parameter-Efficient Fine-Tuning
Integrate, train and manage any AI models and APIs with your database