A library to communicate with ChatGPT, Claude, Copilot, Gemini
A high-performance ML model serving framework that offers dynamic batching
A unified framework for scalable computing
An easy-to-use LLM quantization package with user-friendly APIs
Official inference library for Mistral models
Neural Network Compression Framework for enhanced OpenVINO inference
A Pythonic framework to simplify AI service building
Uncover insights, surface problems, monitor, and fine-tune your LLM
PyTorch domain library for recommendation systems
PyTorch extensions for fast R&D prototyping and Kaggle farming
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Lightweight Python library for adding real-time multi-object tracking to any detector
Library for OCR-related tasks powered by Deep Learning
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
MII makes low-latency and high-throughput inference possible
Superduper: Integrate AI models and machine learning workflows
Libraries for applying sparsification recipes to neural networks
PyTorch library of curated Transformer models and their components
State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
A set of Docker images for training and serving models in TensorFlow
Build your chatbot within minutes on your favorite device
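
Several entries above concern parameter-efficient fine-tuning, which typically means LoRA-style adapters. A minimal numpy sketch of the underlying idea (all shapes and variable names here are illustrative, not any library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4                            # hidden size, adapter rank (r << d)
W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-initialized so the adapter starts as a no-op

x = rng.standard_normal(d)

# Adapted forward pass: the frozen W @ x plus the low-rank update (B @ A) @ x.
y = W @ x + B @ (A @ x)

# Only A and B are trained: 2*d*r parameters instead of d*d.
trainable = A.size + B.size
print(trainable, W.size)  # 512 vs 4096
```

Because `B` starts at zero, the adapted model initially reproduces the frozen model exactly; training then moves only the small factors `A` and `B`.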
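
The quantization entries above build on the same basic operation: mapping float weights to low-bit integers plus a scale. This is a sketch of plain symmetric round-to-nearest int8 quantization, the baseline that methods like GPTQ improve on (not any package's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(8).astype(np.float32)  # a toy weight tensor

# Symmetric round-to-nearest int8 quantization with one scale per tensor.
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize for use in matmuls; rounding error is at most scale / 2 per weight.
w_hat = q.astype(np.float32) * scale
```

Storing `q` (1 byte per weight) and a single float `scale` instead of 4-byte floats gives roughly a 4x size reduction, at the cost of the bounded rounding error above.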