A lightweight vision library for performing large-scale object detection
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Uplift modeling and causal inference with machine learning algorithms
LLM training code for MosaicML foundation models
Neural Network Compression Framework for enhanced OpenVINO inference
Unified Model Serving Framework
Build your chatbot within minutes on your favorite device
The official Python client for the Huggingface Hub
Standardized Serverless ML Inference Platform on Kubernetes
Simplifies the local serving of AI models from any source
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
GPU environment management and cluster orchestration
Probabilistic reasoning and statistical analysis in TensorFlow
Phi-3.5 for Mac: Locally-run Vision and Language Models
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
Gaussian processes in TensorFlow
MII makes low-latency and high-throughput inference possible
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Images to inference with no labeling
Efficient few-shot learning with Sentence Transformers
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Open platform for training, serving, and evaluating language models
Deep learning optimization library that makes distributed training easy