A high-throughput and memory-efficient inference and serving engine
20+ high-performance LLMs with recipes to pretrain, finetune at scale
Large Language Model Text Generation Inference
A high-performance ML model serving framework that offers dynamic batching
FlashInfer: Kernel Library for LLM Serving
Standardized Serverless ML Inference Platform on Kubernetes
Deep learning optimization library: makes distributed training easy
Create HTML profiling reports from pandas DataFrame objects
MII makes low-latency and high-throughput inference possible
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Sparsity-aware deep learning inference runtime for CPUs
Efficient few-shot learning with Sentence Transformers
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Low-latency REST API for serving text-embeddings
PyTorch extensions for fast R&D prototyping and Kaggle farming
Powering Amazon custom machine learning chips
The unofficial Python package that returns the response of Google Bard
High quality, fast, modular reference implementation of SSD in PyTorch
A computer vision framework to create and deploy apps in minutes
Database system for building simpler and faster AI-powered application
Lightweight anchor-free object detection model