Run Local LLMs on Any Device. Open-source
Everything you need to build state-of-the-art foundation models
A high-throughput and memory-efficient inference and serving engine
FlashInfer: Kernel Library for LLM Serving
A library for accelerating Transformer models on NVIDIA GPUs
Simplifies the local serving of AI models from any source
Lightweight Python library for adding real-time multi-object tracking
AIMET is a library that provides advanced quantization and compression
Easiest and laziest way to build multi-agent LLM applications
Uncover insights, surface problems, monitor, and fine-tune your LLM
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
The official Python client for the Hugging Face Hub
Standardized Serverless ML Inference Platform on Kubernetes
Multilingual Automatic Speech Recognition with word-level timestamps
A high-performance ML model serving framework that offers dynamic batching
Optimizing inference proxy for LLMs
Trainable models and NN optimization tools
Efficient few-shot learning with Sentence Transformers
LLM training code for MosaicML foundation models
A unified framework for scalable computing
Uplift modeling and causal inference with machine learning algorithms
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app
Create HTML profiling reports from pandas DataFrame objects
Official inference library for Mistral models