Run Local LLMs on Any Device. Open-source
FlashInfer: Kernel Library for LLM Serving
A high-throughput and memory-efficient inference and serving engine
Everything you need to build state-of-the-art foundation models
A library for accelerating Transformer models on NVIDIA GPUs
Simplifies the local serving of AI models from any source
AIMET is a library that provides advanced quantization and compression
Optimizing inference proxy for LLMs
Lightweight Python library for adding real-time multi-object tracking
Easiest and laziest way to build multi-agent LLM applications
An MLOps framework to package, deploy, monitor and manage models
Create HTML profiling reports from pandas DataFrame objects
Large Language Model Text Generation Inference
DoWhy is a Python library for causal inference
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
20+ high-performance LLMs with recipes to pretrain, finetune at scale
Single-cell analysis in Python
Replace OpenAI GPT with another LLM in your app
LLM training code for MosaicML foundation models
Efficient few-shot learning with Sentence Transformers
Multilingual Automatic Speech Recognition with word-level timestamps
Standardized Serverless ML Inference Platform on Kubernetes
Neural Network Compression Framework for enhanced OpenVINO
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
A high-performance ML model serving framework that offers dynamic batching