Run local LLMs on any device; open source
Operating LLMs in production
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
An easy-to-use LLM quantization package with user-friendly APIs
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
OpenAI-style API for open large language models
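Servers that expose an OpenAI-style API accept the same request shape as OpenAI's `/v1/chat/completions` endpoint, so existing clients work against local models. A minimal stdlib-only sketch of building such a request is below; the base URL and model name are placeholder assumptions, not values from any specific project.

```python
import json
import urllib.request

# Assumed local endpoint and model name -- adjust for your server.
BASE_URL = "http://localhost:8000"
payload = {
    "model": "local-model",  # hypothetical model identifier
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "temperature": 0.7,
}
body = json.dumps(payload).encode("utf-8")

# The request object; call urllib.request.urlopen(req) against a running server.
req = urllib.request.Request(
    BASE_URL + "/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Because the payload matches the OpenAI schema, the same client code can be pointed at a hosted or local backend by changing only the base URL.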
20+ high-performance LLMs with recipes to pretrain and fine-tune at scale
State-of-the-art Parameter-Efficient Fine-Tuning
Build your chatbot within minutes on your favorite device
FlashInfer: Kernel Library for LLM Serving
Large Language Model Text Generation Inference
Sparsity-aware deep learning inference runtime for CPUs
LLM training code for MosaicML foundation models
The easiest and laziest way to build multi-agent LLM applications
Uncover insights, surface problems, monitor, and fine-tune your LLM
Superduper: Integrate AI models and machine learning workflows
The unofficial Python package that returns responses from Google Bard
PyTorch library of curated Transformer models and their components
A graphical manager for Ollama LLMs
LLMFlows - Simple, Explicit and Transparent LLM Apps
Framework for Accelerating LLM Generation with Multiple Decoding Heads