State-of-the-art Parameter-Efficient Fine-Tuning
A high-performance ML model serving framework offering dynamic batching
LLM training code for MosaicML foundation models
The easiest and laziest way to build multi-agent LLM applications
Bring the notion of Model-as-a-Service to life
Optimizing inference proxy for LLMs
A Unified Library for Parameter-Efficient Learning
Low-latency REST API for serving text embeddings
Probabilistic reasoning and statistical analysis in TensorFlow
FlashInfer: Kernel Library for LLM Serving
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Library for OCR-related tasks powered by Deep Learning
Build your chatbot within minutes on your favorite device
Tensor search for humans
LLMFlows - Simple, Explicit and Transparent LLM Apps
Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere
Run 100B+ language models at home, BitTorrent-style
Framework for Accelerating LLM Generation with Multiple Decoding Heads
A computer vision framework to create and deploy apps in minutes
Implementation of "Tree of Thoughts"
Implementation of model parallel autoregressive transformers on GPUs
CPU/GPU inference server for Hugging Face transformer models