OpenAI-style API for open large language models
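Servers exposing an OpenAI-style API can usually be queried with the official `openai` Python client by overriding the base URL. A minimal sketch, assuming a compatible server on localhost:8000 and a hypothetical model name:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local, OpenAI-compatible server.
# Base URL, port, API key, and model name are illustrative assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="my-local-model",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```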
Run local LLMs on any device; open source
A high-throughput and memory-efficient inference and serving engine
Ready-to-use OCR with 80+ supported languages
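This tagline matches EasyOCR; assuming that library, extracting text from an image takes a few lines (the input path below is a hypothetical example):

```python
import easyocr

# Build a reader for English; the first call downloads the detection
# and recognition models.
reader = easyocr.Reader(["en"])

# readtext returns (bounding box, text, confidence) triples.
for bbox, text, confidence in reader.readtext("sample.png"):
    print(f"{confidence:.2f}  {text}")
```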
Everything you need to build state-of-the-art foundation models
Bring the notion of Model-as-a-Service to life
Simplifies the local serving of AI models from any source
Library for OCR-related tasks powered by Deep Learning
FlashInfer: Kernel Library for LLM Serving
The official Python client for the Hugging Face Hub
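A minimal sketch of the `huggingface_hub` client, downloading a single file from a public repo and searching for models (repo id and search term are illustrative):

```python
from huggingface_hub import hf_hub_download, list_models

# Download one file from a public repo on the Hub; returns a local path.
config_path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(config_path)

# List a few models matching a search term.
for model in list_models(search="gpt2", limit=3):
    print(model.id)
```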
State-of-the-art diffusion models for image and audio generation
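With `diffusers`, a pretrained text-to-image pipeline runs in a few lines. A sketch assuming a CUDA GPU; the checkpoint id and prompt are illustrative and any text-to-image checkpoint on the Hub would work:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline in half precision and move it to the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA device is available

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```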
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
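A minimal sketch of LMDeploy's offline `pipeline` API; the model id is an assumption and any supported Hub checkpoint could be substituted:

```python
from lmdeploy import pipeline

# Build an inference pipeline from a Hub model id (illustrative choice).
pipe = pipeline("internlm/internlm2_5-7b-chat")

# The pipeline accepts a batch of prompts and returns one response each.
responses = pipe(["Summarize what an inference toolkit does in one sentence."])
print(responses[0].text)
```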
Uncover insights, surface problems, monitor, and fine-tune your LLM
Optimizing inference proxy for LLMs
A set of Docker images for training and serving models in TensorFlow
GPU environment management and cluster orchestration
Sparsity-aware deep learning inference runtime for CPUs
Easy-to-use speech toolkit, including self-supervised learning models
A Pythonic framework to simplify AI service building
State-of-the-art Parameter-Efficient Fine-Tuning
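With PEFT, a base model can be wrapped with LoRA adapters so that only a small fraction of parameters is trained. A sketch using `gpt2` as the base model; the rank, alpha, and target modules are illustrative choices, not library defaults:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load a small base model and attach LoRA adapters to its attention layers.
model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                      # adapter rank (illustrative)
    lora_alpha=16,            # scaling factor (illustrative)
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# Only the adapter weights are trainable; the base model stays frozen.
model.print_trainable_parameters()
```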
An open-source tool designed to enhance workload efficiency
An MLOps framework to package, deploy, monitor, and manage models
Operating LLMs in production
A library for accelerating Transformer models on NVIDIA GPUs
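Transformer Engine exposes drop-in PyTorch modules that can run in FP8 on supported NVIDIA GPUs. A minimal sketch of an FP8 forward pass; the layer shapes and scaling recipe are illustrative:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A Transformer Engine linear layer on the GPU (shapes are illustrative).
layer = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(16, 768, device="cuda")

# Run the forward pass under FP8 autocast with a delayed-scaling recipe.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
print(out.shape)
```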
LLM training code for MosaicML foundation models