A high-throughput and memory-efficient inference and serving engine
Official inference library for Mistral models
Everything you need to build state-of-the-art foundation models
Ready-to-use OCR with 80+ supported languages
Unified Model Serving Framework
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
FlashInfer: Kernel Library for LLM Serving
Easiest and laziest way to build multi-agent LLM applications
Training and deploying machine learning models on Amazon SageMaker
A Pythonic framework to simplify building AI services
The official Python client for the Hugging Face Hub
Bring the notion of Model-as-a-Service to life
State-of-the-art diffusion models for image and audio generation
Large Language Model Text Generation Inference
A set of Docker images for training and serving models in TensorFlow
Operating LLMs in production
Visual Instruction Tuning: Large Language-and-Vision Assistant
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Libraries for applying sparsification recipes to neural networks
Gaussian processes in TensorFlow
Easy-to-use speech toolkit including self-supervised learning models
An optimizing inference proxy for LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
OpenAI-style API for open large language models
Images to inference with no labeling