OpenAI-style API for open large language models
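As a sketch of what an OpenAI-style API means in practice, the standard openai Python client can be pointed at a locally hosted server; the base URL, API key, and model name below are placeholders, not values from this project.

    from openai import OpenAI

    # base_url, api_key, and model are placeholders for a local deployment
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)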
Run local LLMs on any device; open source
A high-throughput and memory-efficient inference and serving engine for LLMs
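A minimal sketch of vLLM's offline batch-inference API; the model name and prompt are examples only.

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # small example model
    params = SamplingParams(temperature=0.8, max_tokens=32)
    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)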
Ready-to-use OCR with 80+ supported languages
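A minimal EasyOCR usage sketch; the image path is a placeholder.

    import easyocr

    reader = easyocr.Reader(['en'])  # load English detection/recognition models
    for bbox, text, confidence in reader.readtext('sign.png'):
        print(text, confidence)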
Everything you need to build state-of-the-art foundation models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
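A short sketch of LMDeploy's high-level pipeline API, assuming a chat model from the Hub; the model name is an example.

    from lmdeploy import pipeline

    pipe = pipeline('internlm/internlm2-chat-7b')  # example model
    responses = pipe(['What does LMDeploy do?'])
    print(responses[0].text)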
FlashInfer: Kernel Library for LLM Serving
Official inference library for Mistral models
Bring the notion of Model-as-a-Service to life
The easiest and laziest way to build multi-agent LLM applications
Simplifies the local serving of AI models from any source
The official Python client for the Hugging Face Hub
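A minimal huggingface_hub sketch: fetching a single file from a public repo on the Hub. The repo and filename are examples.

    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id="gpt2", filename="config.json")
    print(path)  # local cache path of the downloaded file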
A Pythonic framework to simplify AI service building
State-of-the-art diffusion models for image and audio generation
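A minimal diffusers sketch for text-to-image generation, assuming a CUDA GPU; the checkpoint and prompt are examples.

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("an astronaut riding a horse").images[0]
    image.save("astronaut.png")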
Operating LLMs in production
Unified Model Serving Framework
Low-latency REST API for serving text embeddings
An MLOps framework to package, deploy, monitor and manage models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Data manipulation and transformation for audio signal processing
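A small torchaudio sketch: loading a file and resampling it to 16 kHz. The file name is a placeholder.

    import torchaudio

    waveform, sample_rate = torchaudio.load("speech.wav")  # placeholder file
    resample = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform_16k = resample(waveform)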
Single-cell analysis in Python
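A minimal scanpy sketch using a small dataset bundled with the library: build a neighbor graph over cells, then compute a UMAP embedding.

    import scanpy as sc

    adata = sc.datasets.pbmc68k_reduced()  # small bundled example dataset
    sc.pp.neighbors(adata)                 # k-nearest-neighbor graph over cells
    sc.tl.umap(adata)                      # two-dimensional embedding
    print(adata)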
Training and deploying machine learning models on Amazon SageMaker
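A sketch of launching a SageMaker training job with the Python SDK; the role ARN, script, bucket, and instance type are all placeholders.

    from sagemaker.pytorch import PyTorch

    # role ARN, entry script, S3 path, and instance type are placeholders
    estimator = PyTorch(
        entry_point="train.py",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        framework_version="2.1",
        py_version="py310",
    )
    estimator.fit({"training": "s3://my-bucket/train"})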
The Triton Inference Server provides an optimized cloud and edge inferencing solution
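A client-side sketch of querying a running Triton server over HTTP; the model name, tensor names, and shape are placeholders for whatever model is deployed.

    import numpy as np
    import tritonclient.http as httpclient

    # model name, tensor names, and shape are placeholders
    client = httpclient.InferenceServerClient(url="localhost:8000")
    inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
    inp.set_data_from_numpy(np.ones((1, 4), dtype=np.float32))
    result = client.infer(model_name="my_model", inputs=[inp])
    print(result.as_numpy("OUTPUT0"))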
Uncover insights, surface problems, monitor, and fine-tune your LLM
Easy-to-use Speech Toolkit including Self-Supervised Learning models