OpenAI-style API for open large language models
Run local LLMs on any device; open-source and available for commercial use
A high-throughput and memory-efficient inference and serving engine for LLMs (a usage sketch follows this list)
Ready-to-use OCR with 80+ supported languages
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Official inference library for Mistral models
Single-cell analysis in Python
Bring the notion of Model-as-a-Service to life
FlashInfer: Kernel Library for LLM Serving
An MLOps framework to package, deploy, monitor, and manage models
The easiest and laziest way to build multi-agent LLM applications
Everything you need to build state-of-the-art foundation models
Simplifies the local serving of AI models from any source
A Pythonic framework to simplify AI service building
The official Python client for the Hugging Face Hub (a usage sketch follows this list)
Unified Model Serving Framework
Operating LLMs in production
State-of-the-art diffusion models for image and audio generation
Low-latency REST API for serving text embeddings
Data manipulation and transformation for audio signal processing
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Training and deploying machine learning models on Amazon SageMaker
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Uncover insights, surface problems, monitor, and fine-tune your LLM
Easy-to-use speech toolkit, including self-supervised learning models
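
As a quick illustration of the serving-engine entries above (vLLM in particular), here is a minimal sketch of vLLM's offline generation API; the model name facebook/opt-125m is an arbitrary small example chosen for illustration, not one this list prescribes:

    from vllm import LLM, SamplingParams

    # Load a model; vLLM handles batching and paged KV-cache memory internally.
    llm = LLM(model="facebook/opt-125m")

    # Generate one completion per prompt with simple sampling settings.
    params = SamplingParams(temperature=0.8, max_tokens=32)
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)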
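
Similarly, a minimal sketch of the official Hugging Face Hub client listed above; the repo id gpt2 and filename config.json are arbitrary illustrations:

    from huggingface_hub import hf_hub_download

    # Download a single file from a Hub repo and return its local cache path.
    path = hf_hub_download(repo_id="gpt2", filename="config.json")
    print(path)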