OpenAI-style API for open large language models
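Any OpenAI-compatible server can be exercised with the stock `openai` Python client by overriding `base_url`; a minimal sketch, where the port, model name, and placeholder API key are assumptions, not part of any specific project:

```python
from openai import OpenAI

# Hypothetical local endpoint and model name; adjust to whatever server you run.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```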
Run Local LLMs on Any Device. Open-source and available for commercial use
A high-throughput and memory-efficient inference and serving engine
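A minimal sketch of such an engine's offline batch API, using vLLM as the assumed example (the checkpoint is chosen only for its small size):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM id works
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```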
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Ready-to-use OCR with 80+ supported languages
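With EasyOCR (assumed here), recognition is a two-step reader pattern; the image path below is a placeholder:

```python
import easyocr

reader = easyocr.Reader(["en"])  # downloads detector/recognizer weights on first use
for bbox, text, confidence in reader.readtext("receipt.png"):
    print(f"{confidence:.2f}  {text}")
```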
Replace OpenAI GPT with another LLM in your app
Everything you need to build state-of-the-art foundation models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
A Pythonic framework to simplify AI service building
Bring the notion of Model-as-a-Service to life
State-of-the-art diffusion models for image and audio generation
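A minimal text-to-image sketch with Hugging Face Diffusers (assumed), using an example Stable Diffusion checkpoint and a CUDA device:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint; any diffusers-format id works
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a watercolor painting of a lighthouse").images[0]
image.save("lighthouse.png")
```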
Official inference library for Mistral models
The official Python client for the Hugging Face Hub
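Typical usage is pulling files from a hosted repo into the local cache; the repo id and filename below are just examples:

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Fetch a single file, or mirror a whole repo, into the local cache.
config_path = hf_hub_download(repo_id="gpt2", filename="config.json")
repo_dir = snapshot_download(repo_id="gpt2")
print(config_path, repo_dir)
```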
Unified Model Serving Framework
FlashInfer: Kernel Library for LLM Serving
Simplifies the local serving of AI models from any source
Operating LLMs in production
Training and deploying machine learning models on Amazon SageMaker
Low-latency REST API for serving text-embeddings
Single-cell analysis in Python
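A standard Scanpy clustering pass over a bundled example dataset, as a sketch (`leiden` additionally requires the `leidenalg` package):

```python
import scanpy as sc

adata = sc.datasets.pbmc68k_reduced()   # small preprocessed 10x dataset shipped for demos
sc.pp.neighbors(adata, n_neighbors=15)  # k-NN graph over the existing PCA embedding
sc.tl.umap(adata)                       # 2-D embedding for plotting
sc.tl.leiden(adata)                     # graph-based clustering
print(adata.obs["leiden"].value_counts())
```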
An MLOps framework to package, deploy, monitor and manage models
Create HTML profiling reports from pandas DataFrame objects
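With `ydata-profiling` (the current package name for pandas-profiling, assumed here), generating a report takes two calls:

```python
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.DataFrame({"age": [23, 41, 35], "city": ["Lagos", "Oslo", "Lima"]})
ProfileReport(df, title="Example Profile").to_file("report.html")
```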
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Phi-3.5 for Mac: Locally-run Vision and Language Models