The Triton Inference Server provides an optimized cloud and edge inferencing solution
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A Pythonic framework to simplify AI service building
Library for serving Transformers models on Amazon SageMaker
Ready-to-use OCR with 80+ supported languages
State-of-the-art diffusion models for image and audio generation
Bring the notion of Model-as-a-Service to life
Easiest and laziest way to build multi-agent LLM applications
Powering Amazon custom machine learning chips
LLM training code for MosaicML foundation models
A high-performance ML model serving framework with dynamic batching
GPU environment management and cluster orchestration
A unified framework for scalable computing
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Large Language Model Text Generation Inference
Library for OCR-related tasks powered by Deep Learning
OpenAI-style API for open large language models
Open-source tool designed to enhance the efficiency of workloads
A library for accelerating Transformer models on NVIDIA GPUs
Lightweight Python library for adding real-time multi-object tracking to any detector
PyTorch library of curated Transformer models and their components
Unified Model Serving Framework
Low-latency REST API for serving text embeddings
Standardized Serverless ML Inference Platform on Kubernetes
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2