OpenAI-style API for open large language models
Everything you need to build state-of-the-art foundation models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
A toolkit to optimize ML models for deployment for Keras & TensorFlow
Uncover insights, surface problems, monitor, and fine-tune your LLM
An MLOps framework to package, deploy, monitor and manage models
Probabilistic reasoning and statistical analysis in TensorFlow
Trainable models and NN optimization tools
Run Local LLMs on Any Device. Open-source
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
FlashInfer: Kernel Library for LLM Serving
Ready-to-use OCR with 80+ supported languages
A high-throughput and memory-efficient inference and serving engine
Simplifies the local serving of AI models from any source
A library for accelerating Transformer models on NVIDIA GPUs
Library for OCR-related tasks powered by Deep Learning
Bring the notion of Model-as-a-Service to life
Powering Amazon custom machine learning chips
The Triton Inference Server provides an optimized cloud
Optimizing inference proxy for LLMs
AIMET is a library that provides advanced quantization and compression
Lightweight Python library for adding real-time multi-object tracking
LLM training code for MosaicML foundation models
Build your chatbot within minutes on your favorite device