Operating LLMs in production
An MLOps framework to package, deploy, monitor and manage models
Training and deploying machine learning models on Amazon SageMaker
On-device Speech Recognition for Apple Silicon
OpenVINO™ Toolkit repository
Replace OpenAI GPT with another LLM in your app
Low-latency REST API for serving text-embeddings
Probabilistic reasoning and statistical analysis in TensorFlow
Easiest and laziest way to build multi-agent LLM applications
C++ library for high performance inference on NVIDIA GPUs
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
Run serverless GPU workloads with fast cold starts on bare-metal
High-performance neural network inference framework for mobile
Lightweight, standalone C++ inference engine for Google's Gemma models
Official inference library for Mistral models
Open-Source AI Camera. Empower any camera/CCTV
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Serve, optimize and scale PyTorch models in production
The Triton Inference Server provides an optimized cloud
A scalable inference server for models optimized with OpenVINO
Superduper: Integrate AI models and machine learning workflows
A toolkit to optimize Keras and TensorFlow ML models for deployment
Powering Amazon custom machine learning chips
Integrate, train and manage any AI models and APIs with your database
LLM training code for MosaicML foundation models