Run serverless GPU workloads with fast cold starts on bare-metal
The free, open-source alternative to OpenAI, Claude and others
Pure C++ implementation of several models for real-time chatting
Standardized Serverless ML Inference Platform on Kubernetes
A GPU-accelerated library containing highly optimized building blocks
Fast inference engine for Transformer models
Run Local LLMs on Any Device. Open-source
Training and deploying machine learning models on Amazon SageMaker
State-of-the-art Parameter-Efficient Fine-Tuning
Deep Learning API and Server in C++14 with support for Caffe, PyTorch
MNN is a blazing-fast, lightweight deep learning framework
Low-latency REST API for serving text-embeddings
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
A set of Docker images for training and serving models in TensorFlow
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
PyTorch extensions for fast R&D prototyping and Kaggle farming
GPU environment management and cluster orchestration
MII makes low-latency and high-throughput inference possible
C++ library for high-performance inference on NVIDIA GPUs
High-performance neural network inference framework for mobile
OpenVINO™ Toolkit repository
Large Language Model Text Generation Inference
A library for accelerating Transformer models on NVIDIA GPUs
Lightweight, standalone C++ inference engine for Google's Gemma models
Private Open AI on Kubernetes