Run local LLMs on any device; open-source
FlashInfer: Kernel Library for LLM Serving
An MLOps framework to package, deploy, monitor, and manage models
PyTorch domain library for recommendation systems
Low-latency REST API for serving text embeddings
Operating LLMs in production
Large Language Model Text Generation Inference
Replace OpenAI GPT with another LLM in your app
Single-cell analysis in Python
Training and deploying machine learning models on Amazon SageMaker
Standardized Serverless ML Inference Platform on Kubernetes
Uncover insights, surface problems, monitor, and fine-tune your LLM
Bring the notion of Model-as-a-Service to life
A set of Docker images for training and serving models in TensorFlow
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
A high-performance ML model serving framework that offers dynamic batching
Probabilistic reasoning and statistical analysis in TensorFlow
Tensor search for humans
Database system for building simpler and faster AI-powered applications
Serve machine learning models from within a Docker container
Sequence-to-sequence framework focused on Neural Machine Translation
OpenMMLab Video Perception Toolbox
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code