Open platform for training, serving, and evaluating language models
Standardized Serverless ML Inference Platform on Kubernetes
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
High quality, fast, modular reference implementation of SSD in PyTorch
Database system for building simpler and faster AI-powered applications
Deep learning optimization library: makes distributed training easy
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
LLMFlows - Simple, Explicit and Transparent LLM Apps
Serve machine learning models within a Docker container
Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere
Framework for Accelerating LLM Generation with Multiple Decoding Heads
A computer vision framework to create and deploy apps in minutes
Run 100B+ language models at home, BitTorrent-style
A graphical interface for managing your LLMs with Ollama
Toolbox of models, callbacks, and datasets for AI/ML researchers
Implementation of "Tree of Thoughts
Implementation of model parallel autoregressive transformers on GPUs
Sequence-to-sequence framework, focused on Neural Machine Translation
Training & Implementation of chatbots leveraging GPT-like architecture
OpenMMLab Video Perception Toolbox
Guide to deploying deep-learning inference networks
Toolkit for allowing inference and serving with MXNet in SageMaker
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code
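As a rough illustration of how compact such an inference service can be, here is a generic sketch (not any listed project's own API) that wraps a Hugging Face `transformers` pipeline in a FastAPI endpoint; the route name and default model are arbitrary assumptions.

```python
# Minimal sketch of an ML inference service, assuming fastapi, uvicorn,
# and transformers are installed. Not tied to any specific project above.
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # downloads a small default model on first run

@app.get("/predict")
def predict(text: str):
    # Return the predicted label and score for the submitted text.
    return classifier(text)[0]

# Run with: uvicorn app:app --port 8000
```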