High quality, fast, modular reference implementation of SSD in PyTorch
Serve machine learning models within a Docker container
Toolkit for inference and serving with MXNet in SageMaker
LLMs and machine learning made easy
A high-performance inference system for large language models
Sequence-to-sequence framework, focused on Neural Machine Translation
Libraries for applying sparsification recipes to neural networks
Uniform deep learning inference framework for mobile
A toolkit for optimizing Keras and TensorFlow models for deployment
Probabilistic reasoning and statistical analysis in TensorFlow
Fast and user-friendly runtime for transformer inference
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
Open-source tool for improving workload efficiency
Run any Llama 2 model locally with a Gradio UI, on GPU or CPU, from anywhere
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model