Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Unified Model Serving Framework
Data manipulation and transformation for audio signal processing
Neural Network Compression Framework for enhanced OpenVINO inference
Standardized Serverless ML Inference Platform on Kubernetes
Efficient few-shot learning with Sentence Transformers
Bring the notion of Model-as-a-Service to life
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
OpenAI-style API for open large language models
The Triton Inference Server provides an optimized cloud and edge inferencing solution
A Unified Library for Parameter-Efficient Learning
Lightweight Python library for adding real-time multi-object tracking to any detector
Integrate, train and manage any AI models and APIs with your database
Library for serving Transformers models on Amazon SageMaker
A toolkit to optimize Keras & TensorFlow ML models for deployment
A unified framework for scalable computing
OpenMMLab Model Deployment Framework
Framework dedicated to making neural data processing pipelines simple and fast
Database system for building simpler and faster AI-powered applications
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Toolkit for inference and serving with MXNet in SageMaker