Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
A set of Docker images for training and serving models in TensorFlow
Lightweight Python library for adding real-time multi-object tracking to any detector
Bring the notion of Model-as-a-Service to life
Library for OCR-related tasks powered by Deep Learning
OpenMMLab Model Deployment Framework
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Images to inference with no labeling
A high-performance ML model-serving framework with dynamic batching
Framework dedicated to making neural data processing pipelines simple and fast
The easiest and laziest way to build multi-agent LLM applications
Efficient few-shot learning with Sentence Transformers
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
Multilingual Automatic Speech Recognition with word-level timestamps
State-of-the-art diffusion models for image and audio generation
PyTorch extensions for fast R&D prototyping and Kaggle farming
The Triton Inference Server provides an optimized cloud and edge inferencing solution
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale