State-of-the-art diffusion models for image and audio generation
FlashInfer: Kernel Library for LLM Serving
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
A library for accelerating Transformer models on NVIDIA GPUs
The official Python client for the Hugging Face Hub
Data manipulation and transformation for audio signal processing
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
PyTorch library of curated Transformer models and their components
MII makes low-latency and high-throughput inference possible
A lightweight vision library for performing large-scale object detection
Superduper: Integrate AI models and machine learning workflows
Integrate, train, and manage any AI model and API with your database
The unofficial Python package that returns responses from Google Bard
Replace OpenAI GPT with another LLM in your app
Framework that is dedicated to making neural data processing pipelines simple and fast
LLM training code for MosaicML foundation models
Large Language Model Text Generation Inference
Training and deploying machine learning models on Amazon SageMaker
LLMFlows - Simple, Explicit and Transparent LLM Apps
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Sparsity-aware deep learning inference runtime for CPUs
Multilingual Automatic Speech Recognition with word-level timestamps
A high-performance ML model serving framework that offers dynamic batching
Bring the notion of Model-as-a-Service to life