A library to communicate with ChatGPT, Claude, Copilot, Gemini
Efficient few-shot learning with Sentence Transformers
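The core idea behind embedding-based few-shot learning can be sketched in a few lines: average each class's few labelled example embeddings into a centroid, then classify new points by the nearest centroid. The 2-D vectors below are made-up stand-ins for sentence-encoder output, not any library's real API.

```python
# Toy nearest-centroid few-shot classifier over hypothetical embeddings.

def centroid(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_class(x, centroids):
    # Return the label whose centroid is closest in squared distance.
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

few_shot = {
    "positive": [[0.9, 0.8], [1.0, 0.7]],    # a handful of labelled examples
    "negative": [[-0.8, -0.9], [-1.0, -0.6]],
}
centroids = {label: centroid(vecs) for label, vecs in few_shot.items()}
print(nearest_class([0.7, 0.9], centroids))  # -> positive
```

In practice the embeddings come from a contrastively fine-tuned sentence encoder, which is what makes so few examples per class sufficient.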
Uplift modeling and causal inference with machine learning algorithms
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
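A minimal sketch of why multi-LoRA serving scales: one shared base weight matrix is kept in memory, and each tenant contributes only a small low-rank adapter pair (B, A) that is applied per request. Names, shapes, and the adapter registry below are illustrative assumptions, not any server's real interface.

```python
# Shared base weight + per-request low-rank adapters, in pure Python.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

BASE_W = [[1.0, 0.0], [0.0, 1.0]]           # shared frozen base weight (2x2)
ADAPTERS = {                                 # adapter_id -> (B, A), rank 1
    "tenant-a": ([[1.0], [0.0]], [[0.5, 0.0]]),
    "tenant-b": ([[0.0], [1.0]], [[0.0, 0.25]]),
}

def forward(x, adapter_id):
    # Apply W + B @ A for the adapter chosen by this request.
    B, A = ADAPTERS[adapter_id]
    w = add(BASE_W, matmul(B, A))
    return matmul([x], w)[0]

print(forward([1.0, 1.0], "tenant-a"))       # -> [1.5, 1.0]
```

Because each adapter is only rank-1 here (two small factors instead of a full matrix), thousands of fine-tunes can share a single copy of the base model.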
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
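The estimation idea can be illustrated with a toy T-learner: fit one outcome model per treatment arm (here, just group means within a covariate stratum) and take their difference as the estimated effect for that stratum. The data are fabricated, and real packages use ML regressors rather than plain means.

```python
# Toy T-learner for heterogeneous (per-stratum) treatment effects.

def group_mean(rows, treated, stratum):
    ys = [y for t, s, y in rows if t == treated and s == stratum]
    return sum(ys) / len(ys)

# rows: (treated?, covariate stratum, outcome) -- made-up example data
data = [
    (1, "young", 12.0), (1, "young", 14.0),
    (0, "young", 10.0), (0, "young", 10.0),
    (1, "old", 9.0), (1, "old", 9.0),
    (0, "old", 8.0), (0, "old", 10.0),
]

def cate(stratum):
    # Conditional average treatment effect: treated mean minus control mean.
    return group_mean(data, 1, stratum) - group_mean(data, 0, stratum)

print(cate("young"))  # -> 3.0 (effect differs from the "old" stratum's 0.0)
```

The "heterogeneous" part is exactly this: the effect estimate is allowed to vary with covariates instead of being a single population average.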
Operating LLMs in production
The Triton Inference Server provides an optimized cloud and edge inferencing solution
A high-performance ML model serving framework that offers dynamic batching
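Dynamic batching, the feature named above, can be sketched simply: buffer incoming requests and flush a batch once it reaches a size limit or the oldest request has waited past a deadline. The class and parameter names here are hypothetical and the queueing is deliberately simplified.

```python
import time
from collections import deque

class DynamicBatcher:
    """Toy request buffer that flushes on size or age thresholds."""

    def __init__(self, max_batch_size=4, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()  # (arrival_time, request)

    def submit(self, request):
        self.queue.append((time.monotonic(), request))

    def maybe_flush(self):
        # Return a batch when full, or when the head request is stale.
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch_size
        stale = time.monotonic() - self.queue[0][0] >= self.max_wait_s
        if full or stale:
            batch = [req for _, req in list(self.queue)[: self.max_batch_size]]
            for _ in batch:
                self.queue.popleft()
            return batch
        return None

b = DynamicBatcher(max_batch_size=2)
b.submit("req-1"); b.submit("req-2"); b.submit("req-3")
print(b.maybe_flush())  # -> ['req-1', 'req-2']
```

The size/deadline trade-off is the knob serving frameworks expose: bigger batches raise throughput, shorter deadlines cap tail latency.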
An open-source tool designed to enhance the efficiency of workloads
FlashInfer: Kernel Library for LLM Serving
A Unified Library for Parameter-Efficient Learning
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
Uncover insights, surface problems, monitor, and fine-tune your LLM
Multilingual Automatic Speech Recognition with word-level timestamps
State-of-the-art diffusion models for image and audio generation
OpenMMLab Model Deployment Framework
Optimizing inference proxy for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
A lightweight vision library for performing large-scale object detection
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
PyTorch extensions for fast R&D prototyping and Kaggle farming
Unified Model Serving Framework
PyTorch library of curated Transformer models and their components
State-of-the-art Parameter-Efficient Fine-Tuning
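The arithmetic behind LoRA-style parameter-efficient fine-tuning fits in a few lines: freeze the base weight W and train only a low-rank update B @ A, so r*(d_in + d_out) parameters train instead of d_in*d_out. The sizes below are a made-up numeric illustration.

```python
# Numeric sketch of a rank-1 LoRA update on a 4x4 frozen weight.

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

d, r = 4, 1                                  # hidden size 4, adapter rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]             # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]                   # r x d, trainable

delta = matmul(B, A)                         # low-rank update, d x d
W_adapted = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W, delta)]

frozen = d * d
trainable = d * r + r * d
print(trainable, "trainable vs", frozen, "frozen params")  # 8 vs 16
```

At realistic hidden sizes (thousands) and small ranks the savings are dramatic, which is what makes per-task adapters cheap to store and swap.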