Run local LLMs on any device. Open source
A high-throughput and memory-efficient inference and serving engine
A library for accelerating Transformer models on NVIDIA GPUs
Everything you need to build state-of-the-art foundation models
FlashInfer: Kernel Library for LLM Serving
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Standardized Serverless ML Inference Platform on Kubernetes
Single-cell analysis in Python
Powering Amazon's custom machine learning chips
Uncover insights, surface problems, monitor, and fine-tune your LLM
Trainable models and NN optimization tools
Efficient few-shot learning with Sentence Transformers
Integrate, train, and manage any AI model or API with your database
DoWhy is a Python library for causal inference
Training and deploying machine learning models on Amazon SageMaker
A high-performance ML model serving framework that offers dynamic batching
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
The official Python client for the Hugging Face Hub
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Data manipulation and transformation for audio signal processing
A Pythonic framework to simplify AI service building
PyTorch library of curated Transformer models and their components
Deep learning optimization library that makes distributed training easy
Python Package for ML-Based Heterogeneous Treatment Effects Estimation
Low-latency REST API for serving text embeddings
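Several of the serving engines above rely on dynamic batching: requests that arrive close together are grouped into one batch so the model runs once per batch rather than once per request, trading a small latency bound for much higher throughput. A minimal stdlib-only sketch of the idea, assuming a hypothetical `DynamicBatcher` and `model_fn` (none of these names belong to any project listed above):

```python
import queue
import threading
import time

class DynamicBatcher:
    """Toy dynamic batcher: collects concurrent requests into one batch
    (hypothetical sketch, not the API of any listed project)."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn          # runs on a whole batch at once
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s      # flush even if the batch is still small
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, item):
        """Enqueue one request; returns a holder whose 'done' event fires on completion."""
        holder = {"input": item, "output": None, "done": threading.Event()}
        self.requests.put(holder)
        return holder

    def _loop(self):
        while True:
            batch = [self.requests.get()]             # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:   # gather more until full or timed out
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.model_fn([h["input"] for h in batch])  # one model call per batch
            for holder, out in zip(batch, outputs):
                holder["output"] = out
                holder["done"].set()

batch_sizes = []

def fake_model(inputs):
    # Stand-in for a real model forward pass; records how requests were batched.
    batch_sizes.append(len(inputs))
    return [x * 2 for x in inputs]

batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
holders = [batcher.submit(i) for i in range(4)]
for h in holders:
    h["done"].wait()
results = [h["output"] for h in holders]
```

Production engines refine this loop considerably (continuous batching, per-token scheduling, paged KV caches), but the core trade-off is the same: `max_wait_s` caps added latency while `max_batch_size` caps how much work each model call amortizes.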