Run local LLMs on any device. Open-source.
A high-throughput and memory-efficient inference and serving engine
Single-cell analysis in Python
Official inference library for Mistral models
Easiest and laziest way to build multi-agent LLM applications
A Pythonic framework to simplify AI service building
Unified Model Serving Framework
State-of-the-art diffusion models for image and audio generation
FlashInfer: Kernel Library for LLM Serving
Low-latency REST API for serving text embeddings
Everything you need to build state-of-the-art foundation models
Simplifies the local serving of AI models from any source
The official Python client for the Huggingface Hub
Data manipulation and transformation for audio signal processing
Training and deploying machine learning models on Amazon SageMaker
An MLOps framework to package, deploy, monitor and manage models
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
A library for accelerating Transformer models on NVIDIA GPUs
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
A set of Docker images for training and serving models in TensorFlow
Visual Instruction Tuning: Large Language-and-Vision Assistant
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
State-of-the-art Parameter-Efficient Fine-Tuning
Python Package for ML-Based Heterogeneous Treatment Effects Estimation