Run Local LLMs on Any Device, Open-Source
A high-throughput and memory-efficient inference and serving engine
Everything you need to build state-of-the-art foundation models
Official inference library for Mistral models
State-of-the-art diffusion models for image and audio generation
FlashInfer: Kernel Library for LLM Serving
The easiest and laziest way to build multi-agent LLM applications
Unified Model Serving Framework
A Pythonic framework to simplify AI service building
Low-latency REST API for serving text embeddings
Simplifies the local serving of AI models from any source
The official Python client for the Hugging Face Hub
Data manipulation and transformation for audio signal processing
Training and deploying machine learning models on Amazon SageMaker
Single-cell analysis in Python
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
A library for accelerating Transformer models on NVIDIA GPUs
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Create HTML profiling reports from pandas DataFrame objects
A set of Docker images for training and serving models in TensorFlow
Large Language Model Text Generation Inference
Visual Instruction Tuning: Large Language-and-Vision Assistant
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
State-of-the-art Parameter-Efficient Fine-Tuning