Run local LLMs on any device; open-source
A high-throughput and memory-efficient inference and serving engine for LLMs
The official Python client for the Hugging Face Hub
A high-performance ML model serving framework offering dynamic batching
Tensor search for humans
Everything you need to build state-of-the-art foundation models
Uncover insights, surface problems, monitor, and fine-tune your LLM
Official inference library for Mistral models
FlashInfer: Kernel Library for LLM Serving
Deep learning optimization library that makes distributed training easy
Neural Network Compression Framework for enhanced OpenVINO inference
State-of-the-art diffusion models for image and audio generation
Powering Amazon's custom machine learning chips
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Single-cell analysis in Python
Large Language Model Text Generation Inference
PyTorch domain library for recommendation systems
A Pythonic framework to simplify AI service building
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
Operating LLMs in production
A library for accelerating Transformer models on NVIDIA GPUs
Trainable models and neural network optimization tools
Library for serving Transformers models on Amazon SageMaker
Lightweight Python library for adding real-time multi-object tracking to any detector
A unified framework for scalable computing