Run local LLMs on any device; open source
A high-throughput and memory-efficient inference and serving engine
The official Python client for the Hugging Face Hub (a minimal usage sketch follows this list)
A high-performance ML model serving framework that offers dynamic batching
Tensor search for humans
Operating LLMs in production
Uncover insights, surface problems, monitor, and fine-tune your LLM
Official inference library for Mistral models
FlashInfer: Kernel Library for LLM Serving
Neural Network Compression Framework for enhanced OpenVINO inference
A deep learning optimization library that makes distributed training easy
Everything you need to build state-of-the-art foundation models
State-of-the-art diffusion models for image and audio generation
Powering Amazon's custom machine learning chips
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Single-cell analysis in Python
Large Language Model Text Generation Inference
PyTorch domain library for recommendation systems
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
PyTorch library of curated Transformer models and their components
A library for accelerating Transformer models on NVIDIA GPUs
Trainable models and NN optimization tools
PyTorch extensions for fast R&D prototyping and Kaggle farming
Library for serving Transformers models on Amazon SageMaker
Lightweight Python library for adding real-time multi-object tracking
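For the Hub client entry above, here is a minimal usage sketch with the huggingface_hub package; the repository ID and filename are illustrative assumptions, not anything named in this list.

# Minimal sketch: fetch one file from a public Hugging Face Hub repository.
# The repo_id and filename below are illustrative placeholders.
from huggingface_hub import hf_hub_download

# Downloads the file into the local cache and returns its cached path.
config_path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(config_path)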