Run Local LLMs on Any Device; open-source
Ready-to-use OCR with 80+ supported languages
A high-throughput and memory-efficient inference and serving engine
Operating LLMs in production
A high-performance ML model serving framework that offers dynamic batching
Tensor search for humans
The official Python client for the Huggingface Hub (see the usage sketch after this list)
Everything you need to build state-of-the-art foundation models
Official inference library for Mistral models
FlashInfer: Kernel Library for LLM Serving
A deep learning optimization library that makes distributed training easy
State-of-the-art diffusion models for image and audio generation
Single-cell analysis in Python
Powering Amazon's custom machine learning chips
Bring the notion of Model-as-a-Service to life
Create HTML profiling reports from pandas DataFrame objects
Large Language Model Text Generation Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution
PyTorch library of curated Transformer models and their components
Multilingual Automatic Speech Recognition with word-level timestamps
A library for accelerating Transformer models on NVIDIA GPUs
Trainable models and neural-network optimization tools
Library for OCR-related tasks powered by Deep Learning
PyTorch domain library for recommendation systems
PyTorch extensions for fast R&D prototyping and Kaggle farming
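
Of the entries above, the Hugging Face Hub client names a concrete, installable API. As a minimal sketch, assuming the entry refers to the `huggingface_hub` package and using its public `hf_hub_download` helper, fetching a single file from a model repository looks like this:

```python
from huggingface_hub import hf_hub_download

# Download one file from a public model repo on the Hugging Face Hub.
# The file is cached locally and the cached path is returned.
config_path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(config_path)
```

The repo id and filename here are illustrative; any public repository on the Hub works the same way, and the client caches downloads so repeated calls do not refetch the file.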