OpenAI-style API for open large language models
Run local LLMs on any device. Open-source
A library for accelerating Transformer models on NVIDIA GPUs
A high-throughput and memory-efficient inference and serving engine
Ready-to-use OCR with 80+ supported languages
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Operating LLMs in production
FlashInfer: Kernel Library for LLM Serving
Library for OCR-related tasks powered by Deep Learning
Everything you need to build state-of-the-art foundation models
Training and deploying machine learning models on Amazon SageMaker
Neural Network Compression Framework for enhanced OpenVINO inference
State-of-the-art diffusion models for image and audio generation
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
The official Python client for the Hugging Face Hub
DoWhy is a Python library for causal inference
Official inference library for Mistral models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Single-cell analysis in Python
PyTorch domain library for recommendation systems
Standardized Serverless ML Inference Platform on Kubernetes
Trainable models and NN optimization tools
Uncover insights, surface problems, monitor, and fine-tune your LLM
Efficient few-shot learning with Sentence Transformers
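Several of the entries above (the OpenAI-style API, the inference and serving engines, the multi-LoRA server) expose OpenAI-compatible HTTP endpoints. As a minimal sketch of what "OpenAI-style" means in practice, the snippet below assembles a standard `/chat/completions` request body; the base URL and model name are placeholders, not taken from any specific project listed here.

```python
import json

# Placeholder endpoint for a locally hosted OpenAI-compatible server
# (assumption for illustration, not a specific project's default).
API_BASE = "http://localhost:8000/v1"

def build_chat_request(model, user_message, temperature=0.7):
    """Assemble an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
    }

payload = build_chat_request("my-local-model", "Hello!")
print(json.dumps(payload, indent=2))

# An actual call would POST this JSON to f"{API_BASE}/chat/completions",
# typically with an Authorization: Bearer <key> header.
```

Because the request shape is shared, client code written against the official OpenAI SDK can usually be pointed at such servers by overriding the base URL.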