A high-performance inference system for large language models
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
LLMs as Copilots for Theorem Proving in Lean
Unofficial Go (Golang) bindings for the Hugging Face Inference API
Framework that lets you transform your vector database
Libraries for applying sparsification recipes to neural networks
An easy-to-use LLM quantization package with user-friendly APIs
Gaussian processes in TensorFlow
Open platform for training, serving, and evaluating language models
An innovative library for efficient LLM inference
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Images to inference with no labeling
Lightweight, standalone C++ inference engine for Google's Gemma models
Efficient few-shot learning with Sentence Transformers
The official Python client for the Hugging Face Hub
PyTorch domain library for recommendation systems
Run serverless GPU workloads with fast cold starts on bare-metal
Official inference library for Mistral models
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Data manipulation and transformation for audio signal processing
A Pythonic framework to simplify AI service building
Adversarial Robustness Toolbox (ART) - Python Library for ML security