Run local LLMs like Llama, DeepSeek, and Kokoro inside your browser
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Unofficial Go bindings for the Hugging Face Inference API
Phi-3.5 for Mac: Locally-run Vision and Language Models
LLMs and machine learning made easy
Libraries for applying sparsification recipes to neural networks
Sparsity-aware deep learning inference runtime for CPUs
An easy-to-use LLM quantization package with user-friendly APIs
Gaussian processes in TensorFlow
C#/.NET bindings for llama.cpp, including LLaMA/GPT model inference
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Visual Instruction Tuning: Large Language-and-Vision Assistant
AI interface for tinkerers (Ollama, Haystack RAG, Python)
A high-performance inference system for large language models
Framework that lets you transform your vector database
Multi-LoRA inference server that scales to thousands of fine-tuned LLMs
Neural Network Compression Framework for enhanced OpenVINO
Efficient few-shot learning with Sentence Transformers
OpenAI-style API for open large language models
A Unified Library for Parameter-Efficient Learning
Large Language Model Text Generation Inference
Private OpenAI on Kubernetes
On-device AI across mobile, embedded, and edge devices for PyTorch
Images to inference with no labeling
Data manipulation and transformation for audio signal processing