Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods
Low-latency REST API for serving text embeddings
Unofficial Go (Golang) bindings for the Hugging Face Inference API
State-of-the-art Parameter-Efficient Fine-Tuning
Private Open AI on Kubernetes
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Async Swift text-to-image generation for SwiftUI apps using OpenAI
Powering Amazon's custom machine learning chips
A graphical manager for Ollama to manage your LLMs
Run 100B+ language models at home, BitTorrent-style
Guide to deploying deep-learning inference networks
CPU/GPU inference server for Hugging Face transformer models
Deploy an ML inference service on a budget in 10 lines of code