Low-latency REST API for serving text embeddings
Lightweight inference library for ONNX files, written in C++
Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser
Sparsity-aware deep learning inference runtime for CPUs
LLM.swift: a simple and readable library for interacting with local LLMs in Swift
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Operating LLMs in production
Create HTML profiling reports from pandas DataFrame objects
Phi-3.5 for Mac: Locally-run Vision and Language Models
An MLOps framework to package, deploy, monitor and manage models
Superduper: Integrate AI models and machine learning workflows
Large Language Model Text Generation Inference
Data manipulation and transformation for audio signal processing
Lightweight, standalone C++ inference engine for Google's Gemma models
OpenAI-style API for open large language models
Fast inference engine for Transformer models
The deep learning toolkit for speech-to-text
PyTorch library of curated Transformer models and their components
State-of-the-art diffusion models for image and audio generation
A general-purpose probabilistic programming system
LLMFlows - Simple, Explicit and Transparent LLM Apps
A Pythonic framework to simplify AI service building
20+ high-performance LLMs with recipes to pretrain and fine-tune at scale
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
A high-performance ML model-serving framework with dynamic batching