A scalable inference server for models optimized with OpenVINO
Port of Facebook's LLaMA model in C/C++
The free, Open Source alternative to OpenAI, Claude and others
Simplifies the local serving of AI models from any source
Unofficial Go (Golang) bindings for the Hugging Face Inference API
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Bring the notion of Model-as-a-Service to life
Integrate, train and manage any AI models and APIs with your database
Superduper: Integrate AI models and machine learning workflows
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
User-friendly AI Interface
Open standard for machine learning interoperability
Phi-3.5 for Mac: Locally-run Vision and Language Models
The Triton Inference Server provides an optimized cloud and edge inferencing solution
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Neural Network Compression Framework for enhanced OpenVINO™ inference
Private Open AI on Kubernetes
State-of-the-art diffusion models for image and audio generation
Official inference library for Mistral models
Swift async text-to-image for SwiftUI apps using the OpenAI API
Run Local LLMs on Any Device. Open-source and available for commercial use
OpenVINO™ Toolkit repository
Sparsity-aware deep learning inference runtime for CPUs
An innovative library for efficient LLM inference