Port of Facebook's LLaMA model in C/C++
Run local LLMs on any device. Open-source
A high-throughput and memory-efficient inference and serving engine
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Large Language Model Text Generation Inference
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
OpenVINO™ Toolkit repository
Self-hosted, community-driven, local OpenAI-compatible API
Operating LLMs in production
Phi-3.5 for Mac: Locally-run Vision and Language Models
Visual Instruction Tuning: Large Language-and-Vision Assistant
An easy-to-use LLM quantization package with user-friendly APIs
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Replace OpenAI GPT with another LLM in your app
LLM.swift is a simple and readable library for interacting with LLMs locally
An RWKV management and startup tool; fully automated and only 8 MB
State-of-the-art Parameter-Efficient Fine-Tuning
LLMs and Machine Learning done easily
Low-latency REST API for serving text embeddings
A high-performance inference system for large language models
Open platform for training, serving, and evaluating language models
Libraries for applying sparsification recipes to neural networks
Neural Network Compression Framework for enhanced OpenVINO
Build Production-ready Agentic Workflow with Natural Language
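Several of the projects above expose an OpenAI-compatible REST endpoint, which is why they can drop into existing apps as a replacement for OpenAI's service. A minimal sketch of what such a client request looks like, assuming a hypothetical local server URL and model name (both are illustrative, not taken from any specific project):

```python
import json

# Assumed local endpoint; OpenAI-compatible servers typically mirror
# the /v1/chat/completions route of the official API.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# Request body following the OpenAI chat-completions schema.
payload = {
    "model": "local-model",  # placeholder: whatever model the server loaded
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "temperature": 0.7,
}

# Serialize the body as it would be sent (no network call in this sketch).
body = json.dumps(payload)
print(body)
```

Because only the base URL changes, the same payload works against any of the OpenAI-compatible servers listed here; an existing OpenAI client library can usually be pointed at the local server by overriding its base URL setting.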