Real-time NVIDIA GPU dashboard
157 models, 30 providers, one command to find what runs on your hardware
Performance-optimized AI inference on your GPUs
The free, Open Source alternative to OpenAI, Claude and others
AirLLM 70B inference with a single 4GB GPU
High-speed Large Language Model Serving for Local Deployment
How to optimize algorithms in CUDA
Parallax is a distributed model serving framework
Run Local LLMs on Any Device. Open-source
TT-NN operator library and TT-Metalium low-level kernel programming
Run AI models locally on your machine with Node.js bindings for llama.cpp
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
State-of-the-art Parameter-Efficient Fine-Tuning
A high-performance inference engine for AI models
ChatGLM-6B: An Open Bilingual Dialogue Language Model
UCCL is an efficient communication library for GPUs
950-line, minimal, extensible LLM inference engine built from scratch (its core decode loop is sketched below this list)
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
Bringing large language models and chat to web browsers
A simple, performant, and scalable JAX LLM
Run LLMs locally on Cloud Workstations
Calculate tokens/s & GPU memory requirements for any LLM (the back-of-envelope arithmetic is sketched below this list)
Run Mixtral-8x7B models in Colab or on consumer desktops
Official release of the InternLM series
Database system for building simpler and faster AI-powered applications
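
A minimal sketch of what a from-scratch engine like the 950-line entry above implements: the greedy decode loop that repeatedly runs the model forward, takes the highest-logit token, and appends it. Hugging Face transformers and the gpt2 checkpoint are stand-ins chosen for brevity, not part of any project listed here.

```python
# Hedged sketch: the core greedy-decoding loop of a minimal inference engine.
# transformers/gpt2 are illustrative assumptions, not from the listed projects.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def greedy_generate(prompt: str, max_new_tokens: int = 32) -> str:
    tok = AutoTokenizer.from_pretrained("gpt2")           # small demo checkpoint
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()
    ids = tok(prompt, return_tensors="pt").input_ids      # [1, seq]
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(ids).logits                    # [1, seq, vocab]
            next_id = logits[:, -1, :].argmax(-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)       # append chosen token
            if next_id.item() == tok.eos_token_id:        # stop at end-of-text
                break
    return tok.decode(ids[0], skip_special_tokens=True)

print(greedy_generate("The GPU is"))
```

Real engines add KV caching so each step costs one token's work instead of re-running the full prefix; that optimization is omitted here for clarity.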
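
For the tokens/s and GPU memory calculator entry, the sketch below uses common rules of thumb rather than the listed tool's exact method: inference VRAM is roughly parameter count × bytes per parameter plus overhead, and single-stream decode speed is bounded by memory bandwidth divided by weight size. The FP16 (2 bytes/param) and 1.2× overhead figures are assumptions.

```python
# Hedged back-of-envelope estimates; constants are assumptions, not the tool's.
def estimate_vram_gb(n_params_b: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Weights dominate: params * bytes/param, plus ~20% for KV cache/activations."""
    return n_params_b * bytes_per_param * overhead

def estimate_tokens_per_s(n_params_b: float, bandwidth_gb_s: float,
                          bytes_per_param: float = 2.0) -> float:
    """Decode is typically bandwidth-bound: every token streams all weights once."""
    return bandwidth_gb_s / (n_params_b * bytes_per_param)

print(f"7B FP16 needs ~{estimate_vram_gb(7):.1f} GB VRAM")        # ~16.8 GB
print(f"at 2000 GB/s: ~{estimate_tokens_per_s(7, 2000):.0f} tok/s upper bound")
```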