High-speed Large Language Model Serving for Local Deployment
Real-time NVIDIA GPU dashboard
AirLLM 70B inference with a single 4GB GPU
Accessible large language models via k-bit quantization for PyTorch
Unified KV Cache Compression Methods for Auto-Regressive Models
Neural Network architecture based on ideas of the original LSTM
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Open-source large language model family from Tencent Hunyuan
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Redundancy-aware KV Cache Compression for Reasoning Models
ChatGLM2-6B: An Open Bilingual Chat LLM
157 models, 30 providers, one command to find what runs on your hardware
Drag & drop UI to build your customized LLM flow
LLM training in simple, raw C/CUDA
Tools for merging pretrained large language models
AI Agent Development Guide, LangGraph in Action, Advanced RAG
High-performance inference framework for large language models
On the Structural Pruning of Large Language Models
Fast and efficient unstructured data extraction
DepGraph: Towards Any Structural Pruning
LangChain4j is an open-source Java library
Tensor search for humans
Capable of understanding text, audio, vision, video
Run Mixtral-8x7B models in Colab or consumer desktops
A training tool for large models