A personal AI assistant, easy to install
Unified KV Cache Compression Methods for Auto-Regressive Models
A lightweight, powerful framework for multi-agent workflows
Neural Network architecture based on ideas of the original LSTM
Unified web UI for training and running open models locally
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Demo of a customer service use case implemented with the OpenAI Agents
Open-source large language model family from Tencent Hunyuan
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Redundancy-aware KV Cache Compression for Reasoning Models
AI Agent Source Code Deep Research Report
ChatGLM2-6B: An Open Bilingual Chat LLM
Faster Whisper transcription with CTranslate2
High-performance neural network inference framework for mobile
Memory-efficient and performant finetuning of Mistral's models
157 models, 30 providers, one command to find what runs on your hardware
Persistent context and multi-instance coordination
A Python library for audio
Developer-friendly Natural Language Processing
ReFT: Representation Finetuning for Language Models
Agent framework and applications built upon Qwen>=3.0
QVAC Fabric: cross-platform LLM inference and fine-tuning
The repository provides code for running inference with SAM 2
Drag & drop UI to build your customized LLM flow
Self-evolving autonomous agent framework