A high-throughput and memory-efficient inference and serving engine
State-of-the-art LLM and coding model
State-of-the-art Parameter-Efficient Fine-Tuning
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
New set of lightweight, state-of-the-art open foundation models
MobileLLM: Optimizing Sub-billion Parameter Language Models
Multilingual sentence & image embeddings with BERT
GLM-5: From Vibe Coding to Agentic Engineering
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Gemma open-weight LLM library, from Google DeepMind
Zep: A long-term memory store for LLM / Chatbot applications
Qwen3-Coder is the code version of Qwen3
Operating LLMs in production
Run models like Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, etc.
MiniMax M2.1, a SOTA model for real-world dev & agents.
Math-specific large language models from our Qwen2 series
Unified KV Cache Compression Methods for Auto-Regressive Models
Toolkit for conversational AI
Low-code framework for building custom LLMs and neural networks
LLM Frontend for Power Users
Framework and no-code GUI for fine-tuning LLMs
Port of Facebook's LLaMA model in C/C++
Replace OpenAI GPT with another LLM in your app
Project aimed at extracting, exporting, and analyzing chat records
A Simple and Universal Swarm Intelligence Engine