A high-throughput and memory-efficient inference and serving engine
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Build multimodal language agents for fast prototyping and production
Integrating LLMs into structured NLP pipelines
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible
Low-latency REST API for serving text embeddings
LLM training code for MosaicML foundation models
Tensor search for humans
From Paper to Presentation in One Click
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Analyzing Hacker News discussions from a decade ago in hindsight
A New Axis of Sparsity for Large Language Models
Scalable data preprocessing and curation toolkit for LLMs
LightLLM is a Python-based LLM (Large Language Model) inference
A lightweight vLLM implementation built from scratch
Building Mixture-of-Experts from LLaMA with Continual Pre-training
Framework that is dedicated to making neural data processing
Editing large language models within 10 seconds