Structured outputs for LLMs
Python bindings for llama.cpp
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Run Local LLMs on Any Device. Open-source
Port of Facebook's LLaMA model in C/C++
Agentic, Reasoning, and Coding (ARC) foundation models
Low-code app builder for RAG and multi-agent AI applications
A high-throughput and memory-efficient inference and serving engine
Qwen3 is the large language model series developed by Qwen team
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Powerful AI language model (MoE) optimized for efficiency/performance
Interact with your documents using the power of GPT
Open-source, high-performance AI model with advanced reasoning
Access large language models from the command-line
Scalable data preprocessing and curation toolkit for LLMs
Operating LLMs in production
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Technical principles related to large models
Simple, Pythonic building blocks to evaluate LLM applications
Lightweight package to simplify LLM API calls
Inference code for CodeLlama models
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)