Structured outputs for LLMs
Python bindings for llama.cpp
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Run Local LLMs on Any Device. Open-source
Agentic, Reasoning, and Coding (ARC) foundation models
A high-throughput and memory-efficient inference and serving engine
Qwen3 is the large language model series developed by the Qwen team
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Powerful AI language model (MoE) optimized for efficiency/performance
Interact with your documents using the power of GPT
Open-source, high-performance AI model with advanced reasoning
Access large language models from the command-line
Scalable data pre-processing and curation toolkit for LLMs
Operating LLMs in production
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Technical principles related to large models
Simple, Pythonic building blocks to evaluate LLM applications
Lightweight package to simplify LLM API calls
Inference code for CodeLlama models
PandasAI is a Python library that integrates generative AI capabilities into pandas
Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, …)
A high-performance ML model serving framework offering dynamic batching