Structured outputs for LLMs
Accelerate local LLM inference and finetuning
A high-throughput and memory-efficient inference and serving engine
Gemma open-weight LLM library, from Google DeepMind
Uncertainty Quantification for Language Models, a Python package
A Python module to repair invalid JSON from LLMs
Scalable data preprocessing and curation toolkit for LLMs
PandasAI is a Python library that integrates generative AI
Access large language models from the command line
Synthetic data curation for post-training and data extraction
PyTorch library of curated Transformer models and their components
Open-source libraries and APIs to build custom preprocessing pipelines
Accessible large language models via k-bit quantization for PyTorch
AirLLM: 70B inference with a single 4GB GPU
Tools for merging pretrained large language models
Easy token price estimates for 400+ LLMs. TokenOps
LLM abstractions that aren't obstructions
⚡ Building applications with LLMs through composability ⚡
The Security Toolkit for LLM Interactions
[NeurIPS 2025 Spotlight] Quantized Attention
Schema-Guided Reasoning (SGR) for agentic system design
A simple, performant and scalable JAX LLM
A New Axis of Sparsity for Large Language Models
DepGraph: Towards Any Structural Pruning
Advanced techniques for RAG systems