Structured outputs for LLMs
Accelerate local LLM inference and finetuning
A high-throughput and memory-efficient inference and serving engine
A Python module to repair invalid JSON from LLMs
Uncertainty Quantification for Language Models is a Python package
Scalable data preprocessing and curation toolkit for LLMs
Access large language models from the command-line
PandasAI is a Python library that integrates generative AI
PyTorch library of curated Transformer models and their components
Gemma open-weight LLM library, from Google DeepMind
Synthetic data curation for post-training and data extraction
Building applications with LLMs through composability
Open source libraries and APIs to build custom preprocessing pipelines
Accessible large language models via k-bit quantization for PyTorch
AirLLM: 70B inference with a single 4GB GPU
Tools for merging pretrained large language models
⚡ Building applications with LLMs through composability ⚡
Easy token price estimates for 400+ LLMs. TokenOps
LLM abstractions that aren't obstructions
The Security Toolkit for LLM Interactions
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
[NeurIPS 2025 Spotlight] Quantized Attention
A New Axis of Sparsity for Large Language Models
Schema-Guided Reasoning (SGR): an agentic system design
A simple, performant and scalable Jax LLM