The summarize-from-feedback repository implements the methods from the paper “Learning to Summarize from Human Feedback”. Its purpose is to train a summarization model that better aligns with human preferences: human feedback (pairwise comparisons between summaries) is first collected to train a reward model, and a policy (the summarizer) is then fine-tuned to maximize that learned reward. The code covers the distinct stages of this pipeline: a supervised baseline (standard summarization fine-tuning), the reward-modeling component, and the reinforcement learning (preference-based fine-tuning) phase. The repo also includes utilities for dataset handling, model architectures, inference, and evaluation. Because the codebase is experimental, parts of it may not run out of the box depending on dependencies and environment, but it remains a canonical reference for implementing summarization from human feedback.
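
The reward model at the center of this pipeline is trained with a pairwise comparison loss: given a post and two candidate summaries, it should assign a higher score to the summary the human labeler preferred. The sketch below illustrates that loss in PyTorch; the `reward_model` callable and its argument names are assumptions for illustration, not the repository's actual interfaces.

```python
# Minimal sketch of the pairwise reward-model loss used in preference learning.
# Names (reward_model, posts, preferred, rejected) are illustrative, not the repo's API.
import torch.nn.functional as F


def reward_model_loss(reward_model, posts, preferred, rejected):
    """Cross-entropy loss on which of two summaries a human preferred.

    reward_model(posts, summaries) is assumed to return one scalar score per example.
    """
    r_preferred = reward_model(posts, preferred)  # shape: (batch,)
    r_rejected = reward_model(posts, rejected)    # shape: (batch,)
    # Maximize log sigma(r_preferred - r_rejected), i.e. the probability that
    # the model ranks the human-preferred summary higher than the rejected one.
    return -F.logsigmoid(r_preferred - r_rejected).mean()
```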
Features
- Supervised baseline summarization model that serves as the starting point for fine-tuning
- Reward model trained from human comparisons of summary pairs
- Preference-based fine-tuning / RL stage that optimizes the summarizer toward human judgments (see the sketch after this list)
- Dataset handling modules (loading, comparisons, splits)
- Inference and evaluation scripts to generate and score summaries
- Architecture layout files (e.g. model_layout.py) supporting modular model definitions
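
The RL stage noted in the list above follows the paper in optimizing the policy against the learned reward while penalizing divergence from the supervised baseline, which keeps generations from drifting into reward-hacking artifacts. Below is a minimal sketch of that KL-penalized reward signal, assuming per-sample log-probabilities are already computed; the function name and the default `beta` are illustrative, not values taken from the repository.

```python
# Sketch of the per-episode reward for the RL stage: the reward model's score
# minus a KL penalty that keeps the policy close to the supervised baseline.
import torch


def rl_reward(
    reward_score: torch.Tensor,    # r_theta(x, y) from the trained reward model
    policy_logprob: torch.Tensor,  # log pi_RL(y | x) under the current policy
    sft_logprob: torch.Tensor,     # log pi_SFT(y | x) under the supervised baseline
    beta: float = 0.05,            # KL penalty coefficient (illustrative value)
) -> torch.Tensor:
    """Per-sample reward: learned reward minus a KL penalty toward the SFT model."""
    kl_penalty = policy_logprob - sft_logprob  # sample estimate of the log ratio
    return reward_score - beta * kl_penalty
```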