Weak-to-Strong is an OpenAI research codebase that implements weak-to-strong generalization, as described in the accompanying paper. The project provides tools for training larger “strong” models using labels or guidance generated by smaller “weak” models. Its core functionality focuses on binary classification tasks, with support for fine-tuning pretrained language models and experimenting with different loss functions, including confidence-based auxiliary losses. The repository also includes a dedicated vision module for applying weak-to-strong training setups in computer vision, demonstrated with models such as AlexNet and DINO on ImageNet. Although the code is not fully production-tested, it reproduces results qualitatively similar to the experiments presented in the paper, particularly for large gaps in model size.
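
To give a rough picture of what a weak-to-strong run looks like, the sketch below uses small PyTorch classifiers as stand-ins for the pretrained weak and strong models: a weak supervisor is fine-tuned on ground-truth labels, it then labels a held-out transfer set, and the strong student is trained only on those weak labels. The toy models, the `train` helper, and the data splits are all illustrative assumptions and do not mirror the repository's actual scripts or APIs; with toy linearly separable data this only demonstrates the shape of the pipeline, not the generalization gap measured in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy binary classification data: the label is the sign of a linear function of x.
X = torch.randn(2000, 16)
y = (X @ torch.randn(16) > 0).long()

# Hypothetical stand-ins for the pretrained "weak" and "strong" models.
weak_model = nn.Linear(16, 2)                       # small, weak supervisor
strong_model = nn.Sequential(                       # larger, stronger student
    nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2)
)

def train(model, inputs, labels, epochs=50, lr=1e-2):
    """Minimal fine-tuning loop with plain cross-entropy (illustrative only)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(inputs), labels)
        loss.backward()
        opt.step()

# 1) Fine-tune the weak model on ground-truth labels.
train(weak_model, X[:800], y[:800])

# 2) Use the weak model to label a held-out transfer set ("weak labels").
with torch.no_grad():
    weak_labels = weak_model(X[800:1600]).argmax(dim=-1)

# 3) Fine-tune the strong model on the weak labels only.
train(strong_model, X[800:1600], weak_labels)

# 4) Compare supervisor and student against ground truth on a test split.
with torch.no_grad():
    weak_acc = (weak_model(X[1600:]).argmax(-1) == y[1600:]).float().mean().item()
    strong_acc = (strong_model(X[1600:]).argmax(-1) == y[1600:]).float().mean().item()
print(f"weak acc: {weak_acc:.3f}  strong (weak-supervised) acc: {strong_acc:.3f}")
```
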
Features
- Implements weak-to-strong training setups for language models
- Supports binary classification tasks with pretrained models
- Provides auxiliary loss functions such as confidence loss (see the sketch after this list)
- Includes a vision module for applying the method to image models
- Scripts for sweeping over combinations of weak and strong model sizes
- Tools for fine-tuning and training models with weak model labels
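
To make the confidence-loss bullet concrete, here is a hedged sketch of a confidence-based auxiliary loss of the kind described in the paper: the usual cross-entropy against the weak labels is blended with a cross-entropy against the strong model's own hardened predictions, so the strong model is rewarded for staying confident in its own beliefs rather than purely imitating its weak supervisor. The function name `confidence_aux_loss`, the fixed `alpha` weighting, and the plain argmax hardening are simplifying assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def confidence_aux_loss(strong_logits: torch.Tensor,
                        weak_labels: torch.Tensor,
                        alpha: float = 0.5) -> torch.Tensor:
    """Illustrative confidence-based auxiliary loss (simplified sketch).

    Blends cross-entropy against the weak model's labels with cross-entropy
    against the strong model's own hardened (argmax) predictions.
    """
    # Standard supervised term: fit the (possibly noisy) weak labels.
    ce_weak = F.cross_entropy(strong_logits, weak_labels)

    # Auxiliary term: fit the strong model's own hardened predictions.
    self_labels = strong_logits.argmax(dim=-1).detach()
    ce_self = F.cross_entropy(strong_logits, self_labels)

    return (1.0 - alpha) * ce_weak + alpha * ce_self
```

In a training loop this would stand in for the plain cross-entropy used in the pipeline sketch above, e.g. `loss = confidence_aux_loss(strong_model(x), weak_labels, alpha=0.5)`; the paper additionally warms the auxiliary weight up over the course of training, which is omitted here.
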