History LLMs serves as the central information hub for a research project focused on training large language models exclusively on historical texts up to specified cutoff dates, essentially creating time-locked AI that speaks from within a particular era's worldview. The models are trained on large curated datasets of time-stamped documents so that their responses are grounded only in the knowledge available before a given cutoff, such as 1913, avoiding the hindsight contamination common in modern LLMs. This approach lets researchers in the humanities and social sciences explore how people at different historical moments would have discussed world events, norms, and ideas, without later developments influencing the model. The hub documents model families such as Ranke-4B, which are trained from scratch on historical corpora and can act as "aggregate witnesses" to the textual culture of their era.
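The time-locking idea can be illustrated with a minimal sketch. This is not the project's actual pipeline: `filter_corpus`, the `(text, year)` pair format, and the sample titles are hypothetical, chosen only to show how a cutoff year partitions a time-stamped corpus.

```python
def filter_corpus(documents, cutoff_year):
    """Keep only documents published strictly before cutoff_year.

    `documents` is an iterable of (text, publication_year) pairs;
    everything at or after the cutoff is excluded from training,
    so the resulting model has no access to later knowledge.
    """
    return [text for text, year in documents if year < cutoff_year]

# Illustrative corpus entries (titles and years for demonstration only).
corpus = [
    ("On the Origin of Species ...", 1859),
    ("The Souls of Black Folk ...", 1903),
    ("Relativity: The Special and General Theory ...", 1916),
]

# A 1913-cutoff model would train only on the first two documents.
pre_1913 = filter_corpus(corpus, 1913)
```

In practice, per-document dating is the hard part; the sketch assumes clean publication years are already attached to every text.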
Features
- Time-locked language models trained on pre-cutoff historical corpora
- Curated datasets for specific historical periods
- Model families like Ranke-4B with configurable knowledge cutoffs
- Documentation hub for research and reproducibility
- Community input and academic collaboration workflows
- Sample dialogues showing era-specific reasoning