The IRST Language Modeling Toolkit features algorithms and data structures suitable for estimating, storing, and accessing very large n-gram language models. Our software has been integrated into Moses, a popular open-source statistical machine translation decoder, and is compatible with language models created with other tools, such as the SRILM Toolkit.
- IRSTLM offers state-of-the-art n-gram smoothing methods for estimating large LMs, as well as approximate smoothing methods for estimating gigantic LMs.
- IRSTLM includes methods for pruning and quantizing LMs, and for storing LMs efficiently on disk and in memory.
- IRSTLM offers several language adaptation methods: linear interpolation, minimum discrimination information, and probabilistic latent semantic analysis.
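Of the adaptation methods above, linear interpolation is the simplest: the adapted probability of an n-gram is a weighted mixture of the probabilities assigned by two (or more) component models. The sketch below is a minimal illustration of the idea, not IRSTLM's implementation; the function name and the fixed weight `lam` are assumptions for this example (in practice the weight would be tuned on held-out in-domain data, e.g. with the EM algorithm).

```python
def interpolate(p_background: float, p_domain: float, lam: float = 0.5) -> float:
    """Linearly interpolate the probabilities two LMs assign to the same n-gram.

    lam weights the background (general-purpose) model; (1 - lam) weights
    the in-domain model. If both inputs are valid distributions over the
    vocabulary, the mixture is a valid distribution as well.
    NOTE: illustrative sketch only, not IRSTLM code.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation weight must lie in [0, 1]")
    return lam * p_background + (1.0 - lam) * p_domain


# Example: a phrase that is rare in general text but common in-domain
# receives a boosted probability from the mixture.
p_mixed = interpolate(p_background=0.001, p_domain=0.05, lam=0.3)
```

With `lam = 0.3` the mixture leans toward the in-domain model, which is the typical setting when adapting a large general-purpose LM to a narrow target domain.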
The source code for the toolkit, as well as binaries for different architectures, can be downloaded from the IRSTLM project page; download the latest binary release. The data for the examples described in this manual are available here. If you want to access and compile the source code yourself, please download the complete snapshot via SVN. The repository also contains regression tests, should you be interested in enhancing the toolkit.
Users of this toolkit are invited to cite the following paper in their publications:
M. Federico, N. Bertoldi, M. Cettolo, IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models, Proceedings of Interspeech, Brisbane, Australia, 2008.
The development of IRSTLM has been supported by the European Commission under Framework Programmes 6 and 7 through the projects MateCat, MosesCore, EuroMatrixPlus, TC-STAR, and META-NET, and has received support from:
- Fondazione Bruno Kessler, Trento, Italy
- University of Edinburgh, UK
Open Source License
IRSTLM is licensed under the LGPL.