Name                 Modified    Size
Old-Data             2014-08-22
README               2023-07-25  596 Bytes
LTI-LangID-rel5.txz  2023-07-25  753.8 MB
LTI-LangID-rel4.txz  2020-06-12  669.8 MB
LTI-LangID-rel3.txz  2018-02-22  461.3 MB
LTI-LangID-rel2.txz  2014-11-21  395.5 MB
LTI-LangID-rel1.txz  2014-08-22  373.4 MB
Totals: 7 items, 2.7 GB, 9 downloads / week

The files in this directory contain the various releases of the LTI LangID Corpus.

Release 1 contains 781 "core" languages and 1091 overall, and is the
version to use if you wish to replicate the EMNLP 2014 experiments.

Release 3 contains 970 "core" languages and 1279 overall.

Release 4 contains 1152 "core" languages and 1547 overall (note that
the 00README inside the archive accidentally omits one Wikipedia
language from its count).

Release 5 contains 1266 "core" languages and 1706 overall, and
includes scripts to download non-redistributable text for more
than 1000 additional languages.
Source: README, updated 2023-07-25
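
The releases above are distributed as .txz files, i.e. tar archives
compressed with xz. As a minimal sketch for inspecting one before
unpacking several hundred megabytes, the following Python snippet
lists the first few member names using only the standard library.
The function name list_txz_members and the limit parameter are
illustrative assumptions, and nothing here relies on the internal
layout of the corpus archives, which this README does not describe.

```python
import tarfile


def list_txz_members(archive_path, limit=20):
    """Return up to `limit` member names from an xz-compressed tar archive.

    Hypothetical helper for illustration; it is not part of the corpus
    or of any tool shipped with it.
    """
    names = []
    # "r:xz" tells tarfile to read the archive with xz (LZMA) decompression.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        # Iterating the archive reads members lazily, so stopping early
        # avoids scanning the entire file.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


# Example call, assuming the file has already been downloaded
# into the current directory:
#     for name in list_txz_members("LTI-LangID-rel5.txz"):
#         print(name)
```

Listing members first is a cheap way to confirm that a download
completed and to see the top-level directory name before extracting.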